According to the New York Times's indictment, chatbots owned by OpenAI and Microsoft have taken in millions of the newspaper's original news articles. Not only did they "copy" the original reports word for word when responding to user queries, but they also imitated the writing style and generated excerpts and summaries of the articles, which were presented as reliable sources of information. These practices diverted traffic that would otherwise have gone to the newspaper's website, thereby reducing its advertising and subscription revenue. In its statement, the NYT pointed out that "there is nothing 'transformative' about using the Times's content without payment to create products that substitute for the Times and steal audiences away from it".

The legal risks of infringement in training generative AI on data involve issues such as the rights of attribution and modification. It is also impossible to ensure that 100% of the content has been obtained legitimately and with authorisation. In fact, collecting online data with "web crawler" technology is the most common means of data collection for generative AI.

Both OpenAI and Microsoft claimed that they trained their AI tools based on the principle of "fair use" in internet governance. Generally speaking, "fair use" requires the following elements: non-commercial usage, moderate copying and no harm to the market value of the original work. In response, the Times, a British newspaper, pointed out that when a company uses the content of pay-to-read newspapers and magazines for free in its own products to replace the original sources, thus absorbing and redirecting their readers, this does not qualify as "fair use".

Since the birth of ChatGPT more than a year ago, the media industry has been actively discussing the legal and financial impacts of generative AI, as well as its impact on the news ecosystem. Different organisations have taken different approaches in their struggle to survive. Last year, the US-based Cable News Network (CNN) added code to its website to block OpenAI from scanning its online content using web crawlers. Meanwhile, some media companies, including the Associated Press and Axel Springer, the German media group that owns the online political news site Politico, reached content licensing agreements with OpenAI.

The New York Times filed this lawsuit only after licensing negotiations had failed, which forced the newspaper to resort to legal action. The AI industry is expected to face more infringement lawsuits.

明報社評 2024.01.02：《紐時》狀告科技先驅 影響人工智能業深遠

近年全球掀起人工智能（AI）熱潮，但也引發知識產權爭議。2023年最後一周，美國《紐約時報》將開放人工智能研究中心（OpenAI）及其主要投資者微軟公司（Microsoft）告上法庭，指控兩間公司未經授權使用該報數以百萬計文章，訓練ChatGPT等聊天機械人，侵犯知識產權。這宗官司被形容是新聞界翹楚對科技業先驅打響的「第一槍」，為傳媒業與互聯網巨頭多年的互聯網利益爭奪戰開闢了一條「新戰線」，對於未來的傳媒生態乃至AI發展都有深遠影響。

《紐時》起訴書指，OpenAI和微軟旗下的聊天機械人吸納了該報幾百萬篇原創報道，不僅將原報道逐字逐句地「複製」給提問的用戶，還模仿其寫作風格，對文章提煉、總結，被當作可靠消息來源。這些做法分流了原本會流向該報網站的流量，令其損失廣告和訂閱收入。該報的聲明指出，「在不付費情况下使用《紐時》內容，來創造替代《紐時》並搶走其讀者的產品，並不具有『變革性』」。

生成式人工智能（Generative AI，又稱生成式AI）

數據訓練的侵權法律風險，涉及署名權、修改權等問題，亦無法做到百分百獲準確授權合法取得內容，而利用「網絡爬蟲」（web crawler）技術攫取網上數據，更是生成式AI最常用的數據收集手段。

OpenAI和微軟均聲稱，他們是依據互聯網治理中的「合理使用」（fair use）原則來訓練其AI工具。一般而言，「合理使用」須符合非商業用途、適度複製、不侵犯原作品市場價值等元素。對此，英國《泰晤士報》指出，免費使用需付費報章雜誌的內容，用於自身產品，以替代對方、吸收並轉移其讀者，這並非「合理使用」。

ChatGPT問世一年多來，傳媒業積極探討生成式AI帶來的法律、財務與新聞生態影響，各師各法，掙扎求存。去年起，美國有線電視新聞網絡（CNN）在其網站添加程式碼，阻止OpenAI以「網頁爬蟲」技術掃描其網站內容；包括美聯社及政治新聞網媒Politico的母公司德國Axel Springer報業集團在內的部分傳媒，則與OpenAI達成內容授權協議。

《紐時》今次興訟，亦是在授權談判未果之後，才被迫拿起法律武器。預料AI業面臨的侵權官司，還將陸續有來。

/ Glossary生字 /

indictment：an official statement accusing somebody of a crime

divert：to make somebody or something change direction

resort to something：to make use of something, especially something bad, as a means of achieving something, often because there is no other possible solution