Publishers Prepare for Showdown With Microsoft, Google Over AI Tools
Media executives want compensation for use of their content in ChatGPT, Bing and Bard
By Keach Hagey, Alexandra Bruell, Tom Dotan and Miles Kruppa
Wall Street Journal
Updated March 27, 2023 9:40 am ET
Since the arrival of chatbots that can carry on conversations, make up sonnets and ace the LSAT, many people have been in awe at the artificial-intelligence technology’s capabilities.
Publishers of online content share in that sense of wonder. They also see a threat to their businesses, and are headed to a showdown with the makers of the technology.
In recent weeks, publishing executives have begun examining the extent to which their content has been used to “train” AI tools such as ChatGPT, how they should be compensated and what their legal options are, according to people familiar with meetings organized by the News Media Alliance, a publishing trade group.
“We have valuable content that’s being used constantly to generate revenue for others off the backs of investments that we make, that requires real human work, and that has to be compensated,” said Danielle Coffey, executive vice president and general counsel of the News Media Alliance.
ChatGPT, released last November by parent company OpenAI, operates as a stand-alone tool but is also being integrated into Microsoft Corp.’s Bing search engine and other tools. Alphabet Inc.’s Google this week opened to the public its own conversational program, Bard, which also can generate humanlike responses.
Reddit has had talks with Microsoft about the use of its content in AI training, people familiar with the discussions said. A Reddit spokesman declined to comment.
Robert Thomson, chief executive of The Wall Street Journal parent News Corp, said at a recent investor conference that he has “started discussions with a certain party who shall remain nameless.”
Photo: Google this week opened public access to the conversational computer program Bard. (Jaap Arriens/Zuma Press)
“Clearly, they are using proprietary content—there should be, obviously, some compensation for that,” Mr. Thomson said.
At the heart of the debate is the question of whether AI companies have the legal right to scrape content off the internet and feed it into their training models. A legal provision called “fair use” allows copyrighted material to be used without permission in certain circumstances.
In an interview, OpenAI CEO Sam Altman said “we’ve done a lot with fair use,” when it comes to ChatGPT. The tool was trained on two-year-old data. He also said OpenAI has struck deals for content, when warranted.
“We’re willing to pay a lot for very high-quality data in certain domains,” such as science, Mr. Altman said.
One concern for publishers is that AI tools could drain traffic and advertising dollars away from their sites. Microsoft’s version of the technology includes links in the answers to users’ questions—showing the articles it drew upon to provide a recipe for chicken soup or suggest an itinerary for a trip to Greece, for example.
“On Bing Chat, I don’t think people recognize this, but everything is clickable,” Microsoft CEO Satya Nadella said in an interview, referring to the inherent value exchange in such links. Publishing executives say it is an open question how many users will actually click on those links and travel to their sites.
Microsoft has been making direct payments to publishers for many years in the form of content-licensing deals for its MSN platform. Some publishing executives say those deals don’t cover AI products. Microsoft declined to comment.
Photo: The tensions over AI tools add another dimension to the already-fraught relations between tech companies and the publishing world. (John Minchillo/Associated Press)
In early tests on Tuesday, Google’s Bard often served up answers to queries without providing links to the underlying news sources.
Asked to provide a summary of the biggest news in the New York Times, Bard responded with a list of items, including news of the Biden administration’s decision to send military aid to Ukraine and a new round of sanctions against Russia. It ended the response, “For more on these and other stories, please visit the NYT website,” without providing links or citations for the answer.
Sissie Hsiao, a vice president in charge of Google Assistant, said the company “is deeply committed in supporting a healthy and vibrant content ecosystem” and “will be welcoming conversations with stakeholders.” She said when AI tools are integrated into search the company will give priority to sending valuable traffic to content creators.
Google already has struck deals to pay some publishers, including News Corp, for using their content in a product called Google News Showcase, which has yet to launch in the U.S.
The emerging tensions over AI tools add another dimension to the already-fraught relations between big tech companies and the publishing world. Publishers have relied on tech companies such as Google and Meta Platforms Inc.’s Facebook to help their content reach a wide audience, but also have increasingly pushed those companies to pay for using it.
Photo: Microsoft rolled out its upgraded Bing search engine this year. (Gabby Jones/Bloomberg News)
Legislation that would let U.S. publishers negotiate collectively, without running afoul of antitrust regulations, circulated in the last session of Congress and is expected to be reintroduced soon, according to people familiar with the situation. That legislation is intended to cover commercial arrangements including for AI tools, one of the people said.
Some litigation has begun to test the limits of web-scraping to train AI when it comes to images and code, but so far there hasn’t been a major case involving text. Tech companies generally argue that their actions are covered by fair use. In February, Getty Images sued the AI art company Stability AI in Delaware, alleging that it had infringed on Getty’s copyrights. Stability AI said it doesn’t comment on pending litigation.
Last week, the U.S. Copyright Office said it launched an initiative to study issues raised by AI, including “the use of copyrighted materials in AI training.”
Among the commercial pacts OpenAI has already struck, Mr. Altman pointed to a deal last fall with Shutterstock Inc., the stock-photography company that has expanded into everything from video to 3-D models. As part of the agreement, OpenAI licensed data from Shutterstock, and Shutterstock got to use OpenAI technology. At the same time, Shutterstock opened a fund to compensate the artists whose work went into training the AI.
“We think it’s reasonable and rational to make sure that our contributors get compensated,” said Paul Hennessy, CEO of Shutterstock.
Publishers are now actively exploring whether to press for similar deals for themselves. Complicating the discussion, many media companies at the same time are embracing the technology. BuzzFeed and the publisher of Sports Illustrated are among the companies that have said they will rely on AI tools to create and personalize some content.
For years, publishers already were concerned that Google’s search engine was starting to give out specific answers to user queries like “Who’s in the cast of Succession?”—without links to media properties, offering what it calls “knowledge panels.”
Bing Chat goes even further, answering questions in a way that is so comprehensive that users could have little incentive to click on links that are provided.
“The share of what’s available to publishers to control has been inexorably declining since 2010,” said Rand Fishkin, the CEO of SparkToro and a veteran of the search-engine-optimization industry. “That kind of feels like the way of the internet.”
Write to Keach Hagey at Keach.Hagey@wsj.com