Original link: https://www.williamlong.info/archives/7140.html
Social platform Reddit no longer wants to give tech giants free access to its massive trove of data. On April 18, local time, The New York Times reported that Reddit plans to start charging companies for access to its application programming interface (API), which external companies use to download and process the vast number of conversations on the social network.
According to public information, Reddit is known as the “American version of Baidu Tieba”. It is an 18-year-old social media platform where users can post, comment, and discuss a wide range of topics.
In recent years, conversations posted on Reddit have become training material for companies such as Google, OpenAI and Microsoft, which collect and use them to develop generative artificial intelligence products such as ChatGPT.
“Reddit’s data corpus is very valuable,” Reddit co-founder and CEO Steve Huffman said in an interview with The New York Times, “but we don’t want to give this content to some giant companies for free.”
Reddit has thus become one of the first companies to state publicly that it will require tech giants to pay for its data.
“It’s unreasonable for these artificial intelligence companies to use Reddit data to create value without returning any value to Reddit users.” In Huffman’s view, charging these technology giants is a fair move.
Google, OpenAI, and Microsoft have yet to respond to requests for comment, according to The New York Times. The model underlying Google’s chatbot Bard was trained partly on Reddit data, and OpenAI’s ChatGPT also cites Reddit data as one of its training sources.
Reddit has not disclosed its specific pricing rules or tiers. Outside observers expect it to introduce tiered pricing based on the volume of data accessed.
Reddit’s API will continue to be freely available to developers who want to build apps that help people use Reddit, as well as to researchers who study Reddit data for non-commercial purposes, Huffman said.
In the future, Reddit hopes to incorporate more machine learning into the site, for example to identify AI-generated text on Reddit and add a label telling users that a comment came from a bot. It will also continue to let forum moderators use third-party bots that help monitor what users post, making communities easier to manage.
Reddit’s move may be related to its planned IPO (initial public offering) this year.
Founded in 2005, Reddit’s main revenue comes from advertising and e-commerce transactions on the platform. Reddit said it is still finalizing the details of what it will charge for API access and will announce prices in the coming weeks.
It is worth noting that Reddit is not alone: other companies are also becoming unwilling to provide platform data for free. On April 19, local time, CNBC reported that Twitter CEO Elon Musk had threatened to sue Microsoft, accusing it of illegally using data from Twitter to train its artificial intelligence models.
Earlier media reports said that Microsoft’s advertising platform would stop supporting Twitter because Twitter had changed the pricing of its API. Musk replied under a tweet about the news: “They trained illegally using Twitter data. Lawsuit time.” Under Twitter’s new pricing, API users (including enterprises and research institutions) must pay at least $42,000 per month for access.
According to CNBC, large language models similar to GPT require terabyte-scale datasets (more than 1 TB of data) for training, most of which is scraped from the social networking site Reddit, the programmer question-and-answer community StackOverflow, and Twitter. Training data from social networks is valuable because it captures interactive dialogue in informal settings.
Source: The Paper