ChatGPT creator OpenAI is facing a class action suit for allegedly misappropriating massive amounts of personal data from the internet to develop its AI products.
Noting that a “mature market” for data exists, the 157-page lawsuit claims that, despite established protocols, OpenAI resorted to “theft,” systematically scraping 300 billion words from the internet, including personal information, without consent.
The data was then used to develop products built on large language models and deep learning algorithms, which analyze and generate human-like language for a wide range of applications, including chatbots, language translation, and text generation.
“It doubled down on a strategy to secretly harvest massive amounts of personal data from the internet, including private information and private conversations, medical data, information about children – essentially every piece of data exchanged on the internet it could take – without notice to the owners or users of such data, much less with anyone’s permission,” claimed the plaintiffs, who were identified only by their professions and initials.
“OpenAI did so in secret, and without registering as a data broker as it was required to do under applicable law,” they added.
The proposed class action lawsuit, which was filed in a California federal court, also lists OpenAI investor Microsoft as a defendant. It seeks a temporary freeze on further commercial use of OpenAI’s products, as well as payments of “data dividends” to compensate those whose information was used to develop and train those products.
The lawsuit comes as concerns around AI technology grow. Recently, leaders of artificial intelligence companies, including OpenAI CEO Sam Altman, have been at the center of conversations about mitigating the technology’s risks and addressing concerns around possible privacy violations.