• mattes@lemmy.kussi.me
    1 year ago

There is no way to prove it didn’t just scrape 10 other summaries and reword them slightly. And given the nature of such language models and their limited context length, that’s actually more likely than it understanding and summarizing an entire book.

      • Jtthegeek@lemmy.dbzer0.com
        1 year ago

        That’s a bold assumption that OpenAI even knows. Part of the magic of how their large language model works is non-inversion: you cannot take an output and derive backwards to a precise input, as the inputs are no longer present in the weights formed during the learning process. This is a byproduct of all current large language models AFAIK. Building in reversible computation would add unfathomable complexity to these types of systems.
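
        A toy sketch of that non-inversion point (this is an analogy, not how any real LLM is built): if training blends many examples into shared weights, the way averaging does, then completely different datasets can produce identical weights, so there is no way to recover the original inputs from the model.

        ```python
        import numpy as np

        # Toy "model": the weights are just the mean of the training vectors,
        # standing in for how gradient updates blend many examples together.
        rng = np.random.default_rng(0)
        docs_a = rng.normal(size=(1000, 8))  # hypothetical training set A

        weights_a = docs_a.mean(axis=0)

        # Construct a *different* dataset that yields the same weights:
        # append mirrored vectors whose mean cancels back to weights_a.
        docs_c = np.vstack([docs_a, -docs_a + 2 * weights_a])
        weights_c = docs_c.mean(axis=0)

        # Distinct inputs, indistinguishable "model" -> not invertible.
        assert np.allclose(weights_a, weights_c)
        ```

        Since many input sets map to the same weights, "which texts went in" simply isn’t stored anywhere you could read it back out.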

          • shinjiikarus@mylem.eu
            1 year ago

            Not necessarily: Facebook used a public-private partnership with a German university to let it train the model on publicly available data, regardless of copyright status. The university is allowed to do this, since science enjoys a set of defined rights that rank higher than commercial copyright in Germany specifically (and I can imagine in other places as well). Facebook just received the finished model. This is obviously a ploy for plausible deniability and morally wrong, but it hasn’t been challenged in court yet and is believed to hold up for now. I can imagine OpenAI is smart enough to have one or more layers of buffering between themselves and the dataset as well.