Mona Awad and Paul Tremblay allege that their books, which are copyrighted, were ‘used to train’ ChatGPT because the chatbot generated ‘very accurate summaries’ of the works
That’s a bold assumption that OpenAI even knows. Part of the magic of how their large language model works is non-inversion: you cannot take an output and work backwards to a precise input, because the inputs are no longer present in the tokenization chain that forms during the learning process. This is a byproduct of all current large language models, AFAIK. Building in the ability to do reversible computation would add unfathomable complexity to these kinds of systems.
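A minimal sketch of what I mean, as a toy embedding update in Python. The vocabulary, loss, and learning rate are made up for illustration and have nothing to do with OpenAI's actual training pipeline; the point is only that what gets stored after a training step is updated weights, not the text itself:

    # Toy illustration of why training is not invertible: the text is
    # reduced to token IDs, the IDs drive a gradient update on shared
    # weights, and only the updated weights are kept afterwards.
    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2}          # made-up vocabulary
    tokens = [vocab[w] for w in "the cat sat".split()]

    rng = np.random.default_rng(0)
    W = rng.normal(size=(len(vocab), 4))            # toy embedding weights

    # One SGD step: nudge each seen token's embedding toward a target
    # direction (gradient of a simple squared loss, purely illustrative).
    lr, target = 0.1, np.ones(4)
    for t in tokens:
        W[t] += lr * (target - W[t])

    # The artifact that survives is W: a matrix of numbers. The original
    # sentence, its word order, and the fact that it was ever seen are not
    # stored anywhere, and countless other inputs could have produced a
    # similar update, so there is nothing to "reverse" back to.
    print(W.shape)   # (3, 4)
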
They know the training data sources.
Not necessarily: Facebook used a public-private partnership with a German university to let the university train the model on publicly available data, regardless of its copyright status. The university is allowed to do this because science enjoys a number of defined rights that rank above commercial copyright in Germany specifically (and, I can imagine, in other places as well). Facebook just received the finished model. This is obviously a ploy for plausible deniability and morally wrong, but it hasn’t been challenged in court yet and is currently believed to hold up. I can imagine OpenAI being smart enough to have one or more layers of buffering between themselves and the dataset as well.