Mona Awad and Paul Tremblay allege that their books, which are copyrighted, were ‘used to train’ ChatGPT because the chatbot generated ‘very accurate summaries’ of the works
They know the training data sources.
Not necessarily. Facebook used a public-private partnership with a German university, letting the university train the model on publicly available data regardless of its copyright status. The university is allowed to do this because scientific research enjoys a number of codified rights that take precedence over commercial copyright, at least in Germany (and I suspect elsewhere as well). Facebook simply received the finished model. This is obviously a ploy for plausible deniability and morally wrong, but it hasn’t been challenged in court yet and is currently believed to hold up. I would imagine OpenAI is smart enough to have one or more layers of buffering between itself and the dataset as well.