• Jtthegeek@lemmy.dbzer0.com · 1 year ago

    That’s a bold assumption that OpenAI even knows. Part of the magic of how their large language model works is non-invertibility: you cannot take an output and work backwards to a precise input, as the inputs are no longer individually present once training has baked them into the model’s weights. This is a byproduct of all current large language models AFAIK. Building in the ability to enable reversible computation would add unfathomable complexity to these kinds of systems.
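
    (Toy illustration of why training is many-to-one — a minimal sketch, not anything resembling OpenAI’s actual pipeline. It assumes a hypothetical one-parameter model trained with averaged squared-error gradients: two completely different batches produce the identical update, so the finished weights can’t tell you which one was used.)

    ```python
    import numpy as np

    # Hypothetical toy model y = w * x trained with averaged squared-error
    # gradients. Two different batches yield the *same* averaged update,
    # so the resulting weight cannot be inverted back to the exact inputs.

    w = 0.0  # initial weight

    def averaged_gradient(batch, w):
        """Mean gradient of (w*x - y)^2 with respect to w over the batch."""
        return float(np.mean([2.0 * (w * x - y) * x for x, y in batch]))

    batch_a = [(1.0, 2.0), (3.0, 6.0)]  # one possible training set
    batch_b = [(2.0, 5.0), (2.0, 5.0)]  # a completely different one

    print(averaged_gradient(batch_a, w))  # -20.0
    print(averaged_gradient(batch_b, w))  # -20.0 -- identical update

    # Either batch moves w to the same place; the data that produced the
    # update is irrecoverable from the model alone.
    ```

    Gradient averaging is only one of the lossy steps; batch shuffling and finite-precision accumulation discard even more information about any individual input.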

      • shinjiikarus@mylem.eu · 1 year ago

        Not necessarily: Facebook used a public-private partnership with a German university to have the model trained on publicly available data, regardless of copyright status. The university is allowed to do this, since science enjoys a number of legally defined rights that rank higher than commercial copyright in Germany specifically (and, I can imagine, in other places as well). Facebook just received the finished model. This is obviously a ploy for plausible deniability and morally wrong, but it hasn’t been challenged in court yet and is currently believed to hold up. I can imagine OpenAI being smart enough to put one or more layers of buffering between themselves and the dataset as well.