Sam Clemente

Sam Clemente@allthingstech.social · 2 months ago

@zbyte64 where am I wrong? The process is effectively the same: you get a set of training data (a textbook) and a set of validation data (a test) and voila, I’m trained

To learn how to draw an image of a thing, you look at the thing a lot (training data) and try sketching it out (validation data) until it’s right

How the data is acquired is irrelevant, I can pirate the textbook or trespass to find a particular flower, that doesn’t mean I’m learning differently than someone who paid for it

Sam Clemente@allthingstech.social · 2 months ago

@zbyte64 data quality, again, was out of the scope of what I was talking about originally

Which, again, was that legal precedent would suggest that the *how* is largely irrelevant in copyright cases, they’re mostly focused on *why* and the *scale of the operation*

I’m not getting sued for copyright infringement by the NYT because I used inspect element to delete content to read behind their paywall, OpenAI is

Sam Clemente@allthingstech.social · 2 months ago

@zbyte64 1) In no way is quality a part of that equation and 2) In what other contexts is quality ever a part of the equation? I mean I can go look at some Monets and paint some shitty water lilies, is that somehow problematic?

Sam Clemente@allthingstech.social · 2 months ago

@zbyte64 from what I understand, you’re referring to the process at scale—the amount of information the AI can take in is inhuman—which I’m not disagreeing with

None of which is relevant to my original point, which was about the scale of their operations, a factor that has already been used countless times in copyright law

The scale at which they operate and their intention to profit is the basis for their infringement, how they’re doing it would be largely irrelevant in a copyright case, is my point

Sam Clemente@allthingstech.social · 2 months ago

@zbyte64 we’re saying the same thing

It’s a matter of scale, not process

Sam Clemente@allthingstech.social · 2 months ago

@zbyte64 you’re getting away from the original conversation

Sam Clemente@allthingstech.social · 2 months ago

@zbyte64 with everything you see you are scraping data from your environment whether you want to or not

How does a child learn what pain is? How does a teenager learn what heartbreak is? It’s certainly not because they made the decision to find that out themselves

Sam Clemente@allthingstech.social · 2 months ago

@Subverb that is, quite impressively, the opposite of what I said

Is a person infringing on copyright by producing content? No. It’s about intent and scale. Humans don’t just sit on this knowledge, they do something with it

There is nothing illegal about WHAT it’s doing, there is everything illegal about HOW and WHY

I very clearly stated that OpenAI’s intent and the scale at which they operate amount to blatant copyright infringement and that this is backed up by decades of precedent

Sam Clemente@allthingstech.social · 2 months ago

@Pika @flop_leash_973 These are largely my thoughts on the whole thing: the process of actually training the AI is no different from a human learning

The thing about that is that there’s likely enough precedent in copyright law to actually handle it. Most copyright law is all about intent and scale, and I think that’s likely where this will all go

Here the intent is to replace and the scale is astronomical, whereas an individual’s intent is to add and the scale is minimal