Remember that lemmy.world has to keep a copy of whatever content appears in a federated community on their servers, making them legally liable for the content. At least they just blocked the community instead of defederating.
The most amazing thing to me - I’ve been using LEDs for 10+ years, and I think I’ve had to replace one or two of them. It is a wonder that prices can come down with demand dwindling so much.
That’s my point. The AI isn’t an independent subject to be criticized, it is a cultural mirror.
The bias isn’t in the software, it is in the data. The stock photos of professional women that were fed in were white.
That doesn’t say anything about the AI, but rather the community that created those biases.
Why do people insist that there needs to be (for example) a /c/politics on every instance? Really, only 3 or 4 of them have any substantial traffic, and the good reasons to pick one over the others are the same good reasons for them to stay separate.
There is a cross-post feature, and the resulting post appears to be aware it was cross-posted. It would be nice if Lemmy would consolidate those into one post that appears in multiple communities, or at least show you only one of them.
AI content isn’t watermarked, or detection would be trivial. What he’s talking about is that certain words have a certain probability of appearing after certain other words in a certain context. While there is some randomness to the output, certain words or phrases are unlikely to appear because the data the model was based on didn’t use them.
All I’m saying is that the more a writer’s style and word choice resemble the data set, the more likely their original content is to be flagged as AI generated.
Here’s the thing though - the probabilities for word choice come from the data the model was trained on. Someone who uses a substantially different writing style or word choice than the LLM could easily be identified as not being LLM output, but someone with a similar writing style might be indistinguishable from the LLM.
Or, to oversimplify: given that Reddit was a large portion of the input data for ChatGPT, all you need to do is write like a Redditor to sound like ChatGPT.
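To make that concrete, here’s a rough sketch of how a probability-based detector works, using the public GPT-2 model from the Hugging Face transformers library. The model and the “low score = AI” reading are stand-ins; real detectors are fancier, but the signal is the same:

```python
# A minimal perplexity-style "AI detector" sketch. GPT-2 stands in for
# whatever model a real detector uses; nothing here matches any actual tool.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Score how 'expected' each token in the text is under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing the inputs as labels makes the model grade its own
        # next-token predictions (cross-entropy loss).
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Low perplexity = word choices the model expected = "looks AI-generated",
# regardless of whether a model or a Redditor actually wrote it.
print(perplexity("The quick brown fox jumps over the lazy dog."))
```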
If it could, it couldn’t claim that the content it produced was original. If AI-generated content were detectable, that would be a tacit admission that it is entirely plagiarized.
The base assumption of those making that argument is that an AI is incapable of being original, so it is “stealing” anything it is trained on. The problem with that logic is that’s exactly how humans work - everything we say or do is derivative of our experiences. We combine pieces of information from different sources, and connect them in a way that is original - at least from our perspective. And not surprisingly, that’s what we’ve programmed AI to do.
Yes, AI can produce copyright violations. They should be programmed not to. They should cite their sources when appropriate. AI needs to “learn” the same lessons we learned about not copy-pasting Wikipedia into a term paper.
Though ironically, a scale of Full - 3/4 - Half - 1/4 - Empty is perfectly fine for gas. There is usually a visual gauge of charge percentage, but it isn’t as prominent as the range. Oddly, my car has it divided roughly in thirds.
The problem is that other vehicles adjust the projection based on current conditions - when I drive up a mountain, my projected range drops like a rock. When I drive back down I can end up with more range than I started. Reporting the “ideal” case during operation is misleading at best.
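For the curious, a condition-aware projection is just remaining energy divided by recent consumption. A toy sketch, with made-up numbers:

```python
# Toy sketch of a condition-aware range projection; all values are made up.
def projected_range_km(battery_kwh: float, recent_kwh_per_km: float) -> float:
    """Project range from what the car actually consumed recently."""
    return battery_kwh / recent_kwh_per_km

# Climbing a mountain: consumption spikes, so the projection drops.
print(projected_range_km(60.0, 0.35))  # ~171 km
# Descending with regenerative braking: consumption falls, and the
# projection can end up higher than when you started.
print(projected_range_km(60.0, 0.10))  # 600 km
```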
Copyright 100% applies to the output of an AI, and it is subject to all the rules of fair use and attribution that entails.
That is very different than saying that you can’t feed legally acquired content into an AI.
No, you misunderstand. Yes, they can control how the content in the book is used - that’s what copyright is. But they can’t control what I do with the book - I can read it, I can burn it, I can memorize it, I can throw it up on my roof.
My argument is that there is nothing wrong with training an AI with a book - that’s input for the AI, and it is indistinguishable from a human reading it.
Now, what the AI does with the content - if it plagiarizes or violates fair use - that’s a problem, but those problems are already covered by copyright laws. They have no more business saying what can or cannot be fed into an AI than they have restricting what I can read (and learn from). They can absolutely enforce their copyright on the output of the AI, just like they can if I print copies of their book.
My objection is strictly on the input side, and the output is already restricted.
Again, my point is that the output is what can violate the law, not the input. And we already have laws that govern fair use, rebroadcast, etc.
My point is that the restrictions can’t go on the input; they have to go on the output - and we already have laws that govern such derivative works (or reuse / rebroadcast).
Then this is a copyright violation - it violates any standard for such, and the AI should be altered to account for that.
What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem (assuming it was legally acquired or publicly available). Only the output can be problematic.
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.
When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.
Folks, this isn’t a new problem, and it doesn’t need new laws.
The fediverse is the name for services that use ActivityPub, a communication protocol. What you are saying is like saying “tech companies, banks and regulators need to crack down on HTTP because there is CSAM on the web”.
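For anyone unfamiliar, an ActivityPub message is just a small JSON envelope passed between servers. A rough sketch, with a made-up actor and content:

```python
# A minimal ActivityPub "Create a Note" activity, sketched in Python.
# The field names come from the ActivityStreams vocabulary the protocol
# uses; the actor URL and message content are hypothetical.
import json

activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://example.social/users/alice",
    "object": {
        "type": "Note",
        "content": "Hello, fediverse!",
    },
}

# Servers exchange envelopes like this; the protocol itself is as
# content-neutral as HTTP.
print(json.dumps(activity, indent=2))
```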
Nature knows how to solve this problem.