Could the Reddit API changes have to do with ChatGPT rather than third-party apps?

gotofritz@beehaw.org · 1 year ago

iMeddles@fedia.io · 1 year ago

Charging for their API is a reasonable answer to the LLM data scrapers. The amount they're charging and the speed of the changes are not reasonable, however, IMO.

JohnDClay@sh.itjust.works · 1 year ago

The original announcement said they were making exceptions for applications that gave back to Reddit. Many others and I hoped that meant basically everyone who wasn't AI scraping. But it seems like they got greedy while they were at it and decided to kill everything.

whofearsthenight@beehaw.org · 1 year ago

Could they have something to do with it? Yes, for sure. But the thing is that they didn't have to do any of this the way they did. They could have made an API plan that allowed third-party apps to still exist and thrive, and also charged big companies that just want to use Reddit to train LLMs. Change the pricing and terms based around this idea. They deliberately went after third-party apps, and then doubled and tripled down on it in the face of massive backlash. If spez were competent, he would have been able to pivot this conversation and make it about training LLMs for megacorps, but he didn't, and even then it would still have been bullshit that is easily seen past.

spoonful@beehaw.org · 1 year ago

Reddit data is public and can be easily web scraped. Reddit doesn’t own it. Spez is just throwing random memes in to distract people.

gotofritz@beehaw.org · 1 year ago

I am sorry, but you don't know what you are talking about. These things are regulated by legal documents; you don't just wake up one morning and say "trust me bro, their data is public".

If you go and read their T&Cs, they explicitly state that scraping is forbidden without prior written consent. They only allow access to their data via APIs, which of course they charge for.

The fact that it can be easily scraped is neither here nor there; if they catch you, they can sue you.

spoonful@beehaw.org · 1 year ago

Nah, Terms of Service are not enforceable through a browsewrap agreement in the US and most of the EU. You can't implicitly agree to a legal document just by looking at something.

Check out the hiQ v. LinkedIn case, which went to the 9th Circuit and set the precedent for this. LinkedIn lost.

deegeese@sopuli.xyz · 1 year ago

99% of LLMs are trained on pirated content and will continue to regurgitate it until there is enough money at stake for a big lawsuit.

gotofritz@beehaw.org · 1 year ago

Getty is already suing Stability AI (the Stable Diffusion creators), and someone is suing MS over Copilot, so it's already started.

deegeese@sopuli.xyz · 1 year ago

Again, big-money users will get sued; everyone else will scrape with impunity.

gotofritz@beehaw.org · 1 year ago

Sure, but I'm not sure why you are bringing this up. What's the wider point you are trying to make?

spoonful@beehaw.org · 1 year ago

I'm still perplexed that some people are siding with evil-ass Getty in that case. At least the Copilot case has some merit, but I don't see how Microsoft could lose, as that would set a precedent for all of AI in the US, and there's no way the US is letting that disadvantage happen. These are meme-level lawsuits.

gotofritz@beehaw.org · 1 year ago

Just speculation, but I think it's because people think Getty can hire top-class lawyers and therefore has a better chance of winning compared to, say, the group of artists who are also taking Stability AI to court.

Fubarberry@aiparadise.moe · 1 year ago

Unless I'm mistaken and something is different, this hasn't been a problem for tools like NewPipe, YouTube Vanced, and Fritter.

damn@lemmy.fmhy.ml · 1 year ago

Why not both? I think they see this as an opportunity to kill two birds with one stone.

j4k3@lemmy.world · 1 year ago

The value of LLMs has shifted drastically in favor of open source since the Meta LLaMA weights leak. The proprietary model looks pretty much wrecked now, at least as far as I understand the leaked internal memo from a Google researcher last month.

https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

MarPan@lemmy.world · 1 year ago

This is a fascinating read, thank you very much for sharing.

gotofritz@beehaw.org · 1 year ago

Oh, I'm not saying they are doing the right thing or that it was the correct decision. I'm just speculating whether LLMs are what kicked off the whole thing.

j4k3@lemmy.world · 1 year ago

I'm saying the premise that LLMs have anything to do with it is either an incompetent failure to keep up with LLM developments, or a pack of lies.

gotofritz@beehaw.org · 1 year ago

I disagree; it's still too early, and a bit presumptuous, to make such conclusive statements.

rubythulhu@beehaw.org · 1 year ago

Yup. AI consumers are more profitable than 3rd-party apps. Why bother with tiered pricing when you can name a single price point that everyone has to pay and that only huge AI companies are willing to pay?

Reddit gets their content for free. Reselling it at a high price to AI/ML consumers is an easy way to turn free content into profit with almost no effort.

Senseibull@lemmy.ml · 1 year ago

It is, but Reddit doesn't own the content on their site according to their TOS; posters merely grant them a license to redistribute it. So it's not really their call to shut off ChatGPT scraping; it should be a community decision.

gotofritz@beehaw.org · 1 year ago

"Merely": the TOS basically grants Reddit the ability to do whatever the hell they want with it, LOL.

When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit.

And furthermore:

You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

z2k_@lemmy.nz · 1 year ago

Yes, but IMO it would be easy to separate LLMs and 3rd-party apps, since 3rd-party apps have users sign in independently. They chose to also target 3rd-party apps and take them down.

CookieJarObserver@feddit.de · 1 year ago

Training data gets gathered with scrapers.

gotofritz@beehaw.org · 1 year ago

If the owners of the data agree, or, if they disagree, until they take you to court. Getty Images is taking Stability AI to court, and a group of programmers is taking MS to court over Copilot.

CookieJarObserver@feddit.de · 1 year ago

No, the law says that if content is not supposed to be used as training data, that reservation has to be stated in machine-readable form. And for scientific purposes it's basically irrelevant. You can take whoever you want to court; that doesn't change anything.

Wintermute@lemmy.villa-straylight.social · 1 year ago

What “law” says that? That’s not how copyright works at all. If you don’t have an explicit license to use content you don’t own, you can’t legally use it.

CookieJarObserver@feddit.de · 1 year ago

https://www.gesetze-im-internet.de/urhg/__44b.html

It's German law, and that's where many of the data mining companies are located.

Wintermute@lemmy.villa-straylight.social · 1 year ago

Is there an English translation available? That’s a hell of a departure from international copyright agreements that I wasn’t aware of if it’s true.

CookieJarObserver@feddit.de · 1 year ago

Act on Copyright and Related Rights (Copyright Act), § 44b Text and Data Mining
(1) Text and data mining is the automated analysis of single or multiple digital or digitized works in order to extract information from them, in particular about patterns, trends and correlations.
(2) Reproductions of legally accessible works for text and data mining are permitted. The reproductions shall be deleted when they are no longer required for text and data mining.
(3) Uses according to paragraph 2 sentence 1 are only permitted if the right holder has not reserved them. A reservation of use in the case of works accessible online shall only be effective if it is made in machine-readable form.

There is no official English translation, but DeepL does a good job to my knowledge. If you have further questions, just ask. German law is very complicated and very dependent on interpretation; it's sometimes barely understandable even for our lawyers…

gotofritz@beehaw.org · 1 year ago

Interesting. Do you have a link to the specifics of the law you are talking about?

Hyperz@beehaw.org · 1 year ago

And lots of proxies.

CookieJarObserver@feddit.de · 1 year ago

Yeah that as well.

UntouchedWagons@lemmy.ca · 1 year ago

At least seven proxies.

Kris@lemmy.world · 1 year ago

Yes, but nothing's stopping scraping of Reddit content from the front end.

gotofritz@beehaw.org · 1 year ago

Technically not (well, they can make it harder), but they can sue people for doing it.

jpv@beehaw.org · 1 year ago

Sure, but they could do the same thing with an API: make scraping for LLMs against the TOS, but not personal use. I really do think (as the OP says) it's two birds with one stone.

SkyNTP@lemmy.ml · 1 year ago

Reddit's business model was not founded on selling LLM data. Reddit got greedy and decided to change their business model to cash in on an unexpected revenue stream. What was also unexpected (to Reddit) is that you cannot effectively cater to Reddit-style social media communities and monetize their data for LLM training at the same time. And now Reddit will have neither, and will die just like all other businesses that adopt enshittification as a core operating procedure.

Let this be a lesson to them and all that follow: do not let your greed make you blind to the consequences of your actions.

gotofritz@beehaw.org · 1 year ago

Does it matter what Reddit’s business model was founded on? Businesses respond to changing conditions all the time and pivot.

"They got greedy" seems a really naive way of looking at it. They are a business; that's what businesses are all about. Additionally, they are a business which is NOT profitable, and they need to change things to survive now that the era of low interest rates has come to an end. The real issue is that they are so inept, IMHO.

I find the word "enshittification" so cringe.

EvilColeslaw@beehaw.org · 1 year ago

I think this is the main reason for the insane prices, but it could have easily been avoided. They don’t need to have one price class for every type of use of their Data API. They could have easily had one rate for LLM and other AI training uses and another for third party client applications. I feel like at some point they realized they’d rather just kill the third parties while they’re at it and this seemed like the logical moment.

gotofritz@beehaw.org · 1 year ago

Yeah, one of the other answers in the AMA was "we are not profitable yet, unlike the 3rd-party app devs…", which is something that wouldn't sit well with any investor I know.

IggyTheSmidge@lemmy.blahaj.zone · 1 year ago

I think that was definitely the impetus - I first read about the changes in this article back in April: https://www.theregister.com/2023/04/18/reddit_charging_ai_api/

The closing statement is interesting:

The spokesperson we talked to also wanted to make clear the Data API was still freely accessible for appropriate use cases through the Reddit developer platform; hopefully app developers and other small-scale operators won’t have any surprises ahead this summer.

I suspect they ran the numbers and started seeing dollar signs - they don’t care about the third-party apps (which don’t make them any money directly), they’re just trying to cash in on Microsoft etc.

I have a sneaking suspicion they’re going to end up back-pedalling, but it will be too little, too late.

Schelleberg@feddit.de · 1 year ago

I’m very sure that this is the case. Reddit is pissed they gave away all the content as training data for free while struggling to monetize their platform adequately.

But I suspect the damage is already done. There are projects like "Orca" from Microsoft that largely skip learning from the original source data by learning from ChatGPT and GPT-4 instead.

They missed the timing, but they are too stubborn and keep doubling down on it.

nob0dy@beehaw.org · 1 year ago

They could have created better licensing models. It does rely on people honoring the agreements, but apart from countries that disregard IP, I think it's a viable model. Their business is social media, not curating datasets.

gotofritz@beehaw.org · 1 year ago

They could have, probably / maybe, but they are quite inept. What is social media if not a giant dataset?!?