Proof that bots are manipulating content

HTTP_404_NotFound@lemmyonline.com · edited 2 years ago

𝒍𝒆𝒎𝒂𝒏𝒏@lemmy.one · 2 years ago

This is troubling.

At least we have the data, though. Hopefully these findings are useful for updating the Fediseer/Overseer so we can detect bots more easily.

HTTP_404_NotFound@lemmyonline.com · 2 years ago

I really wish a good data scientist or ML person would jump into this thread.

I can easily dig through data, and I can easily dig through code, but someone who could perform intelligent anomaly detection would be a godsend right now.

monobot@lemmy.ml · 2 years ago

There are data scientists around, and we are monitoring where this goes.

The biggest problem I currently see is how to share data effectively while preserving privacy. Can this be solved without sharing emails and IP addresses, or would that be necessary? Maybe securely hashing emails and IP addresses is enough, but that would hide some important data.
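To make the hashing idea concrete, here is a minimal sketch in Python. It assumes an instance admin holds a secret key that never leaves the server; the function names (`pseudonymize`, `email_domain`, `ip_prefix`, `shareable_record`) are hypothetical and not part of Lemmy. A keyed hash (HMAC) is used rather than a plain hash because the IPv4 address space is small enough to brute-force unkeyed digests. To partly address the "hides important data" concern, the sketch also keeps two coarse features, the email domain and the truncated network, which are themselves a privacy trade-off.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of a normalized identifier."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def email_domain(email: str) -> str:
    """Keep only the domain part, e.g. to spot throwaway mail providers."""
    return email.strip().lower().rpartition("@")[2]


def ip_prefix(ip: str) -> str:
    """Truncate an address to its /24 (IPv4) or /48 (IPv6) network."""
    addr = ipaddress.ip_address(ip.strip())
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def shareable_record(email: str, ip: str, secret_key: bytes) -> dict:
    """Build a record that could be shared without the raw email or IP."""
    return {
        "email_hash": pseudonymize(email, secret_key),
        "email_domain": email_domain(email),
        "ip_hash": pseudonymize(ip, secret_key),
        "ip_network": ip_prefix(ip),
    }
```

Two accounts that share an email or IP still produce matching hashes under the same key, so clustering remains possible, while anyone without the key cannot recompute the digests from a list of guesses.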

Should that be shared only with trusted users?

Can we create a dataset where humans identify bots and then share it with a larger community (like Kaggle) to help us with ideas?

There are options, and they will be built; it just cannot happen in a few days. People are working nonstop to fix (currently) more important issues.

Be patient, collect the data, and let’s work on a solution.

And let’s be nice to each other; we all have similar goals here.

HTTP_404_NotFound@lemmyonline.com · 2 years ago

Biggest problem I currently see is how to effectively share data but preserve privacy. Can this be solved without sharing emails and IP addresses or would that be necessary? Maybe securely hashing emails and IP addresses is enough, but that would hide some important data.

So- email addresses and instances are actually only known by the instance hosting the user. That data is not even included in the persons table. It’s stored in the local_user table, away from the data in question. As such, it wouldn’t be needed, nor included, in the dataset.

Regarding privacy- that actually isn’t a problem. On Lemmy, EVERYTHING is shared with all federated instances: votes, comments, posts, etc. As such, there isn’t anything I can share from my data that isn’t already known by many other individuals.

Can we create dataset where humans would identify bots and then share with larger community (like kaggle), to help us with ideas.

Absolutely. We can even completely automate the process of aggregating and displaying this data.

db0 also posted an idea in this thread and is working on a project to help humans vet instances. I think that might be a start too.

monobot@lemmy.ml · edited 2 years ago

That sounds great, and at least we can try something and learn what can or cannot be done. I am totally interested in working on bot detection.

I know that emails remain local. They could be an important part of pattern detection, but it has to be done without them.

Fediseer sounds great, at least for building some trust in instances.

I am thinking more about detection based on votes, comments and posts from individual accounts, in which Fediseer would be quite an important weight.
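As a rough illustration of that idea, the sketch below scores each account by how unusual its voting rate is and then scales the score by how little its home instance is trusted. Everything here is an assumption for illustration: the `votes_per_day` input, the `instance_trust` mapping with values in [0, 1] (a stand-in for whatever signal Fediseer might provide, not its actual API), and the choice of a median/MAD robust z-score as the anomaly measure.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all: nothing can be called an outlier.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def suspicion_scores(votes_per_day, instance_trust, default_trust=0.5):
    """Combine per-account vote anomaly with a per-instance trust weight.

    votes_per_day:  {"user@instance": average votes per day}
    instance_trust: {"instance": trust in [0, 1]} (hypothetical input)
    """
    accounts = list(votes_per_day)
    z_scores = robust_z_scores([votes_per_day[a] for a in accounts])
    scores = {}
    for account, z in zip(accounts, z_scores):
        instance = account.rpartition("@")[2]
        trust = instance_trust.get(instance, default_trust)
        # Only unusually HIGH activity counts; low trust amplifies it.
        scores[account] = max(z, 0.0) * (1.0 - trust)
    return scores


if __name__ == "__main__":
    votes = {"a@x.example": 10, "b@x.example": 12, "c@y.example": 11,
             "d@y.example": 500, "e@x.example": 9}
    trust = {"x.example": 0.9, "y.example": 0.1}
    ranked = sorted(suspicion_scores(votes, trust).items(),
                    key=lambda kv: kv[1], reverse=True)
    for account, score in ranked:
        print(f"{account}: {score:.2f}")
```

A real detector would need more features than vote rate (comment timing, account age, voting overlap between accounts), but the same pattern of a per-account anomaly multiplied by an instance-level weight would still apply.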

HTTP_404_NotFound@lemmyonline.com · 2 years ago

The best approach might be to work on a service intended to run locally alongside Lemmy.

That way, data privacy isn’t a huge concern, since the data never leaves the local server/network.