Stanford researchers find Mastodon has a massive child abuse material problem (www.theverge.com)

submitted 2 years ago by trashhalo@beehaw.org to c/technology@beehaw.org

124 comments

Mastodon, an alternative social network to Twitter, has a serious problem with child sexual abuse material (CSAM), according to researchers from Stanford University. In just two days, the researchers found over 100 instances of known CSAM across more than 325,000 posts on Mastodon. They also found hundreds of posts containing CSAM-related hashtags and links pointing to CSAM trading and the grooming of minors. One Mastodon server was even taken down for a period of time because of CSAM posted to it. The researchers suggest that decentralized networks like Mastodon need more robust moderation tools and reporting mechanisms to address the prevalence of CSAM.

pineapplelover@infosec.pub 12 points 2 years ago

One way to do this is to block hashes. This is a slippery slope, though, because it could be used maliciously. The only way to do this and still protect freedom of information is to make it fully open source.

scrubbles@poptalk.scrubbles.tech 8 points 2 years ago

Block hash lists, then? Something like a community-driven hash list for CSAM would work: if the majority of federated instances report an item as that type, its hash gets added to the list. Instances could then choose which lists they want to block.

...instances could also show which lists they subscribe to, so their users could see what sort of moderation they choose.
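The majority-vote list and per-instance subscriptions described above could look roughly like this minimal Python sketch. Everything in it is hypothetical: the class names `CommunityHashList` and `Instance`, the strict-majority rule, and treating hashes as opaque strings are assumptions for illustration, not an existing Mastodon or Lemmy API.

```python
from dataclasses import dataclass, field


@dataclass
class CommunityHashList:
    """Hypothetical shared block list: a hash is listed once a strict
    majority of the participating instances have reported it."""
    name: str
    participating_instances: set[str]
    reports: dict[str, set[str]] = field(default_factory=dict)
    listed: set[str] = field(default_factory=set)

    def report(self, instance: str, content_hash: str) -> None:
        if instance not in self.participating_instances:
            raise ValueError(f"{instance} does not participate in {self.name}")
        # A set, so repeat reports from the same instance count only once.
        reporters = self.reports.setdefault(content_hash, set())
        reporters.add(instance)
        if len(reporters) * 2 > len(self.participating_instances):
            self.listed.add(content_hash)


@dataclass
class Instance:
    """Hypothetical instance that opts into whichever lists it trusts."""
    domain: str
    subscriptions: list[CommunityHashList] = field(default_factory=list)

    def is_blocked(self, content_hash: str) -> bool:
        return any(content_hash in lst.listed for lst in self.subscriptions)

    def public_moderation_policy(self) -> list[str]:
        # Published so users can see which lists the instance follows.
        return [lst.name for lst in self.subscriptions]


shared = CommunityHashList("example-list", {"a.example", "b.example", "c.example"})
shared.report("a.example", "hash-1")
shared.report("b.example", "hash-1")  # 2 of 3 instances: now listed
home = Instance("home.example", [shared])
print(home.is_blocked("hash-1"))  # True
print(home.public_moderation_policy())  # ['example-list']
```

Requiring a strict majority keeps any single instance from adding entries on its own, though it assumes the set of participating instances is known and trustworthy.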

glorbo@lemmy.one 7 points 2 years ago

So the standard approach to this is so-called "perceptual hashing." Cryptographic hashes (SHA-256, etc.) don't really work well in this case. Given a piece of illegal content, that content is likely to be just as illegal with a single pixel changed; however, it will have a completely different cryptographic hash. So instead, you use a hash function that measures how "similar-looking" two images are, ignoring things like dimensions, color palette, JPEG compression artifacts, etc. This is obviously way fuzzier and is prone to both false positives and false negatives.
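To make the contrast concrete, here is a minimal Python sketch comparing SHA-256 with a very simple perceptual hash, the "average hash" (aHash), on a synthetic 8x8 grayscale image. aHash is swapped in only because it is short; production systems use more robust algorithms such as Microsoft's PhotoDNA or Meta's PDQ. The sketch assumes the image has already been shrunk to 8x8 grayscale, and the match threshold of 5 differing bits is an arbitrary assumption.

```python
import hashlib


def average_hash(pixels: list[list[int]]) -> int:
    """aHash of an 8x8 grayscale image given as rows of 0-255 ints.

    Each bit is 1 if that pixel is brighter than the image's mean.
    Real implementations first resize and grayscale the image; this
    sketch assumes that step is already done.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_match(h1: int, h2: int, max_distance: int = 5) -> bool:
    # Assumed threshold: raising it catches more edited copies
    # (fewer false negatives) but flags more unrelated images
    # (more false positives).
    return hamming(h1, h2) <= max_distance


def sha256_of(pixels: list[list[int]]) -> str:
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Synthetic gradient image, then a copy with a single pixel changed.
original = [[(x * 32 + y * 8) % 256 for x in range(8)] for y in range(8)]
tweaked = [row[:] for row in original]
tweaked[3][4] += 1

print(sha256_of(original) == sha256_of(tweaked))  # False
print(hamming(average_hash(original), average_hash(tweaked)))  # 0
print(is_match(average_hash(original), average_hash(tweaked)))  # True
```

The cryptographic digest changes completely, while the perceptual hash does not move at all, because the tweaked pixel stays on the same side of the image's mean brightness.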

Because all this is inherently kinda fuzzy, the exact database of hashes is usually "secret sauce" if you will. If it were public, it would be super easy to circumvent. As an example, given an illegal image:

1. Is the image's hash in the DB?
2. No? All done, you can post it with impunity.
3. Yes? Change one random pixel, GOTO 1.

As a result, even "public" databases are distributed under NDAs, etc. This obviously does not jibe well with an open-source, federated network like Mastodon, and I have my doubts about how willing the relevant agencies would be to give their databases to every rando with $5 to spin up a Pleroma instance on a VPS. A public DB might help in some cases, but unfortunately more illegal content is produced every day, so it would be extremely hard to keep up with bad actors.

BarbecueCowboy@kbin.social 6 points 2 years ago

This is kind of problematic... By creating a community-driven hash list that is freely shared, you've also kind of created an index of CSAM that could easily be exploited by people actively looking to find or share that content.

IronKrill@lemmy.ca 4 points 2 years ago

Surely a list of hashes wouldn't be that useful?

sociablefish@lemm.ee 3 points 2 years ago (last edited 2 years ago)

Only if they're cryptographic hashes (the kind of hash functions behind BTC, LTC, and other cryptocurrencies), since those are irreversible.*

*I won't explain; look it up on the internet in your pocket.

BarbecueCowboy@kbin.social 2 points 2 years ago (last edited 2 years ago)

Super useful. It's very similar to how magnet links for torrenting work. I know of a few less popular file-sharing services that can identify and search for files based on their hash alone.

A lot of other places online already use hashes as identifiers, too. If you search for the hash of a file you've downloaded, just the hash and nothing else, there's a very good chance you'll get multiple results.

Emperor@feddit.uk 3 points 2 years ago

Doesn't anyone looking for that material already know what to look for?

IronKrill@lemmy.ca 2 points 2 years ago

Image hashes? That could work. It could be a simple system like uBlock, where you import filter lists into your instance and can easily disable them if their caretakers fill them with garbage data.
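A uBlock-style setup for image hashes might be sketched as below. This is a hypothetical Python illustration, not an existing fediverse feature: the list format (one hex-encoded 64-bit perceptual hash per line, `#` for comments), the class names, and the 5-bit match threshold are all assumptions. Matching uses Hamming distance rather than exact set lookup, so near-duplicate images still hit the list.

```python
from dataclasses import dataclass, field


@dataclass
class FilterList:
    """One imported list of 64-bit perceptual hashes (hypothetical format)."""
    name: str
    hashes: set[int]
    enabled: bool = True

    @classmethod
    def parse(cls, name: str, text: str) -> "FilterList":
        # Assumed format: one hex hash per line; '#' starts a comment.
        hashes = set()
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()
            if line:
                hashes.add(int(line, 16))
        return cls(name, hashes)


@dataclass
class InstanceFilters:
    lists: dict[str, FilterList] = field(default_factory=dict)
    max_distance: int = 5  # assumed Hamming-distance threshold

    def import_list(self, filter_list: FilterList) -> None:
        self.lists[filter_list.name] = filter_list

    def disable(self, name: str) -> None:
        # e.g. when a list's caretakers start filling it with garbage
        self.lists[name].enabled = False

    def matching_lists(self, image_hash: int) -> list[str]:
        """Names of enabled lists containing a hash close to image_hash."""
        return [
            fl.name
            for fl in self.lists.values()
            if fl.enabled
            and any(bin(image_hash ^ h).count("1") <= self.max_distance
                    for h in fl.hashes)
        ]


filters = InstanceFilters()
filters.import_list(FilterList.parse("example-list", "ff00ff00ff00ff00  # entry\n"))
print(filters.matching_lists(0xFF00FF00FF00FF01))  # ['example-list']
filters.disable("example-list")
print(filters.matching_lists(0xFF00FF00FF00FF01))  # []
```

Keeping each list as a separately toggleable object is what makes the "easy to disable" part cheap: dropping a bad list is one flag flip, with no need to untangle its entries from the other lists.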

this post was submitted on 24 Jul 2023

232 points (100.0% liked)

Technology

42437 readers

322 users here now

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 4 years ago

MODERATORS

alyaza@beehaw.org

TheRtRevKaiser@beehaw.org

gyrfalcon@beehaw.org

rs5th@beehaw.org

coldredlight@beehaw.org

SemioticStandard@beehaw.org

TheRtRevKaiser@kbin.social

remington@beehaw.org