Dropbox removed ability to opt your files out of AI training (news.ycombinator.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

101 comments

[-] reksas@sopuli.xyz 18 points 2 years ago

Time for Dropbox users to upload all kinds of crap for AI to "learn" from, all within the ToS of course.

I bet there are many ways to make your files poison the AI's training data. It's going to be fun for those AI guys to sort out which files are probably safe and which are not. I think even if ONE user manages to slip in something that corrupts the training data and it's not noticed soon enough, it might cause problems for them. Though someone who actually knows something about the subject might want to tell me if I'm talking shit or not.

I'm not against AI in general, but if it's trained on data that was obtained from unwilling people, like this, then its makers can fuck off.

[-] JonEFive@midwest.social 3 points 2 years ago

It really depends on what the AI training is looking for. You can potentially poison a model's training data, but you'll likely have to add enough data to be statistically relevant.

[-] reksas@sopuli.xyz 1 points 2 years ago

Enough data as in many different people each have to upload one or two files that contain such data, or as in you have to upload one very large file that contains a lot of data that causes problems?

[-] JonEFive@midwest.social 2 points 2 years ago

It's honestly difficult for me to say because there are so many different ways to train AI. It really depends more on what the trainers configure to be a data point. The number of files versus the size of a single file isn't as important as what the AI treats as a data point and how the data points are weighted.

Just as a simple example, a data point may be considered a row on a spreadsheet without regard for how that data was split up across files. So ten files with 5 rows each might have the same weight as one file with 50 rows. But there's also a penalty concept in some models, so the trainer can set it so that data that all comes from one file may be penalized. Or the opposite could be true if data coming from the same file is deemed to be more important in some way.
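The spreadsheet example above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular trainer works: the function name `row_weights`, the `penalize_same_file` flag, and the "each file's rows share a total weight of 1" penalty rule are all assumptions made up for the example.

```python
from collections import Counter

def row_weights(rows, penalize_same_file=False):
    """Return one training weight per (file_name, row) pair.

    Hypothetical scheme: without the penalty every row counts as 1.0,
    no matter which file it came from. With the penalty, the rows of
    a file split a total weight of 1.0 between them.
    """
    if not penalize_same_file:
        return [1.0] * len(rows)
    rows_per_file = Counter(file_name for file_name, _ in rows)
    return [1.0 / rows_per_file[file_name] for file_name, _ in rows]

# Ten files with 5 rows each, versus one file with 50 rows.
ten_small_files = [(f"file_{i}.csv", f"row {j}") for i in range(10) for j in range(5)]
one_big_file = [("big.csv", f"row {j}") for j in range(50)]

for name, rows in [("ten files x 5 rows", ten_small_files),
                   ("one file x 50 rows", one_big_file)]:
    plain = sum(row_weights(rows))
    penalized = sum(row_weights(rows, penalize_same_file=True))
    print(f"{name}: unweighted total = {plain:.1f}, penalized total = {penalized:.1f}")
```

Under these assumptions both uploads carry the same total weight (50) when rows are counted equally, while with the penalty the ten small files total 10 and the single big file totals only 1.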

In terms of how AIs make their decisions, that can also vary. But generally speaking, if 1000 pieces of data are used that are all similar in some way and one of them is somewhat different from the others, it is less likely that the one-off data will be used. It's much more likely to have an effect if 100 of the 1000 pieces of data have that same information. There's always the possibility of that 1/1000 data being used; it's just less likely to have a noticeable effect.
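To put rough numbers on the 1-in-1000 versus 100-in-1000 comparison, here is a deliberately simplified sketch. It assumes a toy "model" that just counts how often each answer appears in its data; real training algorithms are far more complex, and the names `answer_shares` and `make_dataset` are hypothetical.

```python
from collections import Counter

def answer_shares(data_points):
    """Fraction of data points that support each answer (toy model)."""
    counts = Counter(data_points)
    total = len(data_points)
    return {answer: count / total for answer, count in counts.items()}

def make_dataset(n_total, n_odd):
    """Build n_total data points, n_odd of which carry the odd information."""
    return ["usual"] * (n_total - n_odd) + ["odd"] * n_odd

for n_odd in (1, 100):
    shares = answer_shares(make_dataset(1000, n_odd))
    print(f"{n_odd} odd of 1000 -> share of 'odd' = {shares['odd']:.3f}")
```

In this toy setup a single odd data point makes up 0.1% of the evidence, while 100 of them make up 10%, which is the kind of gap that decides whether the odd data has any noticeable effect.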

AIs build confidence in responses based on how much a concept is reinforced, so you'd have to know something about the training algorithm to be able to intentionally impact the results.

[-] reksas@sopuli.xyz 0 points 2 years ago

Thank you, this was the kind of information I was hoping for.

this post was submitted on 19 Dec 2023

667 points (97.6% liked)
