AI models fed AI-generated data quickly spew nonsense (www.nature.com)

submitted 2 years ago by ArcticDagger@feddit.dk to c/science@lemmy.world

51 comments

Theharpyeagle@lemmy.world 3 points 2 years ago* (last edited 2 years ago)

Part of the problem is that we have relatively little insight into or control over what the machine has actually "learned". Once it has learned itself into a dead end with bad data, you can't correct it, only work around it. Your only real shot at a better model is to start over.

When the first models were created, we had a whole internet of "pure" human-made training data, and developers could basically firehose all of that content into a model blindly. Additional tuning could be done by seeing which responses humans tended to accept or reject, and what language they used to refine their results. That tuning still works, and better heuristics (the criteria that grade the quality of AI output) can be developed, but with how much AI content is out there now, developers will never have a better training set than the one they started with. The whole of the internet now contains the result of every dead end AI has worked itself into, with no way to determine at scale what is AI-generated.
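For anyone who wants to see the feedback loop in miniature, here is a rough toy sketch. It is not the experiment from the linked article, and the function name `recursive_fit` and its parameters are made up for illustration. The "model" is just a Gaussian: each generation samples from the current fit and then refits on those samples alone, so no fresh human data ever comes back in.

```python
import random
import statistics


def recursive_fit(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Hypothetical toy model. Returns the fitted standard deviation at each
    generation, starting with the original distribution (mean 0, std 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the human-made data
    history = [sigma]
    for _ in range(generations):
        # "Generate content" from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...then "train" the next model on that content alone.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = recursive_fit()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: std = {history[gen]:.3g}")
```

With small samples the fitted spread tends to shrink generation after generation, which is a loose analogue of a model losing the variety that was in its original training data. A larger `sample_size` slows the effect but does not add back any information that was lost.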

this post was submitted on 26 Jul 2024

230 points (96.7% liked)
