80
submitted 2 days ago by RedWizard@hexbear.net to c/news@hexbear.net
[-] Mardoniush@hexbear.net 57 points 2 days ago

"We had been working with their AI tool for a while, and it was absolutely not at the point of being capable of writing lessons without humans."

Lol, LMAO this is gonna be a disaster once the bubble crashes and current models undergo recursive degradation.

[-] Antiwork@hexbear.net 39 points 2 days ago

I do think LLMs are going to start getting worse once more data is fed into them. And then, instead of admitting this, these companies will have so much capital that they will just tune the AI to say exactly what they want it to every time. We already see some of this, but it will get worse.

[-] Gucci_Minh@hexbear.net 15 points 2 days ago

Reject LLMs branded as AI, retvrn to 9999999 nested if statements.

[-] hellinkilla@hexbear.net 8 points 2 days ago

LLMs are going to start getting worse once more data is fed into them

I've been hearing this for a while, has it started happening yet?

And if it did, is there any reason why people couldn't switch back to an older version?

[-] semioticbreakdown@hexbear.net 17 points 2 days ago* (last edited 2 days ago)

I knew the answer was "Yes" but it took me a fuckin while to find the actual sources again

https://arxiv.org/pdf/2307.01850 https://www.nature.com/articles/s41586-024-07566-y

the term is "Model collapse" or "model autophagy disorder" and any generative model is susceptible to it

As to why it has not happened too much yet: curated datasets of human-generated content with minimal AI content.

If it does: you could switch to an older version, yes, but to train new models with any new information past a certain point you would need to update the dataset while (ideally) introducing as little AI content as possible, which I think is becoming intractable with the widespread deployment of generative models.

[-] semioticbreakdown@hexbear.net 8 points 2 days ago

The witting or unwitting use of synthetic data to train generative models departs from standard AI training practice in one important respect: repeating this process for generation after generation of models forms an autophagous (“self-consuming”) loop. As Figure 3 details, different autophagous loop variations arise depending on how existing real and synthetic data are combined into future training sets. Additional variations arise depending on how the synthetic data is generated. For instance, practitioners or algorithms will often introduce a sampling bias by manually “cherry picking” synthesized data to trade off perceptual quality (i.e., the images/texts “look/sound good”) vs. diversity (i.e., many different “types” of images/texts are generated). The informal concepts of quality and diversity are closely related to the statistical metrics of precision and recall, respectively [39]. If synthetic data, biased or not, is already in our training datasets today, then autophagous loops are all but inevitable in the future.
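For anyone who wants to poke at the idea, here is a toy sketch of a fully synthetic self-consuming loop with cherry-picking. It is not the setup from either paper: the "model" is just a Gaussian fitted to the current data, and the function name `autophagous_loop` and its parameters (`generations`, `n_samples`, `keep_fraction`, `seed`) are made up for illustration. Each generation fits the Gaussian, samples from it, keeps only the samples closest to the mean (a stand-in for picking the "good looking" outputs), and trains the next generation on those.

```python
import random
import statistics


def autophagous_loop(generations=10, n_samples=200, keep_fraction=0.5, seed=0):
    """Toy self-consuming loop; returns the data's std dev per generation.

    Assumption: a fitted Gaussian stands in for a generative model, and
    "cherry picking" means keeping the samples nearest the fitted mean.
    """
    rng = random.Random(seed)
    # Generation 0: "real" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    stdevs = [statistics.pstdev(data)]

    for _ in range(generations):
        # "Train" the model: estimate mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        # Generate synthetic data from the fitted model.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Cherry-pick: keep the most "typical" samples (closest to the mean).
        synthetic.sort(key=lambda x: abs(x - mu))
        keep = max(2, int(n_samples * keep_fraction))
        # The next generation trains only on the kept synthetic samples.
        data = synthetic[:keep]
        stdevs.append(statistics.pstdev(data))

    return stdevs


if __name__ == "__main__":
    for gen, sd in enumerate(autophagous_loop()):
        print(f"generation {gen}: std dev = {sd:.6f}")
```

Running it prints the spread of the data at each generation. With `keep_fraction=0.5` the spread shrinks quickly, which is the diversity loss the quote describes; setting `keep_fraction=1.0` removes the cherry-picking so you can compare. Real models and datasets are far messier than this, so treat it as an intuition pump only.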

[-] Mardoniush@hexbear.net 7 points 2 days ago

Sometimes, yeah, you can see it. Not only with updates but within a conversation: models degrade in effectiveness long before the context window limit is reached. Things like image generation tend to get worse after more than 2 edits, even if the image seed is given.

this post was submitted on 02 May 2025
80 points (98.8% liked)
