Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said (apnews.com)

submitted 6 months ago by Yuritopiaposadism@hexbear.net to c/technology@hexbear.net

19 comments fedilink hide all child comments

Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

all 23 comments

sorted by: hot top controversial new old

[-] dustbunnies@hexbear.net 28 points 6 months ago* (last edited 6 months ago)

as much as the speech-to-text gets wrong on my phone, I can only imagine what it does with doctors' notes.

one of my million previous jobs was in medical transcription, and it is so easy to misunderstand things even when you have a good grasp of specialty-specific terminology and basic anatomy.

they enunciate the shit they're recording about your case about as well as they legibly write. you really have to get a feel for a doctor's speaking style and common phrases to not turn in a bunch of errors.

But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

internet-delenda-est

Edit: oh yeah, ✨ innovation ✨

While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.

Edit 2: it gets better and better

In an example they uncovered, a speaker said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.”

But the transcription software added: “He took a big piece of a cross, a teeny, small piece ... I’m sure he didn’t have a terror knife so he killed a number of people.”

A speaker in another recording described “two other girls and one lady.” Whisper invented extra commentary on race, adding “two other girls and one lady, um, which were Black.”

In a third transcription, Whisper invented a non-existent medication called “hyperactivated antibiotics.”

Edit 3: wonder if the Organ Procurement Organizations are going to try to use this to blame for the extremely fucked up shit that's been happening

[-] TankieTanuki@hexbear.net 21 points 6 months ago* (last edited 6 months ago)

I've been using Whisper with TankieTube and I'm curious whether these errors were made with the Large-v2 or the Large-v3 model. I suspect it was the latter, because its dataset includes output from the other.

The Whisper large-v3 model was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2.

Snake eating its own tail, etc.

[-] gay_king_prince_charles@hexbear.net 7 points 6 months ago

In your experience, has whisper large c3 been much worse than vo2?

[-] TankieTanuki@hexbear.net 6 points 6 months ago* (last edited 6 months ago)

I haven't done any comparing; I just went with the apparent consensus, which is that v2 was more accurate and hallucinated less.

[-] gay_king_prince_charles@hexbear.net 3 points 6 months ago

In your experience, has whisper large c3 been much worse than vo2?

[-] blobjim@hexbear.net 11 points 6 months ago

How can a transcription tool be so bad? YouTube doesn't get things this wrong.

[-] SadArtemis@hexbear.net 11 points 6 months ago

“He took a big piece of a cross, a teeny, small piece ... I’m sure he didn’t have a terror knife so he killed a number of people.”

“two other girls and one lady, um, which were Black.”

Who did they train it on, Trump, Biden, or any other of the geriatric ghouls in DC?

[-] SacredExcrement@hexbear.net 24 points 6 months ago

Seems a bit stupid to use a transcription aid that can literally invent things, but when has something completely failing to do what it is supposed to do stopped capitalists from saving a buck

[-] plinky@hexbear.net 20 points 6 months ago

seems suboptimal, but i'm excited about the future of ai in the medical industry, specifically rich people care ancap-good

[-] stigsbandit34z@hexbear.net 17 points 6 months ago

a-guy

[-] WhatDoYouMeanPodcast@hexbear.net 17 points 6 months ago

I managed to discover that with unassisted casual use really quickly. People are asleep at the wheel if they tried to give important duties to an AI. You don't let a dog drive your car and hope for the best

[-] BeamBrain@hexbear.net 16 points 6 months ago

surprised-pika

[-] vegeta1@hexbear.net 12 points 6 months ago* (last edited 6 months ago)

This is a fucking stupid tool. Invent things on the fly in the medical field? Terminate this shit with extreme prejudice

[-] blame@hexbear.net 9 points 6 months ago

i guess we're not doing hippa anymore huh

[-] UmbraVivi@hexbear.net 8 points 6 months ago* (last edited 6 months ago)

At least crypto wasn't this annoying. You could just point and laugh from the outside. AI is being shoved into everything and makes anything it touches significantly worse.

[-] fubarx@lemmy.ml 6 points 6 months ago

That explains why the proctologist kept insisting I needed breast augmentation surgery.

[-] BabaIsPissed@hexbear.net 6 points 6 months ago

This is fucked, you don't use a black box approach in anything high risk without human supervision. Whisper probably could be used to help accelerate a transcriptions done by an expert, maybe some sort of "first pass" that needs to be validated, but even then it might not help speed things up and might impact quality (see coding with copilot). Maybe also use the timestamp information for some filtering of the most egregious hallucinations, or a bespoke fine-tuning setup (assuming it was fine-tuned it the first place)? Just spitballing here, I should probably read the paper to see what the common error cases are.

It's funny, because this is the openAI model I had the least cynicism towards, did they bazinga it up when I wasn't looking?

this post was submitted on 29 Oct 2024

99 points (100.0% liked)

technology

23719 readers

77 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

1. Obviously abide by the sitewide code of conduct. Bigotry will be met with an immediate ban
2. This community is about technology. Offtopic is permitted as long as it is kept in the comment sections
3. Although this is not /c/libre, FOSS related posting is tolerated, and even welcome in the case of effort posts
4. We believe technology should be liberating. As such, avoid promoting proprietary and/or bourgeois technology
5. Explanatory posts to correct the potential mistakes a comrade made in a post of their own are allowed, as long as they remain respectful
6. No crypto (Bitcoin, NFT, etc.) speculation, unless it is purely informative and not too cringe
7. Absolutely no tech bro shit. If you have a good opinion of Silicon Valley billionaires please manifest yourself so we can ban you.

founded 4 years ago

MODERATORS

context@hexbear.net

EmmaGoldman@hexbear.net

SexUnderSocialism@hexbear.net

gaycomputeruser@hexbear.net

ZoomeristLeninist@hexbear.net