Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

[-] dustbunnies@hexbear.net 27 points 3 days ago* (last edited 3 days ago)

as much as the speech-to-text gets wrong on my phone, I can only imagine what it does with doctors' notes.

one of my million previous jobs was in medical transcription, and it is so easy to misunderstand things even when you have a good grasp of specialty-specific terminology and basic anatomy.

they enunciate the shit they're recording about your case about as well as they legibly write. you really have to get a feel for a doctor's speaking style and common phrases to not turn in a bunch of errors.

But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

internet-delenda-est

Edit: oh yeah, ✨ innovation ✨

While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.

Edit 2: it gets better and better

In an example they uncovered, a speaker said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.”

But the transcription software added: “He took a big piece of a cross, a teeny, small piece ... I’m sure he didn’t have a terror knife so he killed a number of people.”

A speaker in another recording described “two other girls and one lady.” Whisper invented extra commentary on race, adding “two other girls and one lady, um, which were Black.”

In a third transcription, Whisper invented a non-existent medication called “hyperactivated antibiotics.”

Edit 3: wonder if the Organ Procurement Organizations are going to try to blame this for the extremely fucked up shit that's been happening

[-] TankieTanuki@hexbear.net 20 points 3 days ago* (last edited 3 days ago)

I've been using Whisper with TankieTube and I'm curious whether these errors were made with the Large-v2 or the Large-v3 model. I suspect it was the latter, because its dataset includes output from the other.

The Whisper large-v3 model was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2.

Snake eating its own tail, etc.

In your experience, has Whisper large-v3 been much worse than v2?

[-] TankieTanuki@hexbear.net 6 points 3 days ago* (last edited 3 days ago)

I haven't done any comparing; I just went with the apparent consensus, which is that v2 was more accurate and hallucinated less.

[-] blobjim@hexbear.net 11 points 3 days ago

How can a transcription tool be so bad? YouTube doesn't get things this wrong.

[-] Sulvor@hexbear.net 9 points 3 days ago* (last edited 3 days ago)

Probably audio quality. I can't imagine the acoustics in a hospital room or the hallway outside are anything close to most YouTube videos being recorded with a professional mic

[-] dustbunnies@hexbear.net 3 points 2 days ago

sometimes they go into a tiny little office so they can concentrate better, and it's so much easier to hear those docs

[-] SadArtemis@hexbear.net 10 points 3 days ago

“He took a big piece of a cross, a teeny, small piece ... I’m sure he didn’t have a terror knife so he killed a number of people.”

“two other girls and one lady, um, which were Black.”

Who did they train it on, Trump, Biden, or any other of the geriatric ghouls in DC?

this post was submitted on 29 Oct 2024
98 points (100.0% liked)

technology
