
Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

TankieTanuki@hexbear.net 20 points 3 days ago (last edited 3 days ago)

I've been using Whisper with TankieTube, and I'm curious whether these errors were made with the large-v2 or the large-v3 model. I suspect the latter, because v3's training data includes output from v2.

The Whisper large-v3 model was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2.

Snake eating its own tail, etc.

In your experience, has Whisper large-v3 been much worse than v2?

TankieTanuki@hexbear.net 6 points 3 days ago (last edited 2 days ago)

I haven't done any comparing; I just went with the apparent consensus, which is that v2 was more accurate and hallucinated less.
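For anyone who does want to compare the two models on their own audio, the usual metric is word error rate (WER): the word-level edit distance between a model's transcript and a hand-checked reference, divided by the number of reference words. Invented text shows up as insertions and substitutions, so a model that hallucinates more should score worse. Below is a minimal sketch under loose assumptions: it only lowercases and splits on whitespace (no punctuation or number normalization, which real evaluations do), and the example strings are made up for illustration, not real Whisper output. The `jiwer` package's `jiwer.wer(reference, hypothesis)` is an established alternative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Simplified: lowercases and splits on whitespace only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the ref words seen so far
    # (excluding the current one) and the first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref word missing from hyp
                curr[j - 1] + 1,     # insertion: extra (possibly invented) hyp word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts of the same clip from two models.
reference = "the patient was given ibuprofen"
v2_output = "the patient was given ibuprofen"
v3_output = "the patient was given hyperactivated antibiotics"

print(f"v2 WER: {word_error_rate(reference, v2_output):.2f}")  # v2 WER: 0.00
print(f"v3 WER: {word_error_rate(reference, v3_output):.2f}")  # v3 WER: 0.40
```

One substitution plus one inserted word over a five-word reference gives 2/5 = 0.40. A single clip proves little; averaging over a decent sample of your own content is what would actually settle the v2 vs. v3 question.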


this post was submitted on 29 Oct 2024
98 points (100.0% liked)

technology

