this post was submitted on 29 Oct 2024
98 points (100.0% liked)
technology
as much as the speech-to-text gets wrong on my phone, I can only imagine what it does with doctors' notes.
one of my million previous jobs was in medical transcription, and it is so easy to misunderstand things even when you have a good grasp of specialty-specific terminology and basic anatomy.
they enunciate the shit they're recording about your case about as clearly as they write. you really have to get a feel for a doctor's speaking style and common phrases to not turn in a bunch of errors.
Edit: oh yeah, ✨ innovation ✨
Edit 2: it gets better and better
Edit 3: wonder if the Organ Procurement Organizations are going to try to blame this for the extremely fucked up shit that's been happening
I've been using Whisper with TankieTube and I'm curious whether these errors were made with the Large-v2 or the Large-v3 model. I suspect it was the latter, because its training dataset includes output from Large-v2.
Snake eating its own tail, etc.
In your experience, has Whisper Large-v3 been much worse than v2?
I haven't done any comparing; I just went with the apparent consensus, which is that v2 was more accurate and hallucinated less.
How can a transcription tool be so bad? YouTube doesn't get things this wrong.
Probably audio quality. I can't imagine the acoustics in a hospital room or the hallway outside are anything close to most YouTube videos being recorded with a professional mic
sometimes they go into a tiny little office so they can concentrate better, and it's so much easier to hear those docs
Who did they train it on, Trump, Biden, or any other of the geriatric ghouls in DC?