Coming soon – offline speech recognition on your phone (news.ku.dk)

submitted 1 week ago by tardigrada@beehaw.org to c/technology@beehaw.org


More than one in four people currently integrate speech recognition into their daily lives. A new algorithm developed by a University of Copenhagen researcher and his international colleagues makes it possible to interact with digital assistants like “Siri” without any internet connection. The innovation allows for speech recognition to be used anywhere, even in situations where security is paramount.

[...]

Until now, speech recognition has relied upon a device being connected to the internet. This is because the algorithms typically used for this process require significant amounts of temporary random access memory (RAM) which is usually provided by powerful data center servers. Indeed, try switching your smartphone to airplane mode and see how far your voice commands get you. But change is in the air.

A new algorithm developed by Professor Panagiotis Karras from the University of Copenhagen’s Department of Computer Science, together with linguist Nassos Katsamanis of the Athena Research Center in Greece, and researchers from Aalto University in Finland and KTH in Sweden, allows even smaller devices like smartphones to decode speech without needing substantial memory—or internet access.

The code, recently presented in a scientific article, employs a clever strategy: it "forgets" what it doesn’t need in real-time.

[...]

This maneuver may sound simple, but it involves an entirely new and unique code for which the researchers have sought a patent. This algorithm reduces the need for critical memory without sacrificing recognition quality. And though it requires slightly more time and computational power, the researchers assure that the difference is negligible vis-à-vis the muscular capabilities of modern devices.

Moreover, it works without an internet connection, thus enabling speech recognition—and potentially real-time language translation in the future, hope the researchers—anywhere, even in the depths of the Amazon jungle.

[...]


[-] Wistful@discuss.tchncs.de 20 points 1 week ago

What does FUTO use? It works pretty good (based on my limited testing) and it works offline.

[-] hendrik@palaver.p3x.de 6 points 1 week ago* (last edited 1 week ago)

I think that's based on OpenAI's Whisper model. (Which seems to be the de facto standard these days.)

[-] Gormadt@lemmy.blahaj.zone 3 points 1 week ago

I've only had issues on days when I swap my aligners but even my friends have a hard time those days lol

10/10 highly recommended

I also dig their keyboard, I just wish it supported like searching for gifs to put directly into messenger apps.

9/10

[-] Steve@communick.news 3 points 1 week ago

That's what I was thinking.
I'm pretty sure FUTO isn't the only one either.
This doesn't seem like new tech.

[-] Markaos@discuss.tchncs.de 2 points 1 week ago

Yeah, stock Google voice recognition also works offline if you download the language model beforehand.

[-] dan@upvote.au 13 points 1 week ago

I think the Home Assistant community has been working on offline speech recognition too, as a fully open replacement to things like Google Assistant.

[-] fmstrat@lemmy.nowsci.com 2 points 1 week ago

Pretty sure they use Whisper, which is what FUTO Keyboard already uses on Android to keep it local to the phone.

I use Heliboard as a keyboard, then FUTO Voice connected to the mic button.

[-] 01189998819991197253@infosec.pub 9 points 1 week ago

This article may have been right 2 years ago, but not so much today.

I have an offline stt keyboard on my phone that uses Vosk. I used to have a stt digital assistant, too (can't remember which model), but I didn't need a "siri" and ended up uninstalling.

[-] kbal@fedia.io 5 points 1 week ago

I wonder how many of those one in four people are even aware that everything they say gets uploaded to a data centre somewhere. I had a phone with speech recognition as a prominent feature until I wiped it and installed a different OS, and I don't remember seeing any warning at all about that.

[-] B0rax@feddit.org 5 points 1 week ago

What? Local speech recognition is already integrated in most phones. Open source options are also freely available… I am not sure what the news is here…

[-] Nalivai@discuss.tchncs.de 2 points 1 week ago

Maybe it's an iPhone thing? Usually when I see news like "this amazing innovative feature will finally be available and will change the world", and it's something mundane that I've been able to use for years, it means that the iPhone is finally getting this feature.

[-] B0rax@feddit.org 2 points 1 week ago

The iPhone has also had local speech recognition for quite a while now. It has been available since the iPhone 6s and the iPhone SE, both of which came out in 2015, 9 years ago.

[-] brisk@aussie.zone 4 points 1 week ago

This maneuver may sound simple, but it involves an entirely new and unique code for which the researchers have sought a patent.

How to make your discovery worthless in a single, idiotic move.

[-] MisterD@lemmy.ca 3 points 1 week ago

Only to send the words back to Google? No thanks

[-] Verito@lemm.ee 1 points 1 week ago* (last edited 1 week ago)

But it saves so much money on server time and data costs to just send the final transcript!
/s

[-] t3rmit3@beehaw.org 3 points 1 week ago

Coming soon

Not to my phone it's not!

[-] hendrik@palaver.p3x.de 3 points 1 week ago* (last edited 1 week ago)

I think the real deal would be to have that available as open source. Maybe integrated directly into the core AOSP. I mean the technology is available. And my phone has like 8GB of RAM. The only issue is that all of that isn't really integrated into my phone. And I think I'd occasionally use speech to text, text to speech and machine translation... But I want it locally and Free Software... Same for my computer. All the software is there. But it isn't integrated into the desktop and takes half a day to set up all the different Python projects, and then I can launch some commands via the terminal. But I'd rather have something that's integrated into the rest of my tools...

[-] jjjalljs@ttrpg.network 3 points 1 week ago

I don't think I've ever desired to have speech as an interface for a device.

Yeah, I could yell at it "Open the browser and go to uhh the order of the stick comic index page" and maybe it would get it right. Or I could just... click on the browser, type oot and pick it from the drop down. Faster, no error, no expensive processing.

I don't drive (cars are a bad form of transit and I'm lucky enough to not need one) and I'm not hands-full in the kitchen often.

[-] Markaos@discuss.tchncs.de 3 points 1 week ago

Indeed, try switching your smartphone to airplane mode and see how far your voice commands get you.

Did that (or rather disabled mobile data and WiFi, because airplane mode would still keep the WiFi on), and then I dictated this sentence after the parentheses. So Google's voice input works offline just fine.

Or do they mean something like a smart assistant? In that case fair, but it's not like it will work with text input either.

It is true, however, that Google Translate doesn't do offline voice translation even if the language you're trying to translate from is downloaded for system-wide voice recognition.

this post was submitted on 13 Dec 2024

51 points (100.0% liked)
