Any self-hosted speech-to-text / text-to-speech LLM available? (lemmy.world)

submitted 6 days ago by Zeon@lemmy.world to c/selfhosted@lemmy.world

7 comments fedilink hide all child comments

Hello, I'm looking to set up my own AI to where I can use my voice to talk to it and it will talk back to me with a generated voice. Is there free and open-source project out there where I can do this easily? It would be cool to see something like GPT4All implement something like this. I'm on Arch Linux using a 7900 XTX.

top 7 comments

sorted by: hot top controversial new old

[-] scrubbles@poptalk.scrubbles.tech 26 points 6 days ago* (last edited 6 days ago)

Generally there are not LLMs that do this, but you start building up a workflow. You speak, one service reads in the audio and translates it to text. Then you feed that into an LLM, it responds in text, and you have another service translate that into audio.

Home Assistant is the easiest way to get them all put together.

https://www.home-assistant.io/integrations/assist_pipeline

Edit agree with others below. Use the apps that are made for it.

Whisper for STT
Any hosted LLM can work, text-generation-webui or tabbyapi
I use xttsv2 for TTS

[-] GameGod@lemmy.ca 11 points 6 days ago* (last edited 5 days ago)

Whisper is the way to go for speech to text (edit: had that backwards). Whisper.cpp is decently fast too: https://github.com/ggerganov/whisper.cpp/releases/tag/v1.7.1 Get the binaries from the link that's on that page (god GitHub usability sucks)

[-] Windex007@lemmy.world 6 points 6 days ago

Whisper is fantastic and has different sized models so you can zero in to what gives you the best mix of speed/accuracy for whatever hardware you'll be running it on

[-] snekerpimp@lemmy.world 2 points 5 days ago

I thought whisper was hallucinating huge chunks of text in that medical transcription app. Is it more reliable with smaller chunks?

[-] L_Acacia@lemmy.one 8 points 6 days ago* (last edited 6 days ago)

Scrubbles's comment outlined what would likely be the best workflow. Having done something similar myself, here are my recommendations:

In my opinion, the best way to do STT with Whisper is by using Whisper Writer, I use it to write most most messages and texts.

For the LLM part, I recommend Koboldcpp. It's built on top of llama.cpp and has a simple GUI that saves you from looking for the name of each poorly documented llama.cpp launch flag (cli is still available if you prefer). Plus, it offers more sampling options.

If you want a chat frontend for the text generated by the LLM, SillyTavern is a great choice. Despite its poor naming and branding, it's the most feature-rich and extensible frontend. They even have an official extension to integrate TTS.

For the TTS backend, I recommend Alltalk_tts. It provides multiple model options (xttsv2, coqui, T5, ...) and has an okay UI if you need it. It also offers a unified API to use with the different models. If you pick SillyTavern, it can be accessed by their TTS extension. For the models, T5 will give you the best quality but is more resource-hungry. Xtts and coqui will give you decent results and are easier to run.

There are also STS models emerging, like GLM4-V, but I still haven't tried them, so I can't judge the quality.

[-] Diabolo96@lemmy.dbzer0.com 4 points 6 days ago* (last edited 6 days ago)

I haven't checked progress in TTS tech for months (probably several revolutionary evolutions have happened since then), but try Coqui xttsv2.

[-] JackGreenEarth@lemm.ee 3 points 6 days ago

The Linux app SpeechNote has a bunch of links to models of both varieties, in various languages, and supports training on a specific voice.

this post was submitted on 10 Nov 2024

44 points (89.3% liked)

Selfhosted

40198 readers

453 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
No spam posting.
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
No trolling.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 1 year ago

MODERATORS

HybridSarcasm@lemmy.world

HybridSarcasm@lemmy.hybridsarcasm.xyz