this post was submitted on 24 Aug 2025
Asklemmy
Unfortunately no. Audio files are actually really dumb: uncompressed, they’re basically just a list of 44,100 (or 48,000, 96,000, etc.) amplitude numbers per second for each channel.
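To make the "list of amplitude numbers" idea concrete, here is a minimal sketch that builds one second of a pure tone as a plain Python list. The function name `sine_samples` and the 440 Hz tone are just illustrative choices, not anything from a particular audio library.

```python
import math

SAMPLE_RATE = 44100  # samples per second (the CD-quality rate mentioned above)

def sine_samples(freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Return a mono tone as a plain list of amplitudes in [-1.0, 1.0]."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

# One second of a 440 Hz tone is nothing more than 44100 numbers.
one_second = sine_samples(440.0, 1.0)
print(len(one_second))  # 44100
print(one_second[:3])   # the first few points of the "squiggly line"
```

A real recording looks the same, only with a far less regular sequence of numbers.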
So there’s nothing really to diff, because it’s basically just a squiggly line, a set of squiggly lines, or, when compressed, a mathematical description that recreates a squiggly line when decompressed.
You could isolate the dialog if you got ahold of a version with no dialog at all: invert the polarity of that version and sum it with the original, and everything except the dialog cancels out. It’s unlikely you’ll find a version without any vocals, though.
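The polarity trick can be sketched with tiny made-up sample lists. This assumes the dialog-free version is sample-for-sample identical to the background in the full mix and perfectly aligned in time, which is rarely true of real releases. The helper names `invert` and `mix` are hypothetical.

```python
def invert(samples):
    """Flip the polarity: every amplitude changes sign."""
    return [-s for s in samples]

def mix(a, b):
    """Sum two equal-length tracks sample by sample."""
    if len(a) != len(b):
        raise ValueError("tracks must have the same number of samples")
    return [x + y for x, y in zip(a, b)]

# Made-up example data, not real audio.
dialog = [0.0, 0.25, -0.5, 0.125]
background = [0.5, -0.25, 0.25, 0.0]

full_mix = mix(dialog, background)  # what the released track contains

# Adding the inverted background cancels it, leaving only the dialog.
recovered = mix(full_mix, invert(background))
print(recovered == dialog)  # True
```

If the two versions differ at all (different master, different encoding, a few samples of offset), the cancellation is only partial.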
Machine learning vocal isolation tools are probably the best DIY approach. Ultimate Vocal Remover 5 with the Demucs v4 model is great FOSS software for extracting vocals. You could sum the extracted vocals with the original track and adjust the gain to get louder dialogue. It would be a lot of work, though.
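Once the vocals are extracted, the "sum and adjust the gain" step is simple arithmetic. The sketch below assumes both tracks are already decoded to equal-length lists of floats in [-1.0, 1.0]; `boost_dialogue` is a hypothetical helper, and the hard clamp at the end is a crude stand-in for a proper limiter.

```python
def boost_dialogue(original, vocals, gain_db):
    """Add a gain-adjusted copy of the isolated vocals to the original track."""
    if len(original) != len(vocals):
        raise ValueError("tracks must have the same number of samples")
    gain = 10 ** (gain_db / 20)  # convert decibels to a linear multiplier
    mixed = [o + gain * v for o, v in zip(original, vocals)]
    # Clamp to the valid range so the sum does not overflow on export.
    return [max(-1.0, min(1.0, s)) for s in mixed]

# Made-up example data: the original already contains the vocals once,
# so adding them again at 0 dB doubles the dialogue level.
original = [0.5, -0.25]
vocals = [0.25, -0.125]
print(boost_dialogue(original, vocals, 0.0))  # [0.75, -0.375]
```

Raising `gain_db` makes the dialogue louder still, at the cost of more frequent clipping unless the whole mix is turned down afterwards.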
I don't really understand still but thanks for trying all the same.