this post was submitted on 24 Aug 2025
Asklemmy
I have a semi-related question if you don't mind. People often complain about the voice tracks in movies being hard to hear, especially if you don't have a speaker for the center channel (and even with one I have trouble).
Why haven't they solved this problem by packaging the voice track separately on the Blu-ray/stream, so you can turn up the volume of the voices only without blowing your ears out when the music hits?
I don't know why they don't. I work in music rather than TV/film, but it infuriates me too! Give me a voice volume control! It would be technically very easy to implement as a standard, but the powers that be just haven't come together and done it!
I'm glad to hear I'm not the only one thinking it!
Do you think it could be done by diffing a few of the different language tracks?
Unfortunately no. Audio files are actually really dumb, in that they’re basically just a list of 44100 (or 48000 or 96000 etc.) amplitude numbers per second for each channel.
So there’s nothing really to diff, because it’s basically just a squiggly line, a set of squiggly lines or, when compressed, a mathematical expression that recreates a squiggly line when decompressed.
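To make the "squiggly line" idea concrete, here is a rough sketch in plain Python. The `sine_wave` helper and the 440 Hz tone are made up for illustration; real audio would be read from a file, but the result is the same kind of thing: a long list of amplitude numbers.

```python
import math

SAMPLE_RATE = 44100  # samples per second, per channel (the CD rate mentioned above)

def sine_wave(freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Return a pure tone as a list of amplitude values between -1.0 and 1.0."""
    n = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# One second of a 440 Hz tone (hypothetical example signal).
tone = sine_wave(440.0, 1.0)

print(len(tone))  # 44100
print(tone[:3])   # the first few points on the squiggly line
```

Nothing in that list says "this part is voice" or "this part is music"; every number is just the combined amplitude at that instant.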
You could isolate the dialog if you got ahold of a version with no dialog at all, inverted the polarity of that, and summed it with the original, but it’s unlikely you’ll find a version without any vocals.
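A toy sketch of that polarity trick, in plain Python with made-up sample values. It assumes the dialog-free version lines up sample for sample with the original and is numerically identical to the background in it; in practice a lossy codec or a slightly different mix would leave residue instead of a clean result.

```python
def invert_polarity(samples):
    """Flip the sign of every sample (the squiggly line mirrored vertically)."""
    return [-s for s in samples]

def mix(a, b):
    """Sum two tracks sample by sample."""
    return [x + y for x, y in zip(a, b)]

# Made-up values, purely for illustration.
background = [0.10, -0.20, 0.30, -0.05]  # music and effects, no dialog
dialogue = [0.02, 0.04, -0.03, 0.01]     # what we want to recover

full_mix = mix(background, dialogue)     # what ships on the disc

# Adding the inverted background cancels it, leaving only the dialog.
recovered = mix(full_mix, invert_polarity(background))
print(recovered)
```

The recovered values match `dialogue` up to floating-point rounding, which is why this only works when the two versions really are identical apart from the voices.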
Machine learning vocal isolation tools are probably going to be the best way to go about it as a DIY approach. Ultimate Vocal Remover 5 with the Demucs v4 model is great FOSS software for extracting vocals; you could then sum the extracted vocals with the original track and adjust the gain to get louder dialogue… it would be a lot of work though…
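The summing step might look roughly like this. This is only a sketch: it assumes the vocals have already been extracted by some separation tool and are sample-aligned with the original, and `boost_dialogue`, `db_to_linear`, and the sample values are hypothetical names and numbers, not part of any real tool.

```python
def db_to_linear(db):
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def boost_dialogue(original, vocals, vocal_gain_db=0.0, ceiling=1.0):
    """Add the isolated vocals on top of the original mix.

    The original already contains the dialog once, so adding the vocals
    at 0 dB roughly doubles the dialog level. If the result would exceed
    the ceiling, the whole mix is scaled down to avoid clipping.
    """
    g = db_to_linear(vocal_gain_db)
    mixed = [o + g * v for o, v in zip(original, vocals)]
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > ceiling:
        scale = ceiling / peak
        mixed = [s * scale for s in mixed]
    return mixed

# Made-up sample values, purely for illustration.
original = [0.5, -0.5, 0.25]
vocals = [0.4, -0.4, 0.0]

gentle = boost_dialogue(original, vocals, vocal_gain_db=0.0)
louder = boost_dialogue(original, vocals, vocal_gain_db=6.0)
print(gentle)
print(louder)
```

Scaling the whole mix down is the simplest way to stay under full scale; a limiter would be the more usual choice in a real workflow, but that is beyond a sketch like this.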
I still don't really understand, but thanks for trying all the same.