Voice Conversion With Just Nearest Neighbors (arxiv.org)

submitted 2 years ago* (last edited 2 years ago) by rf5@kbin.social to c/machinelearning@kbin.social

TL;DR: want to convert your voice to another person's voice? Or even to a whisper? Or a dog barking? Or to any other random speech clip? Give our new method a try: https://bshall.github.io/knn-vc

Longer version: our research team kept seeing new voice conversion methods grow more complex and harder to reproduce, so we tried to build a top-tier voice conversion model that was as simple as possible. The result is kNN-VC, where the entire conversion model is just k-nearest neighbors on WavLM features. It turns out this does as well as, if not better than, very complex any-to-any voice conversion methods. What's more, since k-nearest neighbors has no parameters, we can use anything as the reference, even clips of dogs barking, music, or references in other languages.
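To make the "just k-nearest neighbors" idea concrete, here is a minimal sketch in NumPy. It is not the released code, and several things in it are assumptions made for illustration: the function name `knn_convert`, the use of cosine similarity, plain averaging of the neighbors, and the default `k=4`. Random arrays stand in for WavLM features of the source and reference audio. Feature extraction and the vocoder that turns converted features back into a waveform are left out.

```python
import numpy as np


def knn_convert(query, matching_set, k=4):
    """Replace each query frame with the mean of its k nearest reference frames.

    query:        (n_query, dim) features of the source utterance
    matching_set: (n_ref, dim) features pooled from the reference audio
    Cosine similarity and uniform averaging are assumptions of this sketch.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    m = matching_set / np.linalg.norm(matching_set, axis=1, keepdims=True)
    sim = q @ m.T  # shape (n_query, n_ref)

    # Indices of the k most similar reference frames for each query frame.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    # Average the selected reference frames (unnormalized) per query frame.
    return matching_set[idx].mean(axis=1)


# Placeholder features: random data standing in for real WavLM outputs.
rng = np.random.default_rng(0)
source_feats = rng.standard_normal((50, 16))
reference_feats = rng.standard_normal((200, 16))

converted = knn_convert(source_feats, reference_feats, k=4)
```

Because every output frame is built only from reference frames, nothing is trained or tuned for a particular target, which is why any reference clip can be used.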

I hope you enjoy our research! We provide a quick-start notebook, code, audio samples, and vocoder checkpoints: https://bshall.github.io/knn-vc/

this post was submitted on 29 Jun 2023
