this post was submitted on 21 Nov 2025
62 points (100.0% liked)
Chapotraphouse
What is vector search - eli5? I seriously have no idea.
---
Edit
I'd still love a Hexbear's eli5.
I googled and - no surprise - the results were shit. I found a super-positive Reddit post in r/LearnMachineLearning that makes me think that vector search is a perfect match for hellworld. Also - "LearnMachineLearning" sounds like some Black Mirror shit.
I don't know if anybody cares - but there are coding examples.
You can turn unstructured data like strings and images into vectors, which are 1D arrays: [1, 4, 8, 2, ... 8, 3.5, 9, 1] is a vector. Each number in a vector is basically a score of how close that original data is to a semantic concept. A picture of an apple and the string "apple" both score 1 on Apple and highly on Fruit and a zero or something on Mineral. Vectors can be any length really, so any one vector can define the proximity of original data to tens of thousands of concepts.

An orange scores high on Fruit, but not Apple, so in a vector search, if you search fruit you'll get both apples and oranges. If you search Florida fruit, you'll get oranges but not apples. Search pie recipe, apples show up but oranges don't. And so on. Vector searches will retrieve things that are semantically related.
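A minimal sketch of the idea above in Python. The concept dimensions and all the scores are made up for the example (real embeddings have thousands of dimensions that aren't human-readable); it just shows how ranking by cosine similarity pulls out semantically related items:

```python
import math

# Invented concept axes: [Apple, Fruit, Florida, Mineral]
items = {
    "apple (image)":    [1.0, 0.9, 0.1, 0.0],
    '"apple" (string)': [1.0, 0.9, 0.1, 0.0],
    "orange":           [0.1, 0.9, 0.8, 0.0],
    "quartz":           [0.0, 0.0, 0.1, 1.0],
}

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    """Return the k items whose vectors are most similar to the query."""
    ranked = sorted(items, key=lambda name: cosine(items[name], query_vec),
                    reverse=True)
    return ranked[:k]

# A "fruit" query scores high only on the Fruit axis:
print(search([0.0, 1.0, 0.0, 0.0]))        # apples and oranges, not quartz
# A "Florida fruit" query scores high on Fruit and Florida:
print(search([0.0, 1.0, 1.0, 0.0], k=1))   # the orange wins
```

Note that nothing here matches on the literal word "apple"; the ranking falls out of the geometry alone, which is the whole point.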
You fill vectors by training computers to sort the data themselves. You train the computers by exploiting tens of thousands of third world workers to manually categorize information and double check the computers until the success rates are high enough from the automated categorizers to fire the workers.
A very simple vector example:
Assume we have an array of 26 binary variables, one for each letter of the alphabet. We flip to 1 if the letter is present in the word. Dog -> [0,0,0,1,0…] Cat -> [1,0,1,0,0…]
Then we can do a search by taking a target word and doing a cosine similarity search: roughly, we score each stored array by how similar it is to the target word's array, and return the closest match.
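The two comments above can be sketched directly. This implements the 26-slot binary letter vectors and the cosine similarity search over them; the tiny vocabulary is just for illustration:

```python
import math

def letter_vector(word):
    """26 binary slots, one per letter; 1 if that letter occurs in the word.
    'dog' -> d (index 3), g (6), o (14) set to 1, matching [0,0,0,1,0,...]."""
    word = word.lower()
    return [1 if chr(ord('a') + i) in word else 0 for i in range(26)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

vocab = ["dog", "cat", "god", "act", "fish"]

def most_similar(target):
    """Find the vocab word whose letter vector is closest to the target's."""
    candidates = {w: letter_vector(w) for w in vocab if w != target}
    tv = letter_vector(target)
    return max(candidates, key=lambda w: cosine(candidates[w], tv))

print(most_similar("dog"))  # "god" — same letters, identical vector
```

The anagram result also shows the scheme's weakness: "dog" and "god" get the exact same vector, which is exactly the kind of thing the context-aware models in the next comment fix.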
Modern models compute a much more complicated vector by using the context of the words around them. Multimodal image models use a combination of images and text to train a model, so that later you can pass it some text (or an image) and get a vector back.
There are some things out there about doing math on vectors because they form a latent space, for example: ‘king - man + woman = queen’, but empirical tests show that this doesn’t quite hold up on modern models. 3blue1brown has a video on vector math which is worth a watch.
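A toy demo of that arithmetic. The three dimensions and their values here are hand-made (roughly: royalty, masculinity, femininity) purely to show why the trick *can* work geometrically; as the comment notes, real embeddings don't behave this cleanly:

```python
import math

# Invented axes: [royalty, masculinity, femininity]
vecs = {
    "king":    [1.0, 1.0, 0.0],
    "queen":   [1.0, 0.0, 1.0],
    "man":     [0.0, 1.0, 0.0],
    "woman":   [0.0, 0.0, 1.0],
    "peasant": [0.0, 0.5, 0.5],  # distractor word
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# king - man + woman, computed component-wise
target = [k - m + w for k, m, w in
          zip(vecs["king"], vecs["man"], vecs["woman"])]

# Standard trick: exclude the input words, then take the nearest neighbor
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vecs[w], target))
print(best)  # queen
```

With these hand-made vectors the arithmetic lands exactly on queen's vector; in real models the result only lands *near* the answer at best, which is why the empirical tests mentioned above come out mixed.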