fox@hexbear.net 14 points 2 months ago

You can turn unstructured data like strings and images into vectors, which are 1D arrays of numbers: [1, 4, 8, 2, ... 8, 3.5, 9, 1] is a vector. Each number in a vector is basically a score for how close the original data is to a semantic concept. A picture of an apple and the string "apple" both score 1 on Apple, high on Fruit, and zero or close to it on Mineral. Vectors can be any length really, so a single vector can describe how close the original data is to tens of thousands of concepts. An orange scores high on Fruit but not on Apple, so in a vector search, if you search "fruit" you'll get both apples and oranges. If you search "Florida fruit", you'll get oranges but not apples. Search "pie recipe" and apples show up but oranges don't. And so on. Vector searches retrieve things that are semantically related to the query.
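To make the apples-and-oranges example concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the five concept dimensions, the scores, the query vectors, and the 0.5 cutoff are all assumptions. Real embedding models learn their dimensions during training, and those dimensions usually don't line up with one nameable concept each. The comparison uses cosine similarity, which is one common way to measure how close two vectors are, not the only one.

```python
import math

# Hypothetical, hand-labeled concept dimensions (illustration only;
# real embeddings are learned and not this neatly interpretable).
CONCEPTS = ["apple", "fruit", "mineral", "florida", "pie"]

# Made-up scores for each stored item, one number per concept above.
ITEMS = {
    "apple":   [1.0, 0.9, 0.0, 0.1, 0.9],
    "orange":  [0.0, 0.9, 0.0, 0.9, 0.1],
    "granite": [0.0, 0.0, 1.0, 0.0, 0.0],
}

# Made-up query vectors in the same concept space.
QUERIES = {
    "fruit":         [0.0, 1.0, 0.0, 0.0, 0.0],
    "florida fruit": [0.0, 1.0, 0.0, 1.0, 0.0],
    "pie recipe":    [0.0, 0.0, 0.0, 0.0, 1.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(query_vector, threshold=0.5):
    """Return names of items whose similarity to the query meets the
    (arbitrary) threshold, most similar first."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in ITEMS.items()]
    hits = [(score, name) for score, name in scored if score >= threshold]
    hits.sort(reverse=True)
    return [name for _, name in hits]


for text, vector in QUERIES.items():
    print(text, "->", search(vector))
```

With these toy numbers, "fruit" pulls back both apple and orange, "florida fruit" only orange, and "pie recipe" only apple. Granite never shows up because it scores zero on everything except Mineral.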

You fill vectors by training computers to sort the data themselves. You train the computers by exploiting tens of thousands of third world workers to manually categorize information and double-check the computers' output, until the automated categorizers' success rates are high enough to fire the workers.

this post was submitted on 21 Nov 2025
62 points (100.0% liked)

Chapotraphouse
