How does indexing work at huge scales?
At massive scale, indexing is done by distributing the data rather than relying on a single machine. The index is split into shards, each holding a subset of the data, commonly partitioned by hashing IDs or dividing term ranges. Every shard is replicated to multiple machines so reads can be load-balanced and failures do not take the system down.
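As a minimal sketch of the hash-based partitioning described above (shard count and ID format are made up for illustration), each document ID is hashed and mapped onto a fixed number of shards, so every machine can compute a document's home shard without any lookup table:

```python
import hashlib

NUM_SHARDS = 16  # hypothetical cluster size

def shard_for(doc_id: str) -> int:
    """Route a document to a shard by hashing its ID.

    A stable hash (not Python's randomized hash()) is used so every
    node in the cluster agrees on the placement.
    """
    digest = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Real systems often layer consistent hashing or explicit shard maps on top of this so shards can be moved without rehashing everything, but the core routing idea is the same.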
Search queries are handled by a coordinator that sends the query to the relevant shards in parallel, collects their partial results, merges and ranks them, and returns the final result. Because all shards work at the same time, query latency depends on the slowest shard, not on total index size.
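The scatter-gather step can be sketched like this (the in-memory "shards" and scoring are toy stand-ins, not a real engine): the coordinator fans the query out to every shard in parallel, then merges the partial hit lists into a global top-k by score:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Toy per-shard indexes: term -> list of (doc_id, score) hits.
# In a real system each shard would be a separate server.
SHARDS = [
    {"search": [("doc1", 0.9), ("doc4", 0.3)]},
    {"search": [("doc2", 0.7)]},
    {"search": [("doc3", 0.8), ("doc5", 0.1)]},
]

def query_shard(shard, term):
    """Return this shard's partial results for the term."""
    return shard.get(term, [])

def search(term, k=3):
    """Coordinator: scatter the query, gather and rank partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(query_shard, SHARDS, [term] * len(SHARDS))
    merged = [hit for partial in partials for hit in partial]
    # Global top-k by score; each shard usually returns only its own top-k.
    return heapq.nlargest(k, merged, key=lambda hit: hit[1])
```

Calling `search("search", 3)` returns the three highest-scored hits across all shards. Note how the coordinator blocks until the last shard answers, which is why tail latency of the slowest shard dominates.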
Under the hood, these systems use inverted indexes, usually derived from Lucene, either via off-the-shelf engines like Elasticsearch or via custom implementations. Metadata and related data live in distributed databases or key-value stores, while index updates are streamed in asynchronously so writes do not block reads. Caching at multiple layers keeps frequently accessed data in memory, and the whole thing runs on large clusters that automatically handle placement, scaling, and failures.
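An inverted index itself is a simple idea: instead of mapping documents to their words, map each word to the documents containing it, so a term lookup is a single dictionary access. A toy version (whitespace tokenization only; real engines add stemming, positions, and compressed postings lists):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    "a": "distributed search index",
    "b": "search at huge scale",
}
index = build_inverted_index(docs)
# index["search"] contains both "a" and "b"
```

Lucene's on-disk segments are essentially a heavily optimized, immutable version of this structure, which is also why updates are batched and merged asynchronously rather than applied in place.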
idk where you are, but where I live anybody can attend university lectures for free, as long as they aren't full. Or go to the library and browse the relevant section. Personally I learned everything IT-related from uni courses and by searching for my topics of interest in the uni library. So that's my shitty recommendation; I'm sure there are online resources and courses on it too, though.