How does indexing work at huge scales? (programming.dev)

submitted 4 months ago by ertai@programming.dev to c/ask_experienced_devs@programming.dev


YouTube would be a prime example: I'm guessing the metadata for all videos is too large to store on a single server, so how do they achieve millisecond-level search performance while routinely handling millions of queries?

What kind of infrastructure and technology is required for this?

Do you have any resources I could use to learn more on this subject?


[-] HelloRoot@lemy.lol 15 points 4 months ago* (last edited 4 months ago)

At massive scale, indexing is done by distributing the data rather than relying on a single machine. The index is split into shards, each holding a subset of the data, commonly partitioned by hashing IDs or dividing term ranges. Every shard is replicated to multiple machines so reads can be load-balanced and failures do not take the system down.
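As a rough illustration of the hash-partitioning and replication idea above, here is a minimal sketch. The shard count, replication factor, and the `shardN-replicaM` naming scheme are all made up for the example; real systems typically use consistent hashing or a placement service rather than a plain modulo.

```python
import hashlib

NUM_SHARDS = 4          # hypothetical shard count
REPLICAS_PER_SHARD = 3  # hypothetical replication factor


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document ID to a shard using a stable hash of the ID."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(shard: int, replicas: int = REPLICAS_PER_SHARD) -> list[str]:
    """Hypothetical names of the machines that hold a copy of one shard."""
    return [f"shard{shard}-replica{r}" for r in range(replicas)]


def pick_replica(shard: int, request_no: int) -> str:
    """Round-robin read load balancing across a shard's replicas."""
    nodes = replicas_for(shard)
    return nodes[request_no % len(nodes)]
```

The same ID always lands on the same shard, and consecutive reads for that shard rotate through its replicas, so losing one replica only removes one of several places a read can go.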

Search queries are handled by a coordinator that sends the query to the relevant shards in parallel, collects their partial results, merges and ranks them, and returns the final result. Because all shards work at the same time, query latency is governed mainly by the slowest shard (plus the merge step), not by total index size.
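The scatter-gather pattern described above can be sketched in a few lines. This is a toy, assuming in-memory dictionaries stand in for shards and that scores are already computed; `SHARDS`, `search_shard`, and `coordinate` are hypothetical names, not any real engine's API.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical shards: each maps a document ID to its relevance score
# for one fixed query. A real shard would compute these from its index.
SHARDS = [
    {"a": 0.9, "b": 0.2},
    {"c": 0.7, "d": 0.4},
    {"e": 0.95},
]


def search_shard(shard: dict[str, float], k: int) -> list[tuple[float, str]]:
    """Return only this shard's local top-k as (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score, doc) for doc, score in shard.items()))


def coordinate(k: int = 3) -> list[str]:
    """Fan the query out to every shard in parallel, then merge and rank."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, k), SHARDS))
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc for _score, doc in merged]
```

Each shard only ships its own top-k back, so the coordinator merges at most `k * number_of_shards` hits regardless of how large the full index is.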

This setup is built on search engines based on inverted indexes, usually derived from Lucene, either via systems like Elasticsearch or via custom implementations. Metadata and related data are stored in distributed databases or key-value stores, while index updates are streamed asynchronously so writes do not block reads. Caching at multiple layers keeps frequently accessed data in memory, and the whole system runs on large clusters that automatically handle placement, scaling, and failures.
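For anyone who has not seen an inverted index before, here is a minimal sketch of the core data structure: a map from each term to the set of documents containing it. It is a toy with whitespace tokenization and hypothetical function names, and it does not reflect Lucene's actual on-disk format or scoring.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build term -> set of document IDs (the posting list for that term)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_index(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: return documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)
```

Usage, with made-up titles:

```python
docs = {1: "cat videos compilation", 2: "funny cat fails", 3: "dog videos"}
index = build_index(docs)
search_index(index, "cat videos")  # {1}
```

A lookup touches only the posting lists for the query terms, which is why search cost tracks the number of matching documents rather than the size of the whole collection.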

idk where you are, but where I live anybody can attend university lectures for free, as long as they are not full. Or go to the library and browse the relevant section. Personally I learned everything IT-related from uni courses and from searching for my topics of interest in the uni library. So that's my shitty recommendation; I'm sure there are online resources and courses on it though.

[-] breadsmasher@lemmy.world 4 points 4 months ago

Some super high-level references on eventual consistency, database sharding, and edge computing:

https://en.wikipedia.org/wiki/Eventual_consistency?wprov=sfti1

https://en.wikipedia.org/wiki/Shard_(database_architecture)?wprov=sfti1

https://en.wikipedia.org/wiki/Edge_computing?wprov=sfti1

[-] Shadow@lemmy.ca 3 points 4 months ago

Often it's distributed databases like Google Spanner, Amazon Aurora, CockroachDB, etc. Google has a public whitepaper on Spanner you can read.

https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf

[-] Corbin@programming.dev 1 points 4 months ago

If you want to know how Google specifically does things, search for "TeraGoogle"; it's not a secret name, although I don't think it has a whitepaper. The core insight is that there are tiers of search results. When you search for something popular that many other people are searching for, your search is handled by a pop-culture tier which is optimized for responding to those popular topics. The first and second pages of Google results are served by different tiers; on YouTube, the first few results are served from a personalized tier which (I expect) has cached your login and knows what you like, and the rest of the results are from a generalist tier. This all works because searches, video views, etc. are Pareto-distributed; most of the searches are for a tiny amount of cacheable content.
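To make the tier idea concrete, here is a minimal sketch of a two-tier lookup: a small table of precomputed answers for popular queries in front of a slower general search. It only illustrates the concept as described above and is not a claim about how Google or YouTube implement it; `TieredSearch`, `hot_results`, and `full_search` are hypothetical names.

```python
from typing import Callable


class TieredSearch:
    """Serve popular queries from a hot tier, everything else from a slow tier."""

    def __init__(
        self,
        hot_results: dict[str, list[str]],
        full_search: Callable[[str], list[str]],
    ) -> None:
        self.hot = hot_results          # precomputed results for popular queries
        self.full_search = full_search  # expensive generalist search
        self.stats = {"hot": 0, "general": 0}

    def query(self, q: str) -> list[str]:
        key = q.strip().lower()  # normalize so trivial variants share an entry
        if key in self.hot:
            self.stats["hot"] += 1
            return self.hot[key]
        self.stats["general"] += 1
        return self.full_search(key)
```

If query popularity really is heavily skewed, the `hot` counter will dominate `stats` even when the hot table covers only a small fraction of distinct queries.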

There's also a UX component. Suppose that you dial Alice's server and Alice responds with a Web app that also fetches resources from Bob's server. This can only be faster for you in the case where Bob is so close to you (and so responsive) that you can dial Bob and get a reply faster than Alice finishes sending her app. But Alice and Bob are usually colocated in a datacenter, so Alice will always be closer to Bob than you are. This suggests that if Alice wants to incorporate content from Bob then Alice might as well dial Bob herself and not tell you about Bob at all. This is where microservices shine. When you send a search to Google, YouTube, Amazon, or other big front pages, you're receiving a composite result which has results from many different services mixed in. For the specific case of Google, when you connect to google.com, you're connecting to a machine running GWS (Google Web Server), and GWS connects to multiple search backends on your behalf.
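The "Alice dials Bob herself" arrangement can be sketched as a frontend that queries several backends concurrently and returns one composite response. The two backends below are stand-ins that just sleep briefly to simulate a network hop inside the datacenter; their names and outputs are invented for the example.

```python
import asyncio


async def web_backend(query: str) -> list[str]:
    """Hypothetical backend; the sleep stands in for an in-datacenter call."""
    await asyncio.sleep(0.01)
    return [f"web result for {query}"]


async def video_backend(query: str) -> list[str]:
    """A second hypothetical backend, queried at the same time as the first."""
    await asyncio.sleep(0.01)
    return [f"video result for {query}"]


async def frontend(query: str) -> dict[str, list[str]]:
    """Call both backends concurrently and return a single composite result."""
    web, videos = await asyncio.gather(web_backend(query), video_backend(query))
    return {"web": web, "videos": videos}
```

The client makes one round trip to `frontend` and never learns that the backends exist; running `asyncio.run(frontend("cats"))` returns both result lists in one dictionary.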

Finally, how typical of a person are you? You might not realize how often your queries are handled by pop-culture tiers. I personally have frequent experiences where my search turns up zero documents on DDG or Google, where there are no matching videos on YouTube, etc., and those searches take multiple seconds to come up empty. If you're a weird person who constantly finds googlewhacks then you're not going to perceive these services as optimized for you, because they cannot optimize for the weird.

this post was submitted on 04 Feb 2026
