lads (lemmy.world)
squaresinger@lemmy.world 4 points 1 day ago

That's outdated info. Yes, training doesn't really require a lot of scraping. But LLMs are now often coupled with web search to improve results.

So for example, if you ask ChatGPT to find a specific product for you, the result doesn't come from the model. Instead it does a web search, loads the results, summarizes them, and returns the summary plus the links. This is a time-critical operation since the user is waiting for the results. It's also often bad for the site being scraped (mostly when the user is looking for info, not for products), since the user might be satisfied with the summary and never click through to the source.
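A rough sketch of that search, load, summarize, return flow, in case it helps to see the steps laid out. Everything here is a hypothetical stand-in (`web_search`, `load_page`, `summarize`, `answer`, and the example.com URLs are made up for illustration); it says nothing about how ChatGPT or any real product is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


# Hypothetical stand-in for a search backend: returns canned result URLs.
def web_search(query: str) -> list[str]:
    slug = query.lower().replace(" ", "-")
    return [f"https://example.com/{slug}/1", f"https://example.org/{slug}/2"]


# Hypothetical stand-in for the scraping step; a real one would do an HTTP GET,
# and this is where the site being scraped sees the request.
def load_page(url: str) -> Page:
    return Page(url=url, text=f"Content fetched from {url}. More detail follows.")


# Hypothetical stand-in for the model call: keeps the first sentence of each page.
def summarize(pages: list[Page]) -> str:
    return " ".join(p.text.split(". ")[0] + "." for p in pages)


def answer(query: str) -> dict:
    urls = web_search(query)                # 1. search
    pages = [load_page(u) for u in urls]    # 2. load (scrape) each result
    return {
        "summary": summarize(pages),        # 3. summarize
        "links": urls,                      # 4. return summary plus sources
    }
```

The user is blocked on `answer()` the whole time, so anything that slows down step 2 directly adds to how long they wait.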

So if you can delay scraping like that by a few seconds, that's quite significant.
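To put some numbers on it, here is a toy latency model. All figures are assumptions picked for illustration (2 seconds of fixed search and model overhead, 0.5 seconds per page load, a 3 second delay added by the site), and `total_wait` is a hypothetical helper, not a measurement of any real system.

```python
def total_wait(page_load_times, overhead=2.0, parallel=True):
    """Seconds the user waits: fixed overhead plus time spent loading pages.

    With parallel fetching the slowest page dominates; with sequential
    fetching every page's load time adds up.
    """
    if not page_load_times:
        return overhead
    load = max(page_load_times) if parallel else sum(page_load_times)
    return overhead + load


# Assumed numbers: three result pages at 0.5 s each, then the same pages
# with a 3 s delay added by each site.
baseline = [0.5, 0.5, 0.5]
delayed = [t + 3.0 for t in baseline]

print(total_wait(baseline))                  # parallel, no delay
print(total_wait(delayed))                   # parallel, 3 s delay per page
print(total_wait(delayed, parallel=False))   # sequential, 3 s delay per page
```

Under these assumptions, the wait goes from 2.5 s to 5.5 s even when all pages are fetched in parallel, and to 12.5 s when they are fetched one after another.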

this post was submitted on 13 Aug 2025
845 points (98.2% liked)

Programmer Humor
