submitted 2 months ago* (last edited 2 months ago) by Tea@programming.dev to c/technology@lemmy.world
[-] Mubelotix@jlai.lu 1 points 2 months ago

Doesn't make any sense. Why would you crawl Wikipedia when you can just download a dump as a torrent?

[-] mke@programming.dev 1 points 2 months ago* (last edited 2 months ago)

Apparently the dump doesn't include media, though there's ongoing discussion within Wikimedia about changing that. It also seems likely to me that AI scrapers don't care about externalizing costs onto others if it might mean a competitive advantage (e.g. getting the most recent data, or not having to spend time and resources developing dedicated ingestion systems for specific sites).

I want to stress this: it's not that "tech bros" are just stupid (even though a lot of them are revoltingly unappreciative of the giants whose shoulders they stand on); it's that they don't care.

[-] prototype_g2@lemmy.ml 1 points 2 months ago* (last edited 2 months ago)

Feel like this belongs in !fuck_ai@lemmy.world

Think I should cross-post?

[-] fubo@lemmy.world -3 points 2 months ago

To be clear, network costs represent a tiny fraction of WMF's expenses. Much of WMF's budget goes to social programs, not technical upkeep.

this post was submitted on 02 Apr 2025
589 points (99.3% liked)

