This is Hostile to Business
(lemmy.blahaj.zone)
I'd like to play devil's advocate for a sec and ask this question: how is a company scraping information from publicly available sources to train AI models any different from companies scraping that same publicly available data and indexing it for search?
While the search model is helpful to us all, Google isn't doing it out of the kindness of their hearts; they have a whole business model based on selling advertising using the information they have freely indexed. Yet very few people complain about search indexers crawling their data the way they do about AI bots.
Again, just playing devil's advocate for the sake of curiosity.
You answered your own question. The search engine indexes your page to send traffic to you. The AI bot indexes your page to plagiarize your content.
Anecdotally, AI crawlers also routinely ignore sites' robots.txt files and spoof their user agents to try to hide what they're doing. A lot of site owners are complaining about the cost of delivering content to web scrapers. Where a search indexer might hit a site once a day, some AI bots hit it every hour and just waste the owner's bandwidth.
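For context on the robots.txt point: robots.txt is only a request, and it is up to each crawler to check it before fetching. Below is a minimal sketch of what honoring it looks like on the crawler side, using Python's standard `urllib.robotparser`. The robots.txt contents and the `ExampleAIBot` agent name are hypothetical, made up for illustration. It also shows why spoofing works: a crawler that calls itself a browser (`Mozilla/5.0`) no longer matches the group meant for it and falls through to the permissive `*` rules.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a made-up AI crawler entirely,
# and ask every other crawler to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Crawl-delay: 10
Allow: /
"""


def check(agent: str, url: str) -> Tuple[bool, Optional[int]]:
    """Return (allowed to fetch?, requested crawl delay) for this user agent."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of downloading them with read().
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    url = "https://example.com/blog/post"
    print(check("ExampleAIBot", url))  # honest AI crawler: blocked
    print(check("Mozilla/5.0", url))   # same crawler spoofing a browser: allowed
```

Nothing here enforces anything. A crawler that skips this check, or lies about its user agent, gets the page anyway, which is why site owners end up paying for the extra traffic.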