The Internet Archive is under attack, with a popup claiming a ‘catastrophic’ breach (www.theverge.com)

submitted 2 weeks ago by misk@sopuli.xyz to c/technology@lemmy.world

86 comments

Thekingoflorda@lemmy.world 210 points 2 weeks ago

I can’t think of any reason to attack that website. What have they done wrong?

7fb2adfb45bafcc01c80@lemmy.world -71 points 2 weeks ago

I just sent a DMCA takedown last week to remove my site. They've claimed to follow meta tags and robots.txt since 1998, but no, they had over 1,000,000 of my pages going back that far. They even had the robots.txt I had configured for them archived from 1998.

I'm tired of people linking to archived versions of things that I worked hard to create. Sites like Wikipedia were archiving URLs and then linking to the archive, effectively removing branding and blocking user engagement.

Not to mention that I'm losing advertising revenue if someone views the site in an archive. I have fewer problems with archiving if the original site is gone, but to mirror and republish active content with no supported way to prevent it short of legal action is ridiculous. On top of that, I lose control over what's done with that content -- are they going to let Google train AI on it with their new partnership?

I'm not a fan. They could easily allow people to block archiving, but they choose not to. They offer a way to circumvent artist or owner control, and I'm surprised that they still exist.

So... That's what I think is wrong with them.

From a security perspective it's terrible that they were breached. But it is kind of ironic -- maybe they can think of it as an archive of their passwords or something.

CooperRedArmyDog@lemmy.ml 17 points 2 weeks ago

How do you expect an archive to happen if they are not allowed to archive while it is still up? How are you supposed to track changes or see how the world has shifted? This is a very narrow and, in my opinion, selfish way to view the world.

7fb2adfb45bafcc01c80@lemmy.world -2 points 2 weeks ago

“How do you expect an archive to happen if they are not allowed to archive while it is still up?”

I don't want them publishing their archive while it's up. If they archive but don't republish while the site exists, then there's less damage.

I support the concept of archiving and screenshotting. I have my own Linkwarden server set up and I use it all the time.

But I don't republish anything that I archive because that dilutes the value of the original creator.

KyuubiNoKitsune@lemmy.blahaj.zone 8 points 2 weeks ago

What if I'm looking for something but the page has changed?

7fb2adfb45bafcc01c80@lemmy.world -4 points 2 weeks ago

Shouldn't that be the content creator's prerogative? What if the content had a significant error? What if they removed the page because someone living in the EU requested it under EU law? What if the page was edited because someone accidentally made their address and phone number public in a forum post?

Landsharkgun@midwest.social 3 points 2 weeks ago

Nah. It just lets slimy gits claim they never said XYZ, or that such and such a thing never happened. With a storage medium as volatile as the internet, hard backups are absolutely necessary. Put it this way: would you have the same complaint about a newspaper? A TV show? Post your opinion piece to a newspaper and it's fixed in ink forever. Yet somehow you complain when that same opinion piece is on a website? Get outta here.

7fb2adfb45bafcc01c80@lemmy.world 0 points 2 weeks ago

Like I said, I have no problems with individuals archiving it and not republishing it.

If I take a newspaper article and republish it on my site I guarantee you I will get a takedown notice. That will be especially true if I start linking to my copy as the canonical source from places like Wikipedia.

It's a fine line. Is archive.org a library (wasn't there a court case about this recently...) or are they republishing?

Either way, it doesn't matter for me any more. The pages are gone from the archive, and they won't archive any more.

zarkanian@sh.itjust.works 0 points 2 weeks ago

A couple of good examples are lifehacker.com and lifehack.org. Both sites used to have excellent content. The sites are still up and running, but the first one has turned into a collection of listicles and the second is an ad for an "AI-powered life coach". All of that old content is gone and is only accessible through the Internet Archive.

In fact, many domains never shut down, they just change owners or change direction.

7fb2adfb45bafcc01c80@lemmy.world 0 points 2 weeks ago (last edited 2 weeks ago)

Again, isn't that the site's prerogative?

I think there should at least be a recognized way to opt-out that archive.org actually follows. For years they told people to put

User-agent: ia_archiver
Disallow: /

in robots.txt, but they still archived content from those sites. They refuse to publish what IP addresses they pull content down from, but that would be a trivial thing to do. They refuse to use a UserAgent that you can filter on.

If you want to be a library, be open and honest about it. There's no need to sneak around.
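For reference, a robots.txt group only blocks an agent from the whole site when its Disallow line carries a path: `Disallow: /` blocks everything, while an empty `Disallow:` blocks nothing. The sketch below checks how Python's standard-library `urllib.robotparser` reads both variants. It only illustrates how a conforming parser interprets the rules and says nothing about what archive.org's crawler actually does; the helper name `is_allowed` and the `example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration: it parses the rules from a string
    instead of downloading them with RobotFileParser.read().
    """
    parser = RobotFileParser()
    # parse() also records the rules as loaded, so can_fetch() uses them.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "Disallow: /" blocks every path for the named agent.
BLOCK_ALL = "User-agent: ia_archiver\nDisallow: /\n"
# An empty Disallow value blocks nothing.
EMPTY_DISALLOW = "User-agent: ia_archiver\nDisallow:\n"

url = "https://example.com/some/page.html"  # hypothetical URL

print(is_allowed(BLOCK_ALL, url))                     # False
print(is_allowed(EMPTY_DISALLOW, url))                # True
print(is_allowed(BLOCK_ALL, url, agent="Googlebot"))  # True: the group names only ia_archiver
```

Rules are matched per user-agent token, so a group naming `ia_archiver` has no effect on other crawlers unless they are listed too (or a `User-agent: *` group is added).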

this post was submitted on 09 Oct 2024

825 points (99.9% liked)

Technology

58965 readers

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech-related content.
Be excellent to each other!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts; OK to post as comments.
Only approved bots may post; to ask whether your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

founded 1 year ago