submitted 4 days ago* (last edited 4 days ago) by HiddenLayer555@lemmy.ml to c/asklemmy@lemmy.ml

I have a lot of tar and disk image backups, as well as raw photos, that I want to squeeze onto a hard drive for long-term offline archival. I want to make the most of the drive's capacity, so I want to compress them at the highest ratio supported by standard tools. I've zeroed out the free space in my disk images so that saving a full image only takes up as much space as the actual files on it, and in my experience raw images can have their size reduced by a third or even half with max compression (and I would assume that's lossless, since file-level compression can regenerate the original file in its entirety?)
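For context, this is roughly what I have in mind (example filenames, and I'm assuming -9 / -9e are the maximum-ratio settings for these tools):

    # Maximum-ratio compression with each of the three candidates
    gzip -9 -k backup.img     # fastest, lowest ratio
    bzip2 -9 -k backup.img    # middle ground
    xz -9e -k backup.img      # slowest, usually the highest ratio

    # Same idea for a directory of raw photos
    tar -cf - photos/ | xz -9e > photos.tar.xz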

I've heard horror stories of compressed files being made completely unextractable by a single corrupted bit, but I don't know how much of a risk that still is in 2025. Since I plan to leave the hard drive unplugged for long periods, I want the best chance of recovery if something does go wrong.

I also want the files to be extractable with just the standard Linux/Unix tools, since this is my disaster recovery plan and I want to be able to work with it from a Linux live image without installing any extra packages when my server dies. Hence I'm only looking at gz, xz, or bz2.
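Concretely, the recovery side should need nothing beyond tar and the stock decompressors on a live image (assuming a GNU tar built with the usual -z / -j / -J support):

    # Extraction with only the standard tools
    tar -xzf backup.tar.gz     # gzip
    tar -xjf backup.tar.bz2    # bzip2
    tar -xJf backup.tar.xz     # xz

    # Standalone compressed disk images
    xz -dk backup.img.xz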

So out of the three, which is generally considered the most stable and corruption resistant when the compression ratio is turned all the way up? Can any of them recover from a bit flip, or at the very least detect with certainty whether the data is corrupted when extracting? Additionally, should I be generating separate checksum files for the original data, or do the compressed formats include checksumming themselves?
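To illustrate what I mean by that last question, this is the difference as I understand it (sha256sum used as an example for separate checksums; I don't know whether the formats' internal checks make it redundant):

    # Built-in integrity tests of the compressed streams
    gzip -t backup.tar.gz
    bzip2 -t backup.tar.bz2
    xz -t backup.tar.xz

    # Separate checksums over both the original and the compressed files
    sha256sum backup.tar backup.tar.xz > checksums.sha256
    sha256sum -c checksums.sha256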

top 3 comments
[-] Simplicity@lemmy.world 3 points 4 days ago* (last edited 4 days ago)

Don't know much about *nix compression, but keen to hear others' opinions.

I would suggest looking at par files for your corruption concerns. They add overhead in both time and space, but it's well worth it. I checksum everything I back up so that I can verify it, and par the larger files so a bit flip, or more realistically a bad sector or two (thousand), doesn't break everything.
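Something along these lines is what I mean, using par2cmdline with a 10% redundancy figure picked purely as an example:

    # Create parity data alongside the archive (~10% redundancy)
    par2 create -r10 backup.tar.xz.par2 backup.tar.xz

    # Later: verify the archive, and repair it if sectors have gone bad
    par2 verify backup.tar.xz.par2
    par2 repair backup.tar.xz.par2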

[-] F04118F@feddit.nl 1 points 4 days ago

It's copy-pasted, not linked, but this is essentially a crosspost of: https://lemmy.ml/post/36614892

There are some good answers there already

[-] Simplicity@lemmy.world 2 points 4 days ago

Thanks! Good to see par2 also mentioned there.
