submitted 2 days ago* (last edited 2 days ago) by over_clox@lemmy.world to c/linux@lemmy.ml

I've got a whole bucket full of old hard drives, CDs and DVDs, and I'm starting the process of backing up as much as still works to a 4TB drive.

It's gonna be a long journey and lots of files, many prone to being duplicates from some of the drives.

What sorts of software do you Linux users recommend?

I'm on Linux Mint MATE, if that matters much.

Edit: One of the programs I'm accustomed to from my Windows days is FolderMatch, which is a step above simple duplicate file scanning: it scans for duplicate or semi-duplicate folders as well, and breaks down the individual file differences when comparing two folders.

I see I've already gotten some responses, and I thank everyone in advance. I'm on a road trip right now, so I'll be checking out the software you folks recommend later this evening, or as soon as I can anyway.

[-] doeknius_gloek@discuss.tchncs.de 20 points 2 days ago* (last edited 2 days ago)

I've had great success with restic. It will handle your 4TB just fine, here's some stats of mine:

Total File Count: 78374
Total Size: 13.324 TiB

and another one, not as large but with lots of files

Total File Count: 1295210
Total Size: 2.717 TiB

Restic will automatically deduplicate your data so your duplicates won't waste storage at your backup location.

I've recently learned about backrest which can serve as a restic UI if you're not comfortable with the cli, but I haven't used it myself.

To clean your duplicates at the source I would look into Czkawka as another lemming already suggested.
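If it helps, a minimal restic workflow looks roughly like this (the repository and source paths are just placeholders I made up):

    # create a repository on the backup drive
    restic init --repo /mnt/4TB/restic-repo

    # back up one of the old drives; duplicate data is only stored once
    restic --repo /mnt/4TB/restic-repo backup /mnt/old-drive-1

    # later: list snapshots and check how much space the repo actually uses
    restic --repo /mnt/4TB/restic-repo snapshots
    restic --repo /mnt/4TB/restic-repo stats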

[-] Ekpu@lemmy.world 2 points 2 days ago

I use backrest, self-hosted on my server running YunoHost. It is pretty much set and forget. I love it.

[-] Squizzy@lemmy.world 1 points 2 days ago

Hey, does this have a GUI? I am new to Linux and can't quite handle doing work like this without a GUI.

[-] MonkderVierte@lemmy.ml 4 points 2 days ago* (last edited 1 day ago)

That can also be done at the filesystem level. Btrfs and, I think, ZFS have deduplication built in.

Btrfs gave me 150 GB on my 2 TB gaming disk that way.
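On the ZFS side, dedup is a per-dataset property; a rough sketch with made-up pool/dataset names (and keep in mind ZFS dedup wants a lot of RAM). Btrfs, by contrast, usually dedupes out-of-band with a separate tool rather than via a mount option.

    # enable inline deduplication on a dataset (example names)
    zfs set dedup=on tank/backups

    # the DEDUP column shows how much you're saving
    zpool list tank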

[-] Churbleyimyam@lemm.ee 12 points 2 days ago* (last edited 2 days ago)

I've had success using Czkawka (hiccup) for deduplicating

[-] huskypenguin@sh.itjust.works 2 points 2 days ago

Yeah, this software rules. I've analyzed 20 TB with it.

[-] dessalines@lemmy.ml 6 points 2 days ago

A nightly rsync job in crontab works well enough, if it's an external hard drive.

If you're going over a network, syncthing.
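For the crontab approach, something like this is a reasonable sketch (the time and paths are placeholders):

    # run at 02:00 every night; -a preserves permissions/times, --delete mirrors removals
    # note the trailing slashes: rsync treats "dir/" as "the contents of dir"
    0 2 * * * rsync -a --delete /home/user/data/ /mnt/backup/data/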

[-] ninekeysdown@lemmy.world 3 points 2 days ago

I've abused syncthing in so many ways migrating servers and giant data sets. It's freaking amazing. Though it's been a few years since I've used it; I can only guess how much better it's gotten.

[-] MrPoopbutt@lemmy.world 1 points 2 days ago

Isn't syncthing no longer supported?

Does that even matter if it isn't?

[-] dessalines@lemmy.ml 7 points 2 days ago

Syncthing is very much alive.

[-] Ashiette@lemmy.world 2 points 2 days ago

Syncthing has been discontinued on android (but a fork exists)

[-] over_clox@lemmy.world 1 points 2 days ago

'An' drive? I mean like 10+ drives, looking to do a master backup.

[-] dessalines@lemmy.ml 2 points 2 days ago
[-] over_clox@lemmy.world 1 points 2 days ago

Please do explain then.

I have multiple drives with various differing directory trees.

[-] dessalines@lemmy.ml 1 points 2 days ago

I have no idea what your setup is so you'll need to do your own research on rsync.

[-] over_clox@lemmy.world 1 points 2 days ago

That's just it, there is no setup, except Linux Mint as the main system. It's literally a physical bucket of discs and drives in all sorts of formats...

[-] zdhzm2pgp@lemmy.ml 9 points 2 days ago

For duplicates: Czkawka. Also, you get a gold ⭐ if you can figure out how to pronounce it 😉

[-] treasure@feddit.org 8 points 2 days ago

Take a look into borg backup.

[-] JTskulk@lemmy.world 3 points 2 days ago

fdupes to find duplicate files, freefilesync to back it up.
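Roughly what the fdupes half of that looks like (paths are placeholders):

    # list duplicate files across two trees, recursively
    fdupes -r /mnt/4TB/source1 /mnt/4TB/source2

    # just summarize how much space the duplicates waste
    fdupes -rm /mnt/4TB/source1 /mnt/4TB/source2

    # interactively pick which copies to keep and delete the rest
    fdupes -rd /mnt/4TB/source1 /mnt/4TB/source2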

[-] serenissi@lemmy.world 2 points 2 days ago

Not recommending specific software, but since you mentioned old hard disks, it's better to copy the files (or better yet, dd the whole thing) onto an SSD first. That way building an index and finding duplicates will be faster, because you only have to access each file once, and if you dd you don't have to care about fragmentation.
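If you go the image route, the idea is roughly this (device name and paths are examples only; triple-check the device before running dd, and run as root):

    # image one partition of an old drive onto fast storage, skipping unreadable sectors
    dd if=/dev/sdX1 of=/mnt/ssd/old-drive-1.img bs=4M conv=noerror,sync status=progress

    # mount the image read-only to index and copy files out of it
    mkdir -p /mnt/img1
    mount -o ro,loop /mnt/ssd/old-drive-1.img /mnt/img1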

[-] solrize@lemmy.world 3 points 2 days ago* (last edited 2 days ago)

I'm using Borg and it's fine at that scale. I don't know if it would still be viable with 100TB or whatever. The initial backup will be kind of slow but it encrypts everything, and deduplicates it too if I'm not mistaken. In any case, it deduplicates the common situation where you back up another snapshot later. Only the differences get written in the second backup. So you can save new snapshots fairly quickly and without much additional space.

[-] over_clox@lemmy.world 0 points 2 days ago

I don't even want this data encrypted. Quite the opposite actually.

This is mostly the category of files getting deleted from the Internet Archive every day. I want to preserve what I got before it gets erased...

[-] solrize@lemmy.world 2 points 2 days ago

You can turn off Borg encryption but maybe what you really want is an object store (S3 style). Those exist too.
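For reference, unencrypted Borg looks roughly like this (the repo path and archive name are placeholders):

    # create a repository with encryption disabled
    borg init --encryption=none /mnt/4TB/borg-repo

    # the first backup is slow; later ones only store new/changed chunks
    borg create --stats --progress /mnt/4TB/borg-repo::old-drive-1 /mnt/old-drive-1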

[-] truthfultemporarily@feddit.org 3 points 2 days ago

A lot of backup solutions do deduplication at the block level, so if you use backup software that does this, you don't need to dedupe the files yourself.

[-] over_clox@lemmy.world 1 points 2 days ago

I have like 10+ hard drives and probably 75+ optical discs to back up, and across the different devices and media, the folder and file structure isn't exactly consistent.

I already know in advance that I'm gonna have to curate this backup myself; it's not quite as easy as just letting backup/sync software do it all for me.

But I do need software to help.

[-] everett@lemmy.ml 7 points 2 days ago* (last edited 2 days ago)

across the different devices and media, the folder and file structure isn't exactly consistent.

That's the thing: it doesn't need to be. If your backup software or filesystem supports block-level deduplication, all matching data only gets stored once, and filenames don't matter. The files don't even have to 100% match. You'll still see all your files when browsing, but the system is transparently making sure to only store stuff once.

Some examples of popular backup software that does this are Borgbackup and Restic, while filesystems that can do this include BTRFS and ZFS.

[-] baltakatei@sopuli.xyz 3 points 2 days ago* (last edited 2 days ago)

Personally, my toolkit includes Jdupes for duplication scanning and Rsync for directory merging and file transfer.
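For what it's worth, a minimal version of that combination might look like this (paths are placeholders):

    # report duplicate files across the merged tree and a new source, recursively
    jdupes -r /mnt/4TB/merged /mnt/4TB/old-drive-2

    # merge the source into the destination, skipping files that already exist there
    rsync -av --ignore-existing /mnt/4TB/old-drive-2/ /mnt/4TB/merged/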

[-] lordnikon@lemmy.world 3 points 2 days ago

I have had good luck with Dupeguru

[-] billwashere@lemmy.world 1 points 2 days ago

Honestly I maintain a list of file types I care about and copy those off. It's mostly things I've created or specifically accumulated - things like mp3, mkv, gcode, stl, jpeg, doc, txt, etc. Find all of those and copy them off. I also find any files over a certain size and copy them off, unless they are things like library files, DLLs, that sorta thing. Am I possibly going to miss something? Yeah. But I'll get most of the things I care about.

[-] catloaf@lemm.ee 1 points 2 days ago

Do you need any of it? Usually I've not even thought about what might be on an old drive.

If I was worried about the slim chance there's something of critical importance I'd need later, I'd just look over each device and pick out individual files I might want, and dump the rest.

If you're extremely paranoid, take a block-level backup of each device and archive it.

[-] Cyber@feddit.uk 0 points 2 days ago

There's BeyondCompare and Meld if you want a GUI, but if I understand this correctly, rmlint and fdupes might be helpful here.

I've done similar in the past - I prefer commandline for this...

What I'd do is create a "final destination" folder on the 4TB drive and then other working folders for each hdd / cd / dvd that you're working through

I.e.:

    /mnt/4TB/finaldestination
    /mnt/4TB/source1
    /mnt/4TB/source2
    ...

Obviously finaldestination is empty to start with, so it could just be a direct copy of your first HDD - so make that the largest drive.

(I'm saying copy here, presuming you want to keep the old drives for now, just in case you accidentally delete the wrong stuff on the 4TB drive)

Maybe clean up any obvious stuff

Remove that first drive

Mount the next and copy the data to /mnt/4TB/source2

Now use rmlint or fdupes to do a dry-run between source2 and finaldestination and get a feel for whether they're similar or not; then you'll know whether to just move it all to finaldestination or maybe use the GUI tools instead (see the rough rmlint sketch at the end of this comment).

You might completely empty /mnt/4TB/source2, or it might still have something in it, depending on how you feel it's going.

Repeat for the rest, working on smaller & smaller drives, comparing with the finaldestination first and then moving the data.

Slow? Yep. Satisfying that you know there's only 1 version there? Yep.

Then do a backup 😉
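For the dry-run step mentioned above, rmlint is handy because by default it only writes a report plus an rmlint.sh script rather than deleting anything. A rough sketch using the layout from this comment (flags from memory, so double-check the man page):

    # compare source2 against finaldestination; paths after // are "tagged" as the copies to keep
    rmlint /mnt/4TB/source2 // /mnt/4TB/finaldestination --keep-all-tagged --must-match-tagged

    # review the generated script, then run it once you're happy with what it would remove
    less rmlint.sh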

[-] over_clox@lemmy.world 1 points 1 day ago* (last edited 1 day ago)

The way I'm organizing the main backups to start with is with folder names such as 20250505 Laptop Backup, 20250508 Media Backup, etc.

Eventually I plan on organizing things in bulk folders with simple straightforward names such as Movies, Music, Game ROMs, Virtual Machines, etc.

Yes, thankfully I already got all my main files, music and movies backed up. Right now I'm backing up my software, games, emulator ROMs, etc.

Hopefully that drive finishes backing up before the weather gets bad, cuz I'm definitely shutting things down when there's lightning around...
