In many cases, no file system can withstand power outages.
While the risk of data loss can be mitigated by disabling the write cache or limiting its size, there's still no guarantee that this will prevent corruption of the file system.
Adding on to the earlier comments, you can run tests against the drive with smartctl, e.g.:
smartctl --scan (scans and lists all drives connected to your system)
smartctl -t short /YOUR/DRIVE (short non-destructive drive test, usually takes 3 minutes or so, afterwards run smartctl -a again to view test results towards the bottom of the output)
smartctl -t long /YOUR/DRIVE (this is a long more thorough test, non-destructive, this can easily take a few hours or even days depending on the size of your drive and whatnot)
smartctl -a or smartctl -x will give you testing progress in its output.
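The sequence above could be wrapped in a small shell sketch like the one below. It is only an illustration: the function name short_test, the SMARTCTL override variable, and the 180-second default wait are assumptions made for this example and are not part of smartmontools, and /dev/sdX is a placeholder. It also assumes your smartmontools version supports `-l selftest` for your drive type.

```shell
#!/bin/sh
# Sketch: start a short SMART self-test, wait, then print the self-test log.
# SMARTCTL can be pointed at a stub command to preview what would be run;
# that override is an addition for illustration, not a smartmontools feature.
SMARTCTL="${SMARTCTL:-smartctl}"

short_test() {
    dev="$1"                 # placeholder such as /dev/sdX; use your own device
    wait_secs="${2:-180}"    # "3 minutes or so" per the comment above; drives vary

    "$SMARTCTL" -t short "$dev"      # start the short, non-destructive self-test
    sleep "$wait_secs"               # give the drive time to finish
    "$SMARTCTL" -l selftest "$dev"   # show the self-test log with the result
}

# Usage (needs root): short_test /dev/sdX
```

If the log still shows the test as in progress, waiting longer and re-running the last command should be enough.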
Or another option, if you can't run smartctl for some reason, and you're just testing a regular HDD not SSD, look into running a badblocks non-destructive scan.
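For the badblocks route, a minimal sketch might look like this. The wrapper name scan_disk and the BADBLOCKS override variable are made up for the example; the flags are the standard ones (`-n` non-destructive read-write mode, `-s` show progress, `-v` verbose, read-only being the default). The destructive `-w` mode is deliberately left out. The disk should be unmounted before running the read-write variant.

```shell
#!/bin/sh
# Sketch: run badblocks in one of its two non-destructive modes.
# BADBLOCKS can be pointed at a stub to preview the command (illustrative only).
BADBLOCKS="${BADBLOCKS:-badblocks}"

scan_disk() {
    dev="$1"             # placeholder such as /dev/sdX
    mode="${2:-read}"    # "read" = read-only scan, "rw" = non-destructive read-write

    case "$mode" in
        read) "$BADBLOCKS" -sv "$dev" ;;    # read-only is the badblocks default
        rw)   "$BADBLOCKS" -nsv "$dev" ;;   # -n keeps the existing data intact
        *)    echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# Usage (needs root, disk unmounted for rw): scan_disk /dev/sdX rw
```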
Can't help you with your conundrum, but I'd like to point out there are so many potential issues in the chain with a drive in a USB enclosure that you're unwise to treat it as an always-on connected device. Any number of those things can go wrong and net you what you had, and worse.
Install the smartmontools package for your distro, if it's not already installed. Then check your disk's name with sudo lsblk.
After that, replace your_disk with your disk's name (sda, nvme0n1 etc.) in the command below.
sudo smartctl -x /dev/your_disk
If the results say PASSED, you're probably good. You can also pass the output to an LLM, by the way. At least they are good at this kind of thing.
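If you only want the overall verdict rather than the whole report, something like the sketch below could work. The function name smart_verdict is hypothetical. It assumes the usual wording of the health line: "SMART overall-health self-assessment test result: PASSED" for ATA and NVMe drives, and "SMART Health Status: OK" for SCSI/SAS drives.

```shell
#!/bin/sh
# Sketch: read smartctl output on stdin and report the overall health verdict.
# Only the two health-line wordings named above are recognised.
smart_verdict() {
    if grep -Eq 'test result: PASSED|SMART Health Status: OK'; then
        echo "health: passed"
    else
        echo "health: not confirmed"
        return 1
    fi
}

# Usage: sudo smartctl -H /dev/your_disk | smart_verdict
```

A pass here only reflects the drive's own self-assessment, so it is still worth reading the rest of the output.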
I would strongly recommend not using an LLM, as they are error-prone.
I agree, though I use them as a TL;DR in this context and they're generally fine with that as far as I can tell. Otherwise it's a long output to check, but I usually check the entire output anyway.
Which filesystem are you using? Some filesystems are better at handling power failure than others, and some write a backup partition table to the disk in other locations. It could be that you were just really really unlucky, but the partition table just needs to be restored. Hard to say for sure without examining your filesystem.
I thought it was ext4, but it seems to be ext3. A standard file system check didn't find any errors after I restored the partition table.
You can use a tool like smartctl to see some drive info, but unfortunately Linux has very few decent drive diagnostic tools. All the good tools need DOS or Windows.
smartmontools is good.
It's good; the only downside is that it's a SMART tool, which only gives you SMART data vs. actual drive health.
It's true we don't really need those old tools anymore, unless one has ancient hardware. On Linux we can use badblocks to test the hard drive. This is from the Arch Wiki:
Modern HDDs and SSDs include firmware that will automatically detect, attempt to correct, and report errors. However, firmware becomes aware of a corrupted sector only upon an attempt to read or write to it. Badblocks may be used to test the entire device at once. It operates by sequentially attempting to read and optionally write to and read back every sector on a drive, and report errors. Consequently, the firmware will react to any detected failures in this process.
So, for most cases SMART data is actually sufficient. And there is badblocks if you want to check the entire disk. However, we don't have manufacturer tools like Windows has.
A little warning about badblocks: don't run the destructive write test (-w) if you have important stuff on the disk, because it will erase it.
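As a rough way to skim the SMART data for the counters people usually watch, a sketch like this could be piped after `smartctl -A`. The function name smart_nonzero_attrs is made up, and it assumes the ATA-style attribute table, where the raw value is the tenth column. NVMe drives print a different format, and not every drive reports these three attributes.

```shell
#!/bin/sh
# Sketch: print sector-related SMART attributes whose raw value is non-zero.
# Assumes the ATA attribute table layout:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_nonzero_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}

# Usage: sudo smartctl -A /dev/your_disk | smart_nonzero_attrs
```

No output means none of the three counters is above zero, which is not the same as the drive being healthy.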
As a computer technician, I have seen hundreds of cases where the drive and its SMART data didn't actually know what was going on, and it took quality tools to alert the drive to what was happening so it would start working on it.
If the drive's firmware is faulty, SMART data will be faulty too. But can you say whether the percentage is high from what you've dealt with, some rough statistics? What I saw is my personal experience, and it definitely wouldn't be as accurate as yours. I've only seen a drive die out of nowhere a handful of times, which is not high if I turn it into a percentage. Though if the drive itself is faulty, it won't take long for it to die either.
The best I saw is a WD Caviar Black 500 GB drive from 2011 that we use, still kicking. We took a backup because of its age a couple of years ago, but it hasn't died yet. The worst I saw was my friend's NVMe SSD, which died 3 months after he installed it. Its firmware was probably faulty too, because SMART didn't help that time.
It's nothing to do with faulty firmware; it's that SMART will only see 1 in 3 issues and as such is simply not good enough to use as actual diagnostics.
I see. So, you're saying that occasionally checking smartctl (or having smartd running continuously as a daemon), running badblocks from time to time, and maybe checking iostat is not really enough? I mean, Linux is by far the most used OS on servers and in datacenters; if these were not enough, someone would have written a proper tool, don't you think?
Not at all. It takes a huge amount of work to do so, and the benefit of using RAID etc. is redundancy, so they can afford for things to fail. smartmontools is a great example: the software is great, but it needs its database to support that drive's functions to work well, and they can't and don't support everything.