I seem to remember having similar issues on Neon shortly after it came out. I chalked it up to it being bleeding-edge-ish, went back to Kubuntu, and then to Debian.
You're nowhere close to RAM exhaustion. I had similar mishaps on an all-AMD system a few generations back. It showed up as micro-stutters that occasionally grew into full freezes like yours. If I remember right, it was fixed by a combination of kernel command-line switches and steadily better behavior as newer kernels and drivers came out.
No idea what KDE Neon is based on (Ubuntu LTS?), but I'm guessing you're running a fairly old kernel on relatively modern hardware, which is a pain. Also, you don't need a swapfile; use zram. Or just switch to Fedora or some such distro that takes care of all that for you.
This looks like either a driver issue or, more likely, a hardware issue: either your NVMe drive or your RAM is faulty. Run Memtest86+ (a bootable tool that checks whether your RAM is OK), and I'm sure there are tests for SSDs too.
smartctl is what you're looking for, even for SSDs. SSDs tend to fail quickly enough that by the time smartctl catches something it may already be too late, but smartd can run scheduled tests, and I've definitely saved data off SSDs because daily SMART tests caught an early failure.
I strongly disagree with the hardware theory, though. There is no indication that this is hardware (honestly, hardware accounts for VERY few issues like this, and failing RAM still happens but is 98% a thing of the past). Diagnosing without any logs is a bit of a lost cause; we simply don't have enough info. Hopefully OP updates the post with the journalctl output from the last boot.
Bad RAM is still a thing (even on regular PCs); there's a reason ECC memory has a market (true ECC, not the on-die ECC that DDR5 has built in). But I agree that it's likely just an OOM/thrashing situation. Linux famously doesn't handle those very well, and the behavior OP is seeing is very much consistent with that.
Dead RAM definitely still happens, yes, but it's exceedingly rare. I fix hundreds of PCs a year, and maybe one or two of them a year turn out to have bad RAM as the root cause. More often it's configuration or hardware implementation issues; for example, the Gigabyte X870 boards really don't like XMP for some reason.
ECC doesn't really have anything to do with whether a RAM stick fails or not. It can help with misbehaving sticks, but if a stick is dead, it's dead, and ECC can't help a dead region.
If your DE/launcher uses systemd scopes properly, you might be able to see something in the journal. For example, somewhere in my logs I can see this:
Jan 17 17:52:50 sky systemd[2171]: app-niri-steam-40213.scope: Failed with result 'oom-kill'.
Jan 17 17:52:50 sky systemd[2171]: app-niri-steam-40213.scope: Consumed 6h 32min 39.773s CPU time, 9.4G memory peak, 6.2G memory swap peak.
That's pretty clearly severe thrashing and an eventual OOM event caused by a game. If you're not familiar, the command journalctl -e -b -1 jumps to the last log lines from the previous boot. Use d and u to scroll down and up in the pager, and q to quit. This will only work if the launcher you are using sets up transient systemd scopes and doesn't just fork-exec into the application (Fuzzel does the wrong thing by default, as do many others).
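If the journal is long, a small script can pull out just the scopes that were OOM-killed. Below is a rough sketch of such a helper; find_oom_scopes is a made-up name, and it assumes the two systemd line formats shown above. Other systemd versions may word the "Consumed ..." summary differently, in which case the memory peak simply comes back as unknown.

```python
import re
import sys

# Matches e.g. "systemd[2171]: app-niri-steam-40213.scope: Failed with result 'oom-kill'."
FAILED_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.scope): Failed with result '(?P<result>[^']+)'"
)
# Matches the resource summary line, e.g. "...: Consumed ... 9.4G memory peak, ..."
# (assumption: same wording as the systemd version that produced the example above)
PEAK_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.scope): Consumed .*?(?P<peak>[\d.]+[KMGT]) memory peak"
)


def find_oom_scopes(lines):
    """Return {scope: memory peak or None} for every scope that ended with 'oom-kill'."""
    oom_units = []
    peaks = {}
    for line in lines:
        failed = FAILED_RE.search(line)
        if failed:
            if failed.group("result") == "oom-kill":
                oom_units.append(failed.group("unit"))
            continue
        summary = PEAK_RE.search(line)
        if summary:
            peaks[summary.group("unit")] = summary.group("peak")
    return {unit: peaks.get(unit) for unit in oom_units}


if __name__ == "__main__":
    # Read journal text from stdin, e.g. piped from journalctl -b -1
    for unit, peak in find_oom_scopes(sys.stdin).items():
        print(f"{unit}: killed by OOM (memory peak: {peak or 'unknown'})")
```

Saved as, say, find_oom.py (hypothetical name), something like journalctl -b -1 | python3 find_oom.py should list any OOM-killed scopes from the previous boot.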
I've also seen large Steam downloads cause such issues, so capping your download speed might help. So could enabling zram.
Edit: Also, this is most likely completely unrelated but do note that Neon is basically abandoned. You should very much consider switching to a maintained distribution, whether that's another Ubuntu spin or Fedora or something else entirely.
Thanks for the journalctl command; hints like this are what I was looking for. I'll review my journal next time I get a crash. Regarding Steam: since both the OS and the games disk are on NVMe, it downloads at pretty crazy speeds without slowing down the OS (as long as I'm not also doing something else heavy at the same time, of course, but I can keep browsing and watching videos just fine).
Thanks! Yeah, I might consider a full system wipe. I briefly tried Fedora before, and used Nobara for a few years, but I think I'd prefer something Ubuntu-based with KDE. Just not Kubuntu; I don't want the Snap crap.
When this happens, can you switch to a console (Ctrl-Alt-F1), restart X (Ctrl-Alt-Backspace), or SSH in from another PC? You can also keep journalctl -f (or something similar) running in a window to tail the logs and see if anything shows up there.
I haven't tried SSH from another machine yet, but switching to a text console doesn't work; it's unresponsive for that too.
OK, so I kind of had a similar problem, except I was on Arch with full-disk encryption. The system would freeze if I tried writing big files, and the disk light would start blinking. Yours might not be the same thing, so maybe run "journalctl -b -1" on the next boot after a freeze and check towards the bottom of the log for errors (usually shown in red). Another option is to keep btop running in the background and, when the system shows any sign that it's about to freeze, switch to btop and check what's going on.
Edit: something else that came to mind: try switching to another TTY with Ctrl+Alt+F<number>. I'm not sure how Neon sets it up, so try F2, F3 or F4.
In addition to all the good suggestions already here, consider installing earlyoom and configuring it to kill the stuff you care less about, maybe one of those heavy Electron-based clients.
Better to use systemd-oomd; it already comes with systemd on Arch and works pretty well.
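To make the "kill the stuff you care less about" idea concrete, here is a rough Python sketch of how an early-OOM tool might choose its victim: read the kernel's per-process /proc/<pid>/oom_score and let a name regex (similar in spirit to earlyoom's --prefer option) override it. This is not earlyoom's or systemd-oomd's actual algorithm, the function names and the example regex are hypothetical, and it only reports the candidate rather than killing anything.

```python
import os
import re


def list_processes():
    """Snapshot (pid, comm, oom_score) for every process we can read in /proc."""
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()  # note: the kernel truncates comm to 15 characters
            with open(f"/proc/{entry}/oom_score") as f:
                score = int(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan or is not readable
        procs.append((int(entry), comm, score))
    return procs


def pick_victim(procs, prefer=None):
    """Return the pid an early-OOM tool would target, or None if procs is empty.

    Processes whose name matches the `prefer` regex outrank everything else;
    within each group the highest oom_score wins (hypothetical policy).
    """
    procs = list(procs)
    if not procs:
        return None
    pattern = re.compile(prefer) if prefer else None

    def rank(proc):
        _pid, comm, score = proc
        preferred = bool(pattern and pattern.search(comm))
        return (preferred, score)

    return max(procs, key=rank)[0]


if __name__ == "__main__":
    # Report only; nothing is killed. The regex is just an example.
    victim = pick_victim(list_processes(), prefer=r"(?i)electron|signal|discord|slack")
    print(f"would target pid {victim}")
```

The design point is simply that a preference list shifts the choice away from whatever the kernel's heuristic would pick (often the browser or the game) towards apps you are happy to lose.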
It may not be the raw RAM usage.
My first suspect is the Windows VM, especially if it's running enterprise security software. 4GB is probably not enough for modern Windows, and it could be falling back on its page file, thrashing your disk in the process.
Are you able to collect some data from a system monitor on paging and disk activity? That could help you narrow it down. btop is a quick terminal option if your GUI is unresponsive (assuming you can switch to a console). vmstat is another option that you can run in the background to collect stats over time, but it's not user-friendly.
Nothing very enterprise... It's running "Windows App", which is just a glorified RDP client with extra authentication settings for SSO etc. That's why I gave it only 4GB. And it's not just the GUI being unresponsive; everything is. It's a full freeze, and I can't get to the text consoles either. The most I can hope for, I think, is to gather data from right before the freeze happens and check it after I reset the computer.
I see. My concern was the security scanning tools that enterprise IT departments often put on computers, but it sounds like that's not the case here.
In your situation, assuming you're not finding what you seek with journalctl, I think I would use a tool like vmstat or sar to collect periodic snapshots of CPU, memory, and io. You can tell it to collect data every X seconds and tee that to a file. After you reboot you can see what happened leading up to the crash. You should be able to import the data into a spreadsheet or something for analysis, but it's not very intuitive and you'll need to consult man pages for the options and how to interpret them.
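If vmstat/sar feel too unfriendly, the same idea can be sketched in a few lines of Python: sample a handful of counters from /proc/meminfo and /proc/vmstat every few seconds and append them to a CSV, calling fsync after each row so the last samples have a better chance of being on disk after a hard reset. The chosen fields, the 5-second interval and the memlog.csv filename are all assumptions. Rising pswpin/pswpout and pgmajfault alongside a shrinking MemAvailable would point towards thrashing; if the freeze is itself a disk problem, there is no guarantee the final rows get written.

```python
import os
import sys
import time

MEMINFO_FIELDS = ("MemAvailable", "SwapFree", "Dirty")   # values in kB
VMSTAT_FIELDS = ("pswpin", "pswpout", "pgmajfault")      # cumulative counters
COLUMNS = ("time",) + MEMINFO_FIELDS + VMSTAT_FIELDS


def parse_kv(text, fields):
    """Pull integer values for the wanted keys out of /proc/meminfo- or /proc/vmstat-style text."""
    values = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[0] in fields:
            values[parts[0]] = int(parts[1])
    return values


def sample():
    """Read one snapshot of the chosen counters (Linux only)."""
    with open("/proc/meminfo") as f:
        values = parse_kv(f.read(), MEMINFO_FIELDS)
    with open("/proc/vmstat") as f:
        values.update(parse_kv(f.read(), VMSTAT_FIELDS))
    return values


def log_forever(path, interval=5.0):
    """Append one CSV row per interval, fsync'ing each row to push it to disk."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a") as log:
        if write_header:
            log.write(",".join(COLUMNS) + "\n")
        while True:
            values = sample()
            row = [time.strftime("%Y-%m-%d %H:%M:%S")]
            row += [str(values.get(name, "")) for name in COLUMNS[1:]]
            log.write(",".join(row) + "\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)


if __name__ == "__main__":
    log_forever(sys.argv[1] if len(sys.argv) > 1 else "memlog.csv")
```

Start it in a terminal before the freeze (e.g. python3 memlog.py ~/memlog.csv, with memlog.py being whatever you name the file), then open the CSV in a spreadsheet after rebooting and look at the last few minutes.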
There are a lot of good suggestions in this thread. I would lean towards a hardware or driver issue, maybe bad RAM. Unfortunately these things take a lot of trial and error to figure out.
Neon doesn’t force you to actually update the Ubuntu base it’s built on unless you do it manually, iirc. Update your shit and report back.
Once you decide not to try that, top, btop, atop or htop can tell you how much RAM you’re using. Most of them (atop and btop in particular) will also show how your disk writes are doing.
It doesn’t sound like you have a RAM issue; it sounds like you have a disk issue. First and foremost, once you’ve verified with one of the tools above that you have plenty of memory available, expand your Windows VM to 8GB. Windows pages aggressively when it has only 4GB on bare metal, and it does the same in a VM, except every page-file access has to go through KVM to reach the qcow2 image.
It sounds like you have way too many tabs open. Close some and see if that helps. You can select a range of tabs by clicking one and shift-clicking another to get every tab in between, then right-click, add them to bookmarks, and close them.
Next, run SpinRite, at level 3 I think, on all your NVMe drives. It costs one write cycle per block (consumer NAND is typically rated for hundreds to a few thousand cycles), but in return it makes everything fast again. Flash memory becomes less responsive as reads of a block pile up until it’s rewritten.
Try different USB devices. I had a bad mouse do something similar.