submitted 8 months ago* (last edited 8 months ago) by sylverstream@lemmy.nz to c/selfhosted@lemmy.world

Edit2: Thanks all for your responses! I have checked the logs, https://lemmy.nz/comment/6192604, and based on that removed tracker-miner-fs as it's a search/index tool which I don't need. No idea why it took over all memory. I'll also get a WiFi Smartplug as a kill switch. Hopefully that solves it. Thanks again heaps!


I've got an HP ProDesk G3 which I'm using as a home server, with Ubuntu installed on it. Earlier this week the services I host on it (Immich & Frigate) stopped. I tried to SSH in, but it just hung after asking for a password. I could ping it, but otherwise it was unresponsive.

I had to force reboot it manually. This is fine, but I'm not always at home.

The chip has Intel vPro as far as I know, which could be an option, but I have no idea how this works. The documentation on the Intel site seems focused on enterprises. I tried to connect with RealVNC which does not work, so I think I've got to install/configure something on the server first.

I also asked Bing Chat, but it came up with non-existent packages & commands. Welcome your thoughts!

/edit: I just found this, which seems to be exactly what I need: https://manpages.ubuntu.com/manpages/focal/en/man7/amt-howto.7.html

[-] PoliticallyIncorrect@lemmy.world 30 points 8 months ago* (last edited 8 months ago)

What about if you use a smart home power socket?

[-] sylverstream@lemmy.nz 11 points 8 months ago

Yes, very good idea. I've got HA on a RPI so that should be easy.

[-] PoliticallyIncorrect@lemmy.world 5 points 8 months ago

Good luck mate ✌️✌️

[-] sylverstream@lemmy.nz 4 points 8 months ago

Thanks, it's so awesome to see so many useful replies here! If you are interested, I found some very weird things in the logs :( https://lemmy.nz/comment/6192604

[-] pete@lemmy.world 21 points 8 months ago

Check if your motherboard has a watchdog function. If the OS can't ping the watchdog every 5 min or whatever you set it to, the board resets.

[-] astraeus@programming.dev 7 points 8 months ago

This is how we handled camera servers at one of my former jobs. We just set up HP SFF desktops with Windows and the software and turned on the watchdog timer. It always did the trick when power outages or system hangups happened.

[-] jkrtn@lemmy.ml 18 points 8 months ago

There's a tale from long ago where someone set up a CD drive tray so that opening it would tap the reset button on a server.

[-] BCsven@lemmy.ca 6 points 8 months ago
[-] Revan343@lemmy.ca 3 points 8 months ago
[-] BCsven@lemmy.ca 3 points 8 months ago

Awesome, thanks for the link

[-] sylverstream@lemmy.nz 2 points 8 months ago

Thanks, I've got a HP SFF as well. Not 100% sure how to turn it on though from Ubuntu. There's a software based version: https://manpages.ubuntu.com/manpages/xenial/en/man8/watchdog.8.html

But I guess that's not the one using the motherboard watchdog function.

[-] pete@lemmy.world 1 points 8 months ago* (last edited 8 months ago)

You need an OS app to run and a setting in the BIOS. The app at the OS level gives a heartbeat to the watchdog module on the motherboard. If you miss some heartbeats, the firmware on the motherboard sends the reset command.
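To make the heartbeat idea concrete, here is a minimal sketch of the OS side, assuming a Linux watchdog driver is loaded and exposes `/dev/watchdog`. The function name `feed_watchdog` and the beat count and interval are made up for illustration; the real timeout depends on the driver and the BIOS setting.

```python
import time

def feed_watchdog(dev, beats, interval_s, sleep=time.sleep):
    """Send `beats` heartbeats to an open watchdog device, then ask it to disarm.

    `dev` is any binary file-like object. On a real box it would be
    /dev/watchdog, and opening that device is what arms the timer.
    """
    for _ in range(beats):
        dev.write(b"\0")  # any write counts as a heartbeat
        dev.flush()
        sleep(interval_s)
    # "Magic close": writing 'V' before closing asks the driver to disarm.
    # Drivers loaded with nowayout ignore this and keep the timer running.
    dev.write(b"V")
    dev.flush()

# Real usage (needs root; hypothetical values):
#   with open("/dev/watchdog", "wb", buffering=0) as wd:
#       feed_watchdog(wd, beats=6, interval_s=10)
```

If the process feeding the device stops writing (because the box is hung), the timer expires and the hardware resets the machine. In practice the `watchdog` package or systemd can do the feeding for you, so a script like this is mostly useful for understanding what they do.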

[-] astraeus@programming.dev 1 points 8 months ago

You can set it in the BIOS, regardless of OS.

[-] sylverstream@lemmy.nz 4 points 8 months ago

Thanks! That should work.

[-] PlutoniumAcid@lemmy.world 4 points 8 months ago

This is how you lose data. Hope you have a good backup on a NAS?

[-] pete@lemmy.world 3 points 8 months ago* (last edited 8 months ago)

No, this is a tool that can be used in a well-designed architecture. Would I do this with a single database server? Probably not. Would I ever run a single database server? Also probably not.

Also, by this point, you've probably already kernel panicked or something. There's not much left that can be saved and you probably needed that backup five minutes before the host came up.

[-] Blaster_M@lemmy.world 10 points 8 months ago* (last edited 8 months ago)

A unifi power strip on a unifi network so you can control the power switch, and setting the motherboard to auto turn on after power failure. Though this is the nuclear option for restarting the system. Maybe while you're at it, diagnose why it keeps hanging up on you.

[-] sylverstream@lemmy.nz 3 points 8 months ago

Yeah think I'll get a standalone WiFi smart plug, not connected to my Home Assistant, as a kill switch. But you're right, it's overkill.

I found some weird things in the logs, this goes beyond my knowledge :( See https://lemmy.nz/comment/6192604

[-] JASN_DE@lemmy.world 3 points 8 months ago

> But you're right, it's overkill.

I wouldn't say that. Sure, it's not the preferred way of restarting a system, but it is a good backup to have if nothing else works. Remotely messing up the network connections for example.

[-] pelya@lemmy.world 8 points 8 months ago

> edit: I just found this, which seems to be exactly what I need: https://manpages.ubuntu.com/manpages/focal/en/man7/amt-howto.7.html

Ah yes, Intel's famous security hole.

Some people stopped buying Intel CPUs after this feature was introduced.

[-] Pot@kbin.social 2 points 8 months ago* (last edited 8 months ago)

Is AMD safer? or are these people buying something else?

[-] pelya@lemmy.world 4 points 8 months ago

Yeah, it's called AMD DASH, but it's available only on select CPUs, unlike Intel's variant.

https://www.amd.com/en/technologies/security-manageability

ARM I guess, or increasingly RISC-V

[-] umbrella@lemmy.ml 1 points 8 months ago

well kind of if you count pikvm

[-] nightrunner@lemmy.world 7 points 8 months ago* (last edited 8 months ago)

Ok, I grabbed a few screen shots for you as well. Here is a site that will link you to MEBx setup that enables AMT: http://h10032.www1.hp.com/ctg/Manual/c03883429

When you power on your ProDesk G3, you can access the MEBx setup by pressing Ctrl+P; they also say F6 or Escape will get you there. Intel AMT runs on a different IP address than the one your OS gets. You can use DHCP or assign a static IP address, and set up your admin password. You can then access the portal at http://ipaddress:16992. There should also be a way to get KVM-like access to what's on the screen, but I use MeshCentral for that, so I couldn't tell you how to do it without it.
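Once AMT has its own address, a quick way to confirm the portal is up is to check that port 16992 accepts connections. This is a minimal sketch using only the Python standard library; the helper names and the example address `192.168.1.50` are placeholders, not anything specific to HP or Intel.

```python
import socket

AMT_HTTP_PORT = 16992  # web UI port mentioned above

def amt_url(host, port=AMT_HTTP_PORT):
    """Build the portal URL for a given AMT address."""
    return f"http://{host}:{port}"

def port_open(host, port=AMT_HTTP_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical address):
#   if port_open("192.168.1.50"):
#       print("AMT portal reachable at", amt_url("192.168.1.50"))
```

Because AMT runs independently of the OS, this check should still succeed while Ubuntu itself is hung, which makes it a handy way to tell the two failure modes apart.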

Hopefully, that gives you a start. Feel free to reach back out if you have any questions. Thank you!

[-] sylverstream@lemmy.nz 2 points 8 months ago

Thanks heaps, that's very useful. Will connect a monitor and keyboard and have a look.

[-] nightrunner@lemmy.world 2 points 8 months ago

Glad I could help! 😃

[-] mhzawadi@lemmy.horwood.cloud 5 points 8 months ago

Maybe investigate why it hung?

That could be a sign of something bigger about to kill it altogether

[-] sylverstream@lemmy.nz 5 points 8 months ago

Yes, thanks for that. Good point. I checked the logs, and minutes before it crashed I can see the entries below. Seems like either a GPU error or an out-of-memory error. No idea what tracker-miner-f is, by the way. The log also shows a massive list of processes with their memory usage.

This goes beyond my knowledge :(

Feb 21 17:27:49 hppd600-g3 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:0:00000000
Feb 21 17:32:43 hppd600-g3 kernel: 1305621 total pagecache pages
Feb 21 17:32:43 hppd600-g3 kernel: 16258 pages in swap cache
Feb 21 17:32:43 hppd600-g3 kernel: Free swap  = 0kB
Feb 21 17:32:43 hppd600-g3 kernel: Total swap = 1000444kB
Feb 21 17:32:43 hppd600-g3 kernel: 2065206 pages RAM
Feb 21 17:32:43 hppd600-g3 kernel: 0 pages HighMem/MovableOnly
Feb 21 17:32:43 hppd600-g3 kernel: 64196 pages reserved
Feb 21 17:32:43 hppd600-g3 kernel: 0 pages hwpoisoned

Feb 21 17:32:43 hppd600-g3 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-113.slice/user@113.service/background.slice/tracker-miner-fs-3.service,task=t>
Feb 21 17:32:43 hppd600-g3 kernel: Out of memory: Killed process 833 (tracker-miner-f) total-vm:625676kB, anon-rss:3144kB, file-rss:4816kB, shmem-rss:4kB, UID:113 pgtables:280kB oom_score_adj:200
Feb 21 17:32:43 hppd600-g3 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
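For anyone reading along, the page counts in that report can be turned into more familiar units. This is a rough sketch that assumes 4 KiB pages (the usual x86-64 default, not something the log states); the function names are made up for illustration and the sample text is trimmed from the lines above.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the x86-64 default

def pages_to_gib(pages, page_kib=PAGE_KIB):
    """Convert a page count to GiB."""
    return pages * page_kib / 1024 / 1024

def parse_oom_summary(log_text):
    """Pull a few counters out of a kernel OOM report."""
    patterns = {
        "ram_pages": r"(\d+) pages RAM",
        "pagecache_pages": r"(\d+) total pagecache pages",
        "free_swap_kb": r"Free swap\s*=\s*(\d+)kB",
        "total_swap_kb": r"Total swap\s*=\s*(\d+)kB",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        if match:
            found[key] = int(match.group(1))
    return found

LOG = """\
kernel: 1305621 total pagecache pages
kernel: Free swap  = 0kB
kernel: Total swap = 1000444kB
kernel: 2065206 pages RAM
"""

stats = parse_oom_summary(LOG)
print(f"RAM:  {pages_to_gib(stats['ram_pages']):.2f} GiB")
print(f"Swap: {stats['free_swap_kb']} of {stats['total_swap_kb']} kB free")
```

Under that page-size assumption this works out to roughly 7.9 GiB of RAM, with the ~1 GB of swap completely used up at the moment the OOM killer fired.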

[-] ryannathans@aussie.zone 6 points 8 months ago

Tracker miner fs generates thumbnails for files iirc. There was a recent vulnerability where malicious files could crash it and execute code just by being on disk. Make sure you haven't been hit by malware

[-] sylverstream@lemmy.nz 4 points 8 months ago

I've uninstalled it, it's an index/search tool. Don't need it :D

[-] ryannathans@aussie.zone 3 points 8 months ago

Usually comes with your DE, sometimes removing it breaks your DE

[-] sylverstream@lemmy.nz 1 points 8 months ago

Yeah tracker miner sounds dodgy. I've only installed Immich & Frigate on the box, and no dodgy repositories. It's also auto updating. Will do research how to check for malware, thought that was a Windows only thing :D

[-] constantokra@lemmy.one 1 points 8 months ago

I've previously had a problem with my server becoming unresponsive when running immich. It's been a while, but I remember there being some kind of memory leak having to do with immich. It was in their GitHub issues and everything. On my system it would take about a day and a half and then ssh, along with everything else, would become unresponsive. Rebooting would fix it for a day and a half. I stopped running immich and it hasn't happened since. I suppose you could try using a cron job to restart immich periodically and see if that resolves your problem.

[-] sylverstream@lemmy.nz 2 points 8 months ago

That is good to know! Will keep an eye on memory usage of immich. I really like it, so I'm reluctant to let it go.
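One low-effort way to keep an eye on memory is to read `/proc/meminfo` on a schedule and flag when available memory gets low. A minimal sketch follows; the 10% threshold and the function names are arbitrary choices for illustration, and what to do when it trips (alert, restart a container, and so on) is left open.

```python
def parse_meminfo(text):
    """Return {field: value in kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def memory_low(info, min_available_pct=10.0):
    """True if MemAvailable is below the given percentage of MemTotal."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"] < min_available_pct

# Real usage on the server (e.g. from cron every few minutes):
#   with open("/proc/meminfo") as f:
#       if memory_low(parse_meminfo(f.read())):
#           print("memory is getting low")
```

Logging the result over a few days would also show whether usage creeps up steadily (which would point to a leak) or spikes suddenly.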

[-] cmnybo@discuss.tchncs.de 4 points 8 months ago

You could connect an ESP32 to the power and reset switches through opto-isolators or relays. You will have to do a little bit of programming, but you can host a website on the ESP32 that will allow you to operate the switches remotely.

If you want to get a bit fancier, you could connect the UART on the ESP32 to a serial port on the server through a TTL to RS-232 level converter and have a remote serial terminal embedded in the web page too. That won't do much good if the server is completely locked up though.

[-] Shadow@lemmy.ca 3 points 8 months ago

If it hung like that, you probably have some sort of storage issue or high memory consumption pushing the box into swap.

Intel amt may help you, if you want hardware then google pikvm. Raritan also makes a small single node ip kvm, but it'll probably cost more.

[-] sylverstream@lemmy.nz 2 points 8 months ago

Thanks! Yeah, it seemed to be an OOM issue, but based on my Kagi queries it seems like an OS issue. It also has an error about the GPU, though. Normal memory usage is more than fine, so perhaps it was a one-time thing. See logs: https://lemmy.nz/comment/6192604

[-] Decronym@lemmy.decronym.xyz 3 points 8 months ago* (last edited 8 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters More Letters
HA Home Assistant automation software
~ High Availability
IP Internet Protocol
NAS Network-Attached Storage
RPi Raspberry Pi brand of SBC
SBC Single-Board Computer
SSH Secure Shell for remote terminal access
VNC Virtual Network Computing for remote desktop access

5 acronyms in this thread; the most compressed thread commented on today has 5 acronyms.

[Thread #533 for this sub, first seen 22nd Feb 2024, 04:35]

[-] pelya@lemmy.world 1 points 8 months ago
[-] lordnikon@lemmy.world 3 points 8 months ago

Remote KVM. If you are relying on a box that no longer has a network connection, you are SOL and need something that can power cycle the box.

[-] solrize@lemmy.world 3 points 8 months ago* (last edited 8 months ago)

On actual server motherboards (as opposed to repurposed home PC's) there is sometimes a special KVM like interface (keyboard/video/mouse, not the VM hypervisor) so you can connect to it with VNC and have the equivalent of local access. This is called IDRAC on Dell servers and other vendors have something similar.

On a home PC, hmm, you might be able to set up some kind of remote power cycle and serial console connection, using a second computer (Raspberry Pi or the like). I'm unfamiliar with Intel AMT that you linked to, but it seems like another idea.

I do remember hearing of a DRAC-like board for PC's but the name of it escapes me right now.

At the end of the day, if you want a long running server, you probably should host it in a data center, maybe with failover and other HA provisions. Home environments are a pain to set up for that. If your computer goes offline and you can't reach it, how do you even know that your home isn't having a power outage? Home ISP's are flaky too, so maybe you want a backup route over mobile data, etc. Yes you can make workarounds for everything but it amounts to turning your home into a crappy low capacity data center.

[-] agentsac@lemmy.world 6 points 8 months ago

PiKVM or a similar device could work for OP - is that what you are thinking of? I've used it and it works well.

I think a lot of people who self-host get caught up in the excitement of getting the services up and running and neglect disaster planning, prevention, and recovery (myself included). Either they put it off for later or don't realize it could be a problem down the road until it happens. We always say not to self host anything you can't live without, and most take that advice, others don't. Not saying OP falls in either category, necessarily, just adding on to some of your points.

Self hosting really is the land of compromise where we all have to balance our requirements, budget, time and effort. Personally, I have a little disposable income that I spend on hardware to host non-critical services so I can learn and tinker. It could all go away and all I will have lost is the time and money I put into it, but I gained some knowledge and enjoyment. Needless to say, I don't have much in the way of backups and monitoring.

[-] solrize@lemmy.world 1 points 8 months ago

PiKVM isn't the board I was thinking of, but same idea, and maybe even better.

[-] sylverstream@lemmy.nz 1 points 8 months ago

Thanks, but a data center is probably overkill for my needs. I've got it power loss protected with a UPS, and that's more than enough for us. Thanks anyway :)

I have a RPI, but of course that one can hang too. I'll buy a simple WiFi smart plug, standalone, as a kill switch.

[-] nightrunner@lemmy.world 2 points 8 months ago

I’m not in front of my computer atm, but I think I have something that can help you out. I have a 3-node Lenovo thin client cluster whose KVMs I manage using Intel vPro. I even went a step further and run MeshCentral on a VM to centralize my KVM access since I have 3 of them, but that’s another story.

Anyway, I’ll see if I can grab you some URLs in the morning if someone else doesn’t beat me to it or you find it on your own running google queries.

[-] sylverstream@lemmy.nz 1 points 8 months ago

Thanks mate. It was a bit of a rabbit hole. I found stuff about the watchdog package, and you can configure it to use the iTCO_wdt module, but I also read that the module is blacklisted, and then I just gave up. I posted somewhere else in the thread what led up to the hang. And I think I'll buy a WiFi smart plug so I can remotely reboot everything, assuming the WiFi still works :D
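On the blacklist question, one way to check rather than guess is to scan the modprobe config files for a `blacklist iTCO_wdt` line. This is a small sketch that assumes the usual `/etc/modprobe.d/*.conf` location; the function names are made up, and it only looks at plain `blacklist` lines, not other ways a module can be disabled.

```python
import glob

def blacklisted_modules(conf_text):
    """Return module names from `blacklist <module>` lines in modprobe config text."""
    names = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            # modprobe treats '-' and '_' in module names as the same
            names.add(parts[1].replace("-", "_"))
    return names

def is_blacklisted(module, pattern="/etc/modprobe.d/*.conf"):
    """True if any config file matching `pattern` blacklists `module`."""
    wanted = module.replace("-", "_")
    for path in glob.glob(pattern):
        with open(path, errors="replace") as f:
            if wanted in blacklisted_modules(f.read()):
                return True
    return False

# Real usage:
#   print(is_blacklisted("iTCO_wdt"))
```

If it does turn out to be blacklisted, the matching line shows which file would need editing before the hardware watchdog driver can load.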

this post was submitted on 22 Feb 2024