submitted 1 year ago by Ac5000@lemm.ee to c/selfhosted@lemmy.world

Sorry for the wall of text... This ended up a lot longer than I thought it would...

TL;DR: Looking for a guide to partitioning/booting and/or help with my Clover config.

Background

I recently purchased a used Dell PowerEdge R730xd to use as a home lab/self-hosting project. The intention was to install Proxmox, play around with it, and see what I wanted to add to it later. As the server did not include any drives, I figured I would purchase a PCIe-to-NVMe adapter to work as the "boot" drives for the system and then fill up the 24 drive bays over time if I decided I wanted to continue with the setup.

I purchased one of the Asus Hyper M.2 x16 PCIe NVMe cards that supports up to 4 drives. To go along with it, I purchased 2x 1TB Samsung 980 Pros. I had done some research ahead of time and knew this might cause some issues, but it appeared they could be worked through.

Installation

I installed the drives and card and turned on PCIe bifurcation for the slot. The server/iDRAC didn't see the devices, but this was expected based on prior research.

Using Dell's iDRAC, I was able to virtually attach the Proxmox .iso and boot into the installer just fine. For my Proxmox install, I chose "zfs (RAID1)" with both 980 Pros as the drives. Installation appeared to go through without a problem and I rebooted to finalize the install.

At this point, the server does not recognize a boot option and hangs in the POST menu asking what to do.

Problem and Possible Solution

I was aware this might be an issue. From what I've gathered, the server won't boot because the drives are NVMe devices sitting in a PCIe slot, which this generation of firmware can't boot from. The fact that they don't even appear in iDRAC or the BIOS seems to confirm this.

I had discovered this is a common issue and that people suggest using Clover as a way to "jump start" the boot process.

I found this guide where someone appears to have gone through a very similar process (although for VMware ESXi) that seemed to have enough clues about what I'd need to do.

I installed Clover to a flash drive, followed the steps to copy in the NVMe drivers, booted into Clover, and created the "preboot.log" file. I then started to edit/create the config.plist file as described in the guide. This is the stage where I ran into problems...
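
For reference, the "copy in the NVMe drivers" step amounted to something like this (exact paths depend on the Clover release, so treat it as a sketch rather than gospel):

    # Copy Clover's NVMe driver from the optional set into the active driver folder
    # so Clover can actually see the NVMe disks (paths may differ on your Clover version):
    cp EFI/CLOVER/drivers/off/NvmExpressDxe.efi EFI/CLOVER/drivers/UEFI/
    # (older Clover layouts use EFI/CLOVER/drivers64UEFI/ instead)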

Troubleshooting and Where I Need Help

When I opened the preboot.log file and did the search for "nvme", I found multiple listings. (Copy of the preboot section below for reference.) This is where my understanding of things starts to run out and I need help.

There are 8 volumes referencing NVMe. (The USB listings I assume are from the Clover boot media.) Looking at the numbers, I think this means each physical drive shows up once as a whole disk plus once per partition? I assume the RAID1 install means things are duplicated between the two drives.

I did some more research and found this guide on the Proxmox forums. It mentions booting into the Proxmox installer and doing a debug install to run fdisk and blkid to get the PARTUUIDs. The second post describes a situation that sounded exactly like mine and provided a config file with some additional options.

I got into the debug menu and ran fdisk and blkid (results copied below). This again is where I struggle to understand what I am seeing, because of my limited understanding of filesystems/partitioning/boot records.
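
In case it helps someone else, getting there looked roughly like this (from memory, so treat it as a sketch):

    # Boot the installer ISO and pick "Install Proxmox VE (Debug mode)";
    # it drops to a root shell before the installer proper starts (exit/Ctrl+D continues).
    fdisk -l    # partition tables for both NVMe disks
    blkid       # filesystem types, UUIDs, and PARTUUIDs for every partition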

The Request(s)

I was hoping to get a few things out of this post:

  1. Can someone explain the different pieces of information from the fdisk and blkid commands and preboot.log? I've done some work fixing my other Linux server in the past and remember seeing some of this, but I never fully "learned" what I was looking at. If someone has a link that explains the columns, labels, underlying concepts, etc., that'd be great! I wasn't able to find one, and I think it's because I don't know enough to even form a good query...
  2. Hopefully someone out there has experienced this problem and can look at what I've got and tell me what I've done wrong. I feel like I am close, but just missing/not understanding something. I assume I've either used the incorrect volume GUIDs in my config or gotten something else in the config file wrong. I'm leaning toward the former, hence point 1.
  3. If anyone has a "better" way to get Proxmox to boot with my current hardware, I'd like to hear it. My plan was to get Clover working, install it on the vFlash card in the server, and just have that jump-start the boot on a reboot.
  4. Hopefully this can serve as a guide/help someone else out there.

Let me know if you need more information. I am posting this kind of late, so I might not get back to your question(s) until tomorrow.

fdisk

(Please note that I had to manually type this as I only had a screenshot that I couldn't get to upload. There might be typos.)

fdisk -l
Disk /dev/nvme0n1: 932 GB, 1000204886016 bytes, 1953525168 sectors
121126 cylinders, 256 heads, 63 sectors/track
Units: sectors of 1 * 512 = 512 bytes

Device          Boot  StartCHS  EndCHS       StartLBA  EndLBA      Sectors     Size  ID  Type
/dev/nvme0n1p1        0,0,2     1023,255,63  1         1953525167  1953525167  931G  ee  EFI GPT

Disk /dev/nvme1n1: 932 GB, 1000204886016 bytes, 1953525168 sectors
121126 cylinders, 256 heads, 63 sectors/track
Units: sectors of 1 * 512 = 512 bytes

Device          Boot  StartCHS  EndCHS       StartLBA  EndLBA      Sectors     Size  ID  Type
/dev/nvme1n1p1        0,0,2     1023,255,63  1         1953525167  1953525167  931G  ee  EFI GPT

blkid

(Please note that I had to manually type this as I only had a screenshot that I couldn’t get to upload. There might be typos.)

blkid
/dev/loop1: TYPE="squashfs"
/dev/nvme0n1p3: LABEL="rpool" UUID="3906746074802172538" UUID_SUB="7826638652184430782" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="c182c6d2-6abb-40f7-a204-967a2b6029cc"
/dev/nvme0n1p2: UUID="63F3-E64B" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="06fc76a4-ed48-4f0e-84ed-f602f5962051"
/dev/sr0: BLOCK_SIZE="2048" UUID="2023-06-22-14-56-03-00" LABEL="PVE" TYPE="iso9660" PTTYPE="PMBR"
/dev/loop0: TYPE="squashfs"
/dev/nvme1n1p2: UUID="63F6-0CF7" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="8231936a-7b2c-4a96-97d6-b80393a3e7a1"
/dev/nvme1n1p3: LABEL="rpool" UUID="3906746074802172538" UUID_SUB="11940256894351019100" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="f57fc276-bca6-4779-a161-ebe79db3275e"
/dev/nvme0n1p1: PARTUUID="7c249bb3-b7fb-4ebf-a5ae-8d3b9b4b9ab5"
/dev/nvme1n1p1: PARTUUID="0a796a75-41a4-4f57-9c1f-97817bb30963"
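
My best (unconfirmed) reading of the three partitions on each drive, for anyone checking my work:

    nvme0n1p1 / nvme1n1p1   ~1 MiB, no filesystem   BIOS boot partition (legacy GRUB staging)
    nvme0n1p2 / nvme1n1p2   vfat                    EFI system partition (where a bootloader would live)
    nvme0n1p3 / nvme1n1p3   zfs_member "rpool"      the actual Proxmox install (mirrored)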

preboot.log

117:268  0:000  === [ ScanVolumes ] =============================
117:268  0:000  Found 11 volumes with blockIO
117:268  0:000  - [00]: Volume: PciRoot(0x0)\Pci(0x1A,0x0)\USB(0x0,0x0)\USB(0x4,0x0)\USB(0x0,0x0)
117:273  0:005          Result of bootcode detection: bootable unknown (legacy)
117:273  0:000  - [01]: Volume: PciRoot(0x0)\Pci(0x1A,0x0)\USB(0x0,0x0)\USB(0x4,0x0)\USB(0x0,0x0)\HD(1,MBR,0x3522AA59,0x3F,0x64000)
117:276  0:003          Result of bootcode detection: bootable unknown (legacy)
117:276  0:000            label : BDU
117:276  0:000          This is SelfVolume !!
117:276  0:000  - [02]: Volume: PciRoot(0x0)\Pci(0x1A,0x0)\USB(0x0,0x0)\USB(0x4,0x0)\USB(0x0,0x0)\HD(2,MBR,0x3522AA59,0x6403F,0x70CFC1)
117:280  0:003          Result of bootcode detection: bootable unknown (legacy)
117:280  0:000  - [03]: Volume: PciRoot(0x1)\Pci(0x2,0x0)\Pci(0x0,0x0)\NVMe(0x1,BD-15-A3-31-B6-38-25-00)
117:280  0:000          Result of bootcode detection: bootable Linux (grub,linux)
117:280  0:000  - [04]: Volume: PciRoot(0x1)\Pci(0x2,0x0)\Pci(0x0,0x0)\NVMe(0x1,BD-15-A3-31-B6-38-25-00)\HD(1,GPT,7C249BB3-B7FB-4EBF-A5AE-8D3B9B4B9AB5,0x22,0x7DE)
117:280  0:000          Result of bootcode detection: bootable unknown (legacy)
117:280  0:000  - [05]: Volume: PciRoot(0x1)\Pci(0x2,0x0)\Pci(0x0,0x0)\NVMe(0x1,BD-15-A3-31-B6-38-25-00)\HD(2,GPT,06FC76A4-ED48-4F0E-84ED-F602F5962051,0x800,0x200000)
117:281  0:000          Result of bootcode detection: bootable unknown (legacy)
117:283  0:002            label : EFI
117:283  0:000  - [06]: Volume: PciRoot(0x1)\Pci(0x2,0x0)\Pci(0x0,0x0)\NVMe(0x1,BD-15-A3-31-B6-38-25-00)\HD(3,GPT,C182C6D2-6ABB-40F7-A204-967A2B6029CC,0x200800,0x7450658F)
117:283  0:000  - [07]: Volume: PciRoot(0x1)\Pci(0x2,0x1)\Pci(0x0,0x0)\NVMe(0x1,F1-1B-A3-31-B6-38-25-00)
117:283  0:000          Result of bootcode detection: bootable Linux (grub,linux)
117:283  0:000  - [08]: Volume: PciRoot(0x1)\Pci(0x2,0x1)\Pci(0x0,0x0)\NVMe(0x1,F1-1B-A3-31-B6-38-25-00)\HD(1,GPT,0A796A75-41A4-4F57-9C1F-97817BB30963,0x22,0x7DE)
117:283  0:000          Result of bootcode detection: bootable unknown (legacy)
117:283  0:000  - [09]: Volume: PciRoot(0x1)\Pci(0x2,0x1)\Pci(0x0,0x0)\NVMe(0x1,F1-1B-A3-31-B6-38-25-00)\HD(2,GPT,8231936A-7B2C-4A96-97D6-B80393A3E7A1,0x800,0x200000)
117:283  0:000          Result of bootcode detection: bootable unknown (legacy)
117:286  0:002            label : EFI
117:286  0:000  - [10]: Volume: PciRoot(0x1)\Pci(0x2,0x1)\Pci(0x0,0x0)\NVMe(0x1,F1-1B-A3-31-B6-38-25-00)\HD(3,GPT,F57FC276-BCA6-4779-A161-EBE79DB3275E,0x200800,0x7450658F)
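
Lining the GPT GUIDs above up against the blkid output, my (again unconfirmed) mapping of the NVMe entries is:

    [03] / [07]   whole disks (nvme0n1 / nvme1n1)
    [04] / [08]   HD(1,GPT,...)   the p1 BIOS boot partitions
    [05] / [09]   HD(2,GPT,...)   the p2 EFI system partitions, label "EFI" - these are the GUIDs I used as Volume below
    [06] / [10]   HD(3,GPT,...)   the p3 ZFS rpool partitions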

config.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Boot</key>
  <dict>
    <key>Timeout</key>
    <integer>5</integer>
    <key>DefaultVolume</key>
    <string>LastBootedVolume</string>
  </dict>
  <key>GUI</key>
  <dict>
    <key>Custom</key>
    <dict>
      <key>Entries</key>
      <array>
        <dict>
          <key>Path</key>
          <string>\EFI\systemd\systemd-bootx64.efi</string>
          <key>Title</key>
          <string>ProxMox</string>
          <key>Type</key>
          <string>Linux</string>
          <key>Volume</key>
          <string>06FC76A4-ED48-4F0E-84ED-F602F5962051</string>
          <key>VolumeType</key>
          <string>Internal</string>
        </dict>
        <dict>
          <key>Path</key>
          <string>\EFI\systemd\systemd-bootx64.efi</string>
          <key>Title</key>
          <string>ProxMox</string>
          <key>Type</key>
          <string>Linux</string>
          <key>Volume</key>
          <string>8231936A-7B2C-4A96-97D6-B80393A3E7A1</string>
          <key>VolumeType</key>
          <string>Internal</string>
        </dict>
      </array>
    </dict>
  </dict>
</dict>
</plist>

18 comments
[-] scrapeus@feddit.de 6 points 1 year ago

Hey, I had the same trouble on a DL380 G9. Those BIOSes don't support booting from PCIe at all. My server can't even boot from drives on the RAID controller in IT mode.

Since Proxmox is a hypervisor, I would suggest just installing it on a single SATA disk and trying to boot from there. This is what I did in the end.

You can then use your NVMes as a storage pool. Also, bifurcation itself can be a problem when trying to boot from those devices.

As a last resort, I would also try disabling bifurcation and see if one drive shows up. Then you could use two real PCIe slots with cheap M.2-to-PCIe adapters.

[-] BlueEther@no.lastname.nz 4 points 1 year ago

A SATA SSD in a CD-drive caddy converter is a good way to get a boot drive as well.

[-] Ac5000@lemm.ee 1 points 1 year ago

The server has 24x 2.5" bays. I have an old SSD that I figure I could use as a last resort as the Proxmox boot drive, and then just use the NVMes as storage.

I was just hoping to have the Proxmox install/configuration on the NVMe RAID1 for some minor safety in case a drive dies. From what I've read, this should be possible. I'm just lacking the knowledge to know what I've done wrong. (Mostly my lack of understanding of the blkid results.)

DL380 G9. Those bioses don't support booting from PCIe at all.

They actually do, but only with an HPE-supported BootROM... anything non-HPE is ignored (weirdly, some Intel and Broadcom cards PXE boot without the HPE firmware, but not all).

Most of these boards have internal USB and internal SD slots which you can boot from with any media; in fact, HPE sells a USB SD card RAID adaptor for the USB slot. So I would recommend using an SD card for this...

[-] scrapeus@feddit.de 2 points 1 year ago

I wouldn't suggest USB or SD cards with Proxmox due to its constant logging. You will fry them really quickly, unfortunately. I had that problem even with NVMes.

For literally anything else, I would also suggest SD cards.

[-] Ac5000@lemm.ee 1 points 1 year ago

The original plan is to use an SD card with Clover in read-only mode to bootload Proxmox running on the NVMe drives. (Read-only to prevent frying the SD card.) This server has a built-in SD card slot Dell calls "vFlash" that you can actually remotely partition and configure. That's where I was going to put the final configuration of Clover.

How fast/often does Proxmox write logs? It's concerning that you say this fried some NVMes, since that's what I'm trying to use here. Is this a setting you can adjust?

[-] scrapeus@feddit.de 2 points 1 year ago

The problem is more with ZFS on consumer-grade NVMes. I have/had problems in that configuration due to the bigger sector sizes. Proxmox itself does do frequent writes, but I don't know how often exactly. I know that my problems went away when I stopped using ZFS.

https://www.reddit.com/r/Proxmox/comments/idlqh3/zfs_extremely_high_ssd_wearout_seemingly_random/

[-] Ac5000@lemm.ee 2 points 1 year ago

Thank you for the details and link.

I looked around a little, and it seems like there are settings to help avoid this problem. Letting me know about it means I can catch it early, unlike some of the people I've found who didn't see the problem until it was already pretty bad...
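
The mitigations I've seen mentioned so far (haven't tried any yet, so treat these as notes to self rather than a recipe):

    # Suggestions gathered from forum threads; verify before running on your own box.
    # Stop the HA services' constant state writes (only safe if not using clustering/HA):
    systemctl disable --now pve-ha-lrm pve-ha-crm
    # Stop recording access times on the root pool:
    zfs set atime=off rpool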

I'll keep this in mind if I can ever get this to work.

[-] Ac5000@lemm.ee 1 points 1 year ago

From what I've read online, Dell does something similar. There's some sort of card/add-on that enables directly seeing and booting from PCIe, but it's costly.

This server has the internal USB and a built-in SD slot accessible from the rear. (There's also a dual-card option like you mention, for redundancy.)

My plan was to get Clover working with USB, then use the vFlash SD slot to hold the Clover bootloader in read-only mode. This would hopefully prevent the SD card from dying quickly.

[-] fuggadihere@lemmy.world 3 points 1 year ago

I was about to suggest scrapeus' solution, but try booting off a single NVMe first, since the ZFS mirror adds another layer of complexity. At least you will be able to rule the ZFS mirror root out of the equation.

[-] Ac5000@lemm.ee 1 points 1 year ago

Prior to the Proxmox install, and prior to enabling PCIe bifurcation, I was still unable to see the drives directly in iDRAC/BIOS. What I've read online is that Dell does this for "reasons", and they happen to sell an add-on card to let you directly access NVMe from PCIe.

While I'm not ruling out a ZFS mirror issue, I don't think it's the cause of my problem, considering both Clover and the Proxmox install debug can see the drives/partitions. I just don't understand partition/device/boot structures and processes well enough to make sense of what I'm seeing in the blkid/preboot results.

Trying to find information about it online just gets me bad guides about making partitions. The Linux docs for blkid and fdisk also don't seem to explain the output, just the arguments for the commands.

[-] Ac5000@lemm.ee 1 points 1 year ago

I don't think the bifurcation is causing my issues. Before I enabled it, I wasn't able to see the drives from iDRAC/BIOS either. From what I've been able to research, this is expected, and Dell sells the "solution" for booting directly from them. (An add-in card that's pretty pricey...)

I do have an old SATA SSD that I'm considering slotting into one of the bays and using to boot. But I see that as a "last resort" option. I was hoping to have a bit of redundancy with the Proxmox install/configuration itself.

I feel that there's a solution to the current setup and I just lack the knowledge to fix it. Everything I've been able to find points to my current setup being able to work. I'm just being hindered by not understanding partition/device/boot structure.

From what I understand, and from what I saw during the Proxmox installation, if I can get past whatever part of the POST/boot process is preventing the drives from being seen directly, I can use Clover to bootload from there. I've been able to boot into Clover just fine, and it was able to "see" the drives and partitions. I just don't know which one holds the Proxmox boot, or whether I've configured the Clover config correctly.

[-] BlueEther@no.lastname.nz 3 points 1 year ago

I had the same issue on an IBM M4 (without bifurcation). I tried Clover and had limited success. What I'm doing at the moment: I have Proxmox installed to a single NVMe, and the same ISO installed to a thumb drive.

Then, on first boot, I remapped the boot pool to point to the NVMe drive. The downside is that I have to update the kernel on both the thumb drive and the NVMe drive whenever I need to update the kernel version.

[-] Ac5000@lemm.ee 1 points 1 year ago

What were the issues you had with Clover in particular? I'd be interested to hear since I'm trying to head down that path myself.

For your "remap" can you explain what you did/have an example? I think this might give me the knowledge I'm lacking since I think part of my problem is not understanding which partition/PARTUUID is the Proxmox boot/what I should point Clover at.

[-] BlueEther@no.lastname.nz 2 points 1 year ago* (last edited 1 year ago)

I can't remember what the issue was, sorry.

If you go down the route of booting off a USB key to chainload Proxmox, there will be no real wear on the USB key as it's only used for booting.

On the first boot, Proxmox will complain that there are two ZFS pools with the same name; you just rename the one on the USB to ***_USB or the like and then continue the boot. I think I did a post on Reddit or the Proxmox forum on how to do this.

Edit, found it https://www.reddit.com/r/Proxmox/comments/ybxcxx/using_clover_to_boot_of_my_nvme_drive/

I have an M4 that I did this for:

Install proxmox to NVME as ZFS

Install proxmox to USB as ZFS

On first boot, change the USB ZFS rootfs to a new name (this is off the top of my head but is backed up by thegeekdiary.com):

zpool import                     # lists importable pools; find the ID of the NVMe pool
zpool import -f NNNNNNNNNNNN     # force-import the NVMe pool by that ID
zpool import rootfs rootfsusb    # import the USB pool under a new name
zpool export rootfsusb           # export it again under the new name
reboot
[-] Ac5000@lemm.ee 1 points 1 year ago

Thank you for responding and providing the link and info. The top comment in that Reddit post has the same link I posted above.

For the

zpool import // find the ID of the NVME pool

How did you find the ID of the NVMe pool? I think this is part of my problem: I see multiple partitions and am not entirely sure which is the "boot" partition I should be pointing at. I think in your case you're pointing at the "data" partition, but this might help me eliminate one of my options.

I'm also not sure how the RAID1 plays into things, since both physical drives seem to have the same partitions. Not sure if I can just point to one of the "boot" partitions on one of the drives and it'll find its partner when it starts booting?

[-] BlueEther@no.lastname.nz 2 points 1 year ago

The command zpool import will list all the pools and their IDs.
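
It prints something like this (ID and layout made up for illustration; in your case the pool name should match the LABEL from your blkid output, i.e. rpool):

       pool: rpool
         id: 9268489765312416204
      state: ONLINE
     action: The pool can be imported using its name or numeric identifier.
     config:

            rpool         ONLINE
              mirror-0    ONLINE
                nvme0n1p3 ONLINE
                nvme1n1p3 ONLINE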

[-] Decronym@lemmy.decronym.xyz 2 points 1 year ago* (last edited 1 year ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters  More Letters
NVMe           Non-Volatile Memory Express interface for mass storage
PCIe           Peripheral Component Interconnect Express
SATA           Serial AT Attachment interface for mass storage
SSD            Solid State Drive mass storage

4 acronyms in this thread; the most compressed thread commented on today has 15 acronyms.

