Skipping IOMMU aspect of Rebuild process

This time I’m going to try skipping the IOMMU aspect of this rebuild process because it was honestly such a nightmare the first time I built the server. 

Back then, I didn’t have everything built into the server from the get-go, and nothing on the network depended on its presence yet. I added pfSense next, and it took a while to get the config working for me; in the meantime, the main wireless router handled all of my DHCP. I didn’t “cut” everything over to pfSense and Pi-hole until months after I had installed them. This time, though, I was under a little more pressure to restore things to the way they were when we had Proxmox and pfSense/Pi-hole/TrueNAS working for us. So just to simplify my life, I decided not to employ IOMMU this time around.

What is IOMMU, you ask? It’s short for Input/Output Memory Management Unit, a hardware feature that sits between your system’s memory and I/O devices (like GPUs, NICs, or storage controllers). It plays a crucial role in:

Memory Protection
•     Prevents devices from accessing unauthorized memory regions
•     Shields the system from faulty or malicious DMA (Direct Memory Access) operations
Address Translation
•     Maps device-visible virtual addresses to physical memory addresses
•     Enables devices to access non-contiguous memory as if it were contiguous (scatter/gather)
Virtualization Support
•     Essential for PCI passthrough in Proxmox and other hypervisors
•     Allows guest VMs to use real hardware securely and efficiently
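
If you want to see whether IOMMU is actually active on a Linux host (Proxmox included), the kernel exposes its IOMMU groups under /sys/kernel/iommu_groups. Here is a minimal Python sketch that lists them. The function name list_iommu_groups and its root parameter are my own for illustration, not part of any Proxmox tooling, and an empty result only suggests that IOMMU is disabled or unsupported.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [device_address, ...]}, or {} if no groups exist.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real host the default sysfs path is the one to use.
    """
    base = Path(root)
    if not base.is_dir():
        return {}

    groups = {}
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        # Each group is a numbered directory containing a "devices" folder.
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())

    # Sort by group number so the output is stable and easy to read.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("No IOMMU groups found; IOMMU is probably disabled or unsupported.")
    for number, devices in found.items():
        print(f"group {number}: {', '.join(devices)}")
```

Devices that share a group generally have to be passed through to a VM together, which is one reason the group layout matters before attempting PCI passthrough.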

I decided that this time I was going to rely only on native Proxmox features to get things up and working. Sadly, when I first provisioned the server, reliable information on using IOMMU was exceedingly difficult to find on the web. It was the proverbial “wild west”: for every post I found that said to do something using Method-A, I found at least 10 others that said Method-A was entirely wrong. So even with the best of tools at my disposal, it was a veritable nightmare for me. Mostly it came down to the fear that if you select the wrong method, you could do real damage to your setup.

So this time, I’m going to take it nice and slow to try to keep my sanity. Let me know if you had similar problems employing IOMMU.

Manual rebuild of Proxmox

So once the server failed to boot, I lost access to everything that I had stored on that box.  Classic case of keeping all of my eggs in one basket.  No one to blame but myself. 

Thankfully, I was able to pivot to using the local telco’s “Residential Gateway” as my DHCP server, and had the network back up and working again within 30 minutes. Obviously, that meant no access to our personal shares on the NAS, no ad blocking via Pi-hole, and worse, the local telco now gathering personal data on our usage and DNS.

I’ve been giving it a lot of thought since the SSD in the server went kaput last month, and decided that I was going to approach this build a little differently. Not much, mind you, but enough to hopefully keep this kind of problem from happening to me again. I’m not going to deal with the whole IOMMU config at this time (I’ll expound on that in another post), and I’m making a few other minor changes. I’ve already installed the base Proxmox OS (using the EXT4 filesystem, of course!), and will be starting to install pfSense and Pi-hole with its “Recursive DNS” option in the next day or so.

Going to spend the next few weeks fine-tuning things…

So it’s official. I killed the SSD with my ZFS filesystem choice.

So today I had to power down the server to perform some maintenance. It was also the day that my local electric utility decided to replace a pole in my neighborhood. They were kind enough to warn us that power would be down, so I took the opportunity to power down the server in the morning before they began their work. I went to work, and when I got home, I rearranged some of the equipment around the server, which I had been wanting to do for a while. Unfortunately, when I powered the server back on, it started up just fine but threw the dreaded “no bootdrive detected”.

SSDs by their very nature are perfect for most every situation, EXCEPT WRITE-INTENSIVE applications. Turns out that I was NOT paying attention when I originally installed Proxmox on the SSD with ZFS as the filesystem. Or more specifically, I wasn’t aware of the bad choice I was making. I should have gone with EXT4 instead of ZFS. Bad sysadmin!

Errors are really starting to stack up

So those errors are really starting to stack up on that SSD.

Cracked open the chassis for like the third time this week, not really expecting to find anything physically wrong, because I don’t think this is a hardware problem. I suspect that it’s more likely a configuration choice or issue on my part. I bought a replacement drive just in case this one goes down hard. The error count is now up to:

Device: /dev/sda [SAT], ATA error count increased from 175188 to 175201
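
To keep an eye on how fast that count climbs, a line like the one above can be parsed with a few lines of Python. This is only a rough sketch: it assumes the message always has exactly this shape (it looks like the kind of warning smartd logs), and the function name parse_ata_error_line is made up for illustration.

```python
import re

# Pattern for the log line shown above; the exact wording is assumed to be stable.
ATA_ERROR_LINE = re.compile(
    r"Device: (?P<device>\S+) \[(?P<kind>[^\]]+)\], "
    r"ATA error count increased from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_ata_error_line(line):
    """Return the device and error counts from one log line, or None if it doesn't match."""
    match = ATA_ERROR_LINE.search(line)
    if match is None:
        return None
    old, new = int(match["old"]), int(match["new"])
    return {"device": match["device"], "old": old, "new": new, "delta": new - old}


line = "Device: /dev/sda [SAT], ATA error count increased from 175188 to 175201"
print(parse_ata_error_line(line))
# → {'device': '/dev/sda', 'old': 175188, 'new': 175201, 'delta': 13}
```

A single jump of 13 doesn’t say much on its own, but logging the delta from each warning would show whether the errors are accelerating.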