Homelabs, am I right?
You might have heard that a homelab is about the gear. It can be. I love buying server hardware, finding deals on refurbished network interface cards (NICs), and frankensteining a chassis together from parts until it posts. That part is fun. But the reason the lab stays interesting, years in, is everything that happens after the hardware works.
A homelab is a playground where the stakes are low enough to experiment and high enough to learn. You can install a new operating system, break it, blow it away, and start over. With a hypervisor like Proxmox you can spin up a server in minutes, test something, tear it down, and try a different approach. You can bash your head against a problem for hours, solve it, and walk away knowing that most people would have called support.
My lab also does useful work. I self-host audiobooks so I can listen on my commute to titles I own, DRM-free, on infrastructure I control. I run Plex so I can tell my kids “go check the server” when they want a movie, and they have some idea that their dad made that possible. My father is a long-distance truck driver, and he calls me every time the server goes down. The books give us something to talk about on his long hauls, and that connection rides on a service I built and maintain in a closet.
I have a firewall, custom DNS, and a reverse proxy because I like having control of my network. I keep IoT devices off my services VLAN. I block ads and trackers at the DNS level. I have not seen an ad on my phone in years.
And yes, the lab is where I practice what I preach. I segment my network with VLANs, I maintain a PPSM matrix (Ports, Protocols, and Services Management, after DoDI 8551.01: a catalog of approved ports, protocols, and services and the reason each is allowed), I write runbooks, and I keep diagrams that another engineer could follow. I built the website you are reading right now on this stack, domain and all, because I wanted a business site I could stand up in hours at no cost. That is the spirit of the lab: useful, documented, mine.
The DNS story
I was running my own caching, recursive DNS resolver. One morning, DNS across the entire home network failed. Every device, every service, nothing resolving. I suspect a configuration change I had made the night before looked syntactically valid but broke upstream forwarding; cached entries kept working until their TTL (time to live) expired, and then resolution stopped.
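That delayed-failure pattern is easy to model. Here is a toy Python sketch (the name and address are hypothetical) of a cache that keeps serving still-valid entries after the upstream breaks, so nothing looks wrong until the TTL runs out:

```python
import time

class TtlCache:
    """Toy DNS cache: answers stay valid until their TTL expires."""
    def __init__(self):
        self.entries = {}  # name -> (address, expiry timestamp)

    def put(self, name, address, ttl):
        self.entries[name] = (address, time.monotonic() + ttl)

    def resolve(self, name, upstream_ok):
        entry = self.entries.get(name)
        if entry and time.monotonic() < entry[1]:
            return entry[0]          # served from cache; upstream never consulted
        if upstream_ok:
            return "192.0.2.10"      # placeholder answer from a working upstream
        raise LookupError(f"resolution failed for {name}")

cache = TtlCache()
cache.put("example.com", "192.0.2.10", ttl=300)

# Right after the bad config change: cache still warm, everything "works".
print(cache.resolve("example.com", upstream_ok=False))

# Hours later the TTL has run out, and the breakage finally shows.
cache.entries["example.com"] = ("192.0.2.10", time.monotonic() - 1)
try:
    cache.resolve("example.com", upstream_ok=False)
except LookupError as err:
    print(err)
```

The lag between cause and symptom is the whole trap: the change that broke things and the morning everything stopped resolving were hours apart.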
This happened to land on the day of an interview with a Magnificent Seven company, one of the seven largest U.S. technology companies by market capitalization.
I spent a few hours diagnosing the failure, tracing the resolution path, restarting the service, validating upstream connectivity, and getting the household back online. I made the interview on time. The interviewer asked how my day was going. I said “DNS issues. It is always DNS issues.” The interviewer laughed, and the rest of the hour turned into a walkthrough of the incident: what broke, how I triaged it, what I fixed, and what I would do differently. Not a single prepared question. A live outage and recovery told the interviewer more than any whiteboard exercise could.
I got that job. It was a platform engineering role, and I spent a couple of years getting paid to troubleshoot computers, scripts, automation frameworks, cloud tenants, and Active Directory at a depth that often led to conversations with the OS team or an OEM because I had found a legitimate bug. Kernel-level troubleshooting, sometimes. I loved it.
The lab did not get me that job by itself. But it gave me a place to build the instincts and the stories that made the interview feel like a conversation between two people who fix things for a living.
What the lab does
The design starts with what the system has to support and who has to live with it. For my lab, the main jobs are:
- Run household services that fade into the background.
- Host experiments in a place where experiments stay contained.
- Publish selected services through a controlled edge.
- Keep a management path alive when the routed network changes.
- Produce diagrams, PPSM tables, and runbooks that another engineer could follow.
That last job matters. A lab documented only in muscle memory teaches improvisation. A lab with diagrams, decision records, and validation notes teaches design.
The first diagram is trust
Before I care about subnets, switch ports, or firewall syntax, I care about zones. The first drawing I want is a map of where trust changes and where repair access lives.
```mermaid
flowchart TD
repair([Repair actor · management plane])
subgraph internet ["Internet (untrusted)"]
recon[Reconnaissance<br/>DNS enumeration, Shodan]
dns[Public DNS<br/>CT logs, HTTPS]
recon -.->|enumerates| dns
end
subgraph edge ["Public Edge (semi-trusted)"]
proxy{Reverse proxy<br/>route decision}
deny[Refuse + log<br/>unknown host or path]
proxy -->|refused| deny
end
subgraph services ["Internal Services (trusted)"]
app[Published service<br/>app-layer auth]
infra[Hypervisor · DNS<br/>storage · backups]
logs[(Centralized logs)]
app -->|log event| logs
end
subgraph household ["Household + IoT (constrained)"]
devices[Household and<br/>IoT devices]
end
dns -->|HTTPS| proxy
proxy -->|approved| app
devices -->|media pinhole| app
repair -.->|out-of-band| infra
repair -.->|policy| proxy
repair -.->|reviews| logs
style internet fill:#e8efe8,stroke:#1f3d2c,stroke-width:2px,stroke-dasharray:6 4,color:#1f1d1a
style edge fill:#faf6ef,stroke:#1f3d2c,stroke-width:2px,stroke-dasharray:6 4,color:#1f1d1a
style services fill:#f3ede0,stroke:#948069,stroke-width:2px,stroke-dasharray:6 4,color:#1f1d1a
style household fill:#fff8ed,stroke:#b8783c,stroke-width:2px,stroke-dasharray:6 4,color:#1f1d1a
classDef process fill:#e8efe8,stroke:#1f3d2c,color:#1f1d1a
classDef decision fill:#faf6ef,stroke:#1f3d2c,color:#1f1d1a
classDef reject fill:#fff8ed,stroke:#b8783c,color:#1f1d1a
classDef record fill:#f3ede0,stroke:#948069,color:#1f1d1a
classDef threat fill:#e8efe8,stroke:#b8783c,color:#1f1d1a,stroke-dasharray:4 3
classDef mgmt fill:#fff8ed,stroke:#b8783c,color:#1f1d1a,stroke-dasharray:4 3
class dns,app,devices process
class proxy decision
class deny reject
class logs record
class infra record
class recon threat
class repair mgmt
```

Dashed borders are trust boundaries. Each zone carries a different trust level. The repair actor operates out-of-band, outside all routed zones.
This diagram is a trust model. Each dashed border is a trust boundary; data crossing one of those borders changes privilege level and deserves a firewall rule or proxy decision. The reconnaissance actor shows what the internet already knows about your lab from Certificate Transparency logs and DNS records. The repair actor sits outside every routed zone because the moment you lose the management plane, you lose the ability to fix everything else.
Router-on-a-stick, with a lifeline
The lab did not start here. The first routing design was a dedicated pfSense box with a dual-port 10 Gbps NIC, one side for WAN and the other for LAN. It worked well, and I liked how it separated routing from the rest of the infrastructure. Then one of the two ports failed.
It did not fail all at once. Over the course of a few months, the connection would intermittently drop in an almost imperceptible way. I chased the problem through cables, switch ports, and driver settings before discovering the NIC itself was dying. Replacing the fancy dual-port, autonegotiating NIC would not have been cheap, and I had decided that was the end of the experiment. I regretfully retired the pfSense box and switched back to my regular old Google Nest WiFi router for a while. I noodled with the idea of building a new router, or buying a “real” one, and eventually landed on a more interesting solution:
I moved routing into the hypervisor instead of buying another card. pfSense runs as a virtual machine on Proxmox, and the physical network reduces to a single VLAN trunk over another 10 Gbps link on the main server. All VLANs ride that trunk, and the firewall VM handles inter-VLAN routing. Less hardware to break. One fewer box in the closet.
The trade-off is real: routing now depends on the hypervisor. If Proxmox is down, the network is down. I chose that anyway. The mitigation is the lifeline: a separate out-of-band management path that does not depend on the routed network. It might be a console server, a dedicated admin VLAN, a VPN that terminates on the hypervisor, or a documented recovery procedure taped to the inside of the closet door. The form changes. The principle stays the same: the repair path belongs beside the routed path, and the design should name both.
The matrix is the explanation layer
Firewall rules are implementation. A Ports, Protocols, and Services Management (PPSM) matrix, a practice borrowed from DoDI 8551.01, is the explanation layer: a catalog of which ports, protocols, and services are allowed, and why.
The matrix should tell a reader why traffic exists before they look at a firewall GUI. This is the kind of shareable version I would put in a design package:
| Source role | Destination role | Protocol | Port | Purpose | Review trigger |
|---|---|---|---|---|---|
| Internet | Public edge | TCP | 80, 443 | Publish selected web services and renew certificates | Any new public DNS name |
| Public edge | Service app | TCP | App port | Send approved requests to the service | New app, new service path, or changed authentication model |
| Admin workstation | Management plane | TCP | SSH, admin UI | Maintain hypervisor, firewall, and switch | New admin user or new management host |
| Service zone | DNS resolver | UDP/TCP | 53 | Resolve internal and external names | Resolver change or new zone |
| Service zone | Internet | TCP | 443 | Package updates, APIs, and outbound integrations | New vendor dependency |
| Household devices | Media service | TCP | Media app port | Local playback and casting support | New media platform or new device class |
| IoT devices | Management controller | TCP | Controller ports | Device adoption and status checks | New controller or wireless redesign |
The working version would have real interfaces, addresses, aliases, and rule IDs. This version shows the discipline: every path has a source, destination, protocol, port, purpose, and review trigger.
That review trigger is the part people skip. It is also the part that keeps a firewall from turning into sediment. If a rule has no trigger, nobody revisits it. If nobody revisits it, the firewall accumulates rules the way a garage accumulates boxes: each one made sense at the time.
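That discipline is also easy to check mechanically. A small Python sketch (the rows and field names here are illustrative, not my real matrix) can flag any rule that cannot justify itself:

```python
from dataclasses import dataclass

@dataclass
class PpsmRow:
    source: str
    destination: str
    protocol: str
    port: str
    purpose: str
    review_trigger: str

# Hypothetical rows mirroring the matrix above.
rows = [
    PpsmRow("Internet", "Public edge", "TCP", "80, 443",
            "Publish selected web services and renew certificates",
            "Any new public DNS name"),
    PpsmRow("Service zone", "Internet", "TCP", "443",
            "Package updates, APIs, and outbound integrations",
            ""),  # no trigger: nobody will ever revisit this one
]

def audit(rows):
    """Return rows missing a purpose or a review trigger."""
    return [r for r in rows
            if not r.purpose.strip() or not r.review_trigger.strip()]

for r in audit(rows):
    print(f"needs attention: {r.source} -> {r.destination} {r.protocol}/{r.port}")
```

A check like this, run whenever the matrix changes, is how rules stop turning into sediment.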
A blank row for your own lab or facility:
| Source role | Destination role | Protocol | Port | Purpose | Review trigger |
|---|---|---|---|---|---|
| (your zone) | (target zone) | TCP/UDP | (port) | (why this traffic exists) | (what change would make you revisit this rule) |
Start with one row per firewall rule you can explain. If you find a rule you cannot explain, that is the row that matters most.
Look at that first row again: Internet → Public edge, port 443. If you put a reverse proxy in the public edge, that row is the only inbound WAN rule you need. One port, one destination, and the firewall can default deny everything else inbound. Your Minecraft server, your file shares, your management interfaces: none of them need a hole punched through the firewall because none of them face the internet directly. The proxy is the only thing that does.
That keeps the firewall simple. It also makes vulnerability management far more tractable, because patching one proxy is a different problem than patching every service that used to have its own forwarded port. Adding a new subdomain means adding a proxy route, not a new firewall rule.
The reverse proxy is the front porch
The proxy earns its place by being the decision point that port forwarding skips. It terminates TLS with Let’s Encrypt certificates, matches each request’s hostname against the Subject Alternative Names (SANs) on the cert, and makes a routing choice: forward to the right internal service, or refuse and log the attempt. Access logs live in one place.
```mermaid
flowchart LR
public[Public DNS name] --> proxy{Reverse proxy<br/>route decision}
proxy -->|approved| app[Service app<br/>with auth]
proxy -->|unknown host<br/>or path| refuse[Refuse]
app --> logs[Edge logs<br/>source, route, status]
refuse --> logs
logs --> review[PPSM review<br/>DNS, auth, logging]
classDef path fill:#e8efe8,stroke:#1f3d2c,color:#1f1d1a
classDef decision fill:#faf6ef,stroke:#1f3d2c,color:#1f1d1a
classDef reject fill:#fff8ed,stroke:#b8783c,color:#1f1d1a
classDef record fill:#f3ede0,stroke:#948069,color:#1f1d1a
class public,app path
class proxy decision
class refuse reject
class logs,review record
```

A public DNS name is discoverable. The proxy decision should be simple to explain: route the approved request, refuse the unknown one, and leave evidence.
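Stripped of TLS and logging plumbing, that decision is small enough to sketch. This Python toy (the hostnames and backend addresses are made up) shows the shape: forward approved hostnames, refuse and log everything else:

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("edge")

# Hypothetical route table: the only hostnames the proxy will forward.
ROUTES = {
    "media.example.com": "http://10.0.20.5:32400",
    "books.example.com": "http://10.0.20.6:8080",
}

def route(hostname: str) -> Optional[str]:
    """Return the backend for an approved hostname, or None after logging a refusal."""
    backend = ROUTES.get(hostname)
    if backend is None:
        log.info("refused: unknown host %s", hostname)
        return None
    log.info("approved: %s -> %s", hostname, backend)
    return backend

route("media.example.com")    # forwarded to the media backend
route("proxmox.example.com")  # refused: management stays internal-only
```

Everything else a real proxy does is elaboration on this one lookup, which is exactly why the refusal branch and its log line deserve as much attention as the happy path.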
During a routine audit of my own lab, I found that the reverse proxy was routing public traffic to management interfaces, including the hypervisor login page. Every subdomain with a certificate appeared in Certificate Transparency logs, and subdomain enumeration tools could list them all. I had set up the proxy for convenience, and convenience is what the internet got too.
That audit changed the design:
- Management interfaces moved to internal-only access paths.
- Public services carry their own authentication and logging.
- The proxy routes a narrow set of named service paths.
- Edge logs show what crossed the boundary.
- The PPSM gained a review trigger for every new public DNS name.
If a public DNS name has a certificate, certificate transparency makes it discoverable. Security architecture starts when the design names what crosses the edge, who can use it, and how anyone will know.
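You can see that discoverability for yourself. The sketch below parses the kind of JSON that crt.sh returns for a domain; the hostnames are hypothetical, the response is canned rather than fetched live, and a real query would be a GET against `https://crt.sh/?q=%.example.com&output=json`:

```python
import json

# Canned sample in crt.sh's JSON shape: one object per logged certificate,
# with "name_value" holding newline-separated hostnames.
sample_response = json.dumps([
    {"name_value": "www.example.com"},
    {"name_value": "media.example.com\nbooks.example.com"},
    {"name_value": "proxmox.example.com"},  # oops: a management interface
])

def names_from_ct(response_text):
    """Collect every hostname that ever received a publicly logged certificate."""
    names = set()
    for entry in json.loads(response_text):
        names.update(entry["name_value"].splitlines())
    return sorted(names)

for name in names_from_ct(sample_response):
    print(name)
```

Run against a real domain, a list like this is the attacker's starting map, which is why "new public DNS name" is a PPSM review trigger in my matrix.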
Storage is where trade-offs get real
You may have heard the 3-2-1 backup rule: three copies, two media types, one offsite. It is good advice. I follow it for the data that matters. And I break it, on purpose, for the data that does not.
The media array in this lab is a striped multi-disk pool with no redundancy. No mirror, no parity, no second copy. A drive failure takes the pool with it. I chose that layout because hard drives are expensive and the only thing on that array is audiobooks and movies for my family. Paw Patrol is important to my six-year-old daughter, but it is not critical data. Every title can be re-ripped or re-downloaded. More disk space, accepted risk, and a clear-eyed decision about what “loss” means for this workload.
That is the kind of trade-off that separates a design from a shopping list. The interesting question is not which RAID layout to pick. It is what happens when a drive fails, and whether the answer is something you can live with.
The irreplaceable data gets treated differently. Configurations, service data, databases, keys, diagrams, and runbooks are the things that make the system mine. Those live on a separate pool with redundancy, automated snapshots, and offsite backups. The decision record names the risk I accepted and the mitigations that make it tolerable:
- Media pool: striped, no redundancy, re-downloadable content. Accepted risk.
- System pool: redundant, snapshotted, backed up. Recovery path documented.
- Monitor disk health and capacity on both.
- Revisit the design when the hardware budget or workload changes.
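The arithmetic behind that decision record is simple enough to write down. This illustrative Python sketch (it ignores filesystem overhead and the many real RAID variants, and the drive sizes are hypothetical) compares usable space and failure tolerance for the two layouts in play:

```python
def pool_summary(disk_tb, layout):
    """Usable capacity and drive-failure tolerance for two simple layouts.

    Illustrative arithmetic only; real pools carry overhead this ignores.
    """
    total = sum(disk_tb)
    if layout == "striped":   # the media pool: all the space, no safety net
        return {"usable_tb": total, "failures_survived": 0}
    if layout == "mirrored":  # pairs of disks: half the space, one failure per pair
        return {"usable_tb": total / 2, "failures_survived": 1}
    raise ValueError(f"unknown layout: {layout}")

disks = [8, 8, 8, 8]  # four hypothetical 8 TB drives
print(pool_summary(disks, "striped"))
print(pool_summary(disks, "mirrored"))
```

Doubling usable space in exchange for losing the pool on any single drive failure is a fine trade for re-downloadable media and a terrible one for the system pool, which is the whole point of writing the decision down.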
That is the kind of trade-off analysis I want in a client handoff. A storage decision should say more than what to buy. It should name the choice, the failure mode, and what happens when something breaks.
What this says about how I design
The lab has been through a dead NIC, a DNS outage on interview day, a striped array I chose to let fail, and a reverse proxy that was handing management interfaces to the internet. Every one of those stories started with a decision I made and ended with a diagram, a table, or a runbook that made the next decision easier.
That is the practice. Draw the trust boundaries before you pick the hardware. Write the PPSM row before the firewall rule. Keep a repair path that works when the thing you are repairing is the network itself. And write down the trade-off while the reasons are still fresh, because six months from now you will be a different engineer staring at your own work.
I built this website on the same stack, domain and all, because I wanted proof that the lab produces real things. The next piece will walk through how that works: the build, the deployment, the monitoring, and the part where my six-year-old’s Paw Patrol library and my dad’s audiobooks ride the same infrastructure as the site you are reading now.