For years my homelab has been a rotating cast of hardware — Xeons, Core i7s, SFF boxes — each iteration teaching me something new. My latest build, a Lenovo P3 Ultra with an i9-14900T, introduced a new wrinkle: Intel's P- and E-core architecture.

While experimenting with it, I ended up building a cluster that looks different from the typical enterprise and homelab high-availability setups you see online. I call it the Revolving Cluster.

It's not HA in the enterprise sense, but it gives me instant failover for critical services, fast recovery for VMs, and simplicity — three things I value more in a homelab than strict uptime for everything.

The Philosophy: Revolving, Mostly Redundant and Thoroughly Replicated

Most homelabbers chase HA clusters (Proxmox, Ceph, vSAN, etc.). Those are powerful but bring real complexity: quorum management, shared storage, fencing, migration coordination.

I wanted something with fewer moving parts and no shared storage. The Revolving Cluster is what the section title says: revolving (the master role and replication direction move between nodes), mostly redundant (critical services run on every node), and thoroughly replicated (every node keeps a copy of everything important).

Think of it as two-tier HA: instant for critical services, manual for VMs.

Cluster Topology

Here's the basic layout:

Cluster Hardware

Here's what each node brings to the cluster:

| Node | CPU | RAM | Boot Storage | Data Storage | Network |
|------|-----|-----|--------------|--------------|---------|
| A (Dell T640) | 2× Xeon 8269CY (52 cores) | 1024 GB ECC | 2 TB (EVO 870) | 15 TB + 15 TB + 32 TB (Micron 9400) | 4×10G (X710), 2×25G (XXV710) |
| B (Dell T640) | 2× Xeon 8280M (56 cores) | 1024 GB ECC | 2 TB (EVO 870) | 15 TB + 15 TB + 32 TB (Micron 9400) | 4×10G (X710), 2×25G (XXV710) |
| C (Lenovo P3 Ultra) | 8P + 8E cores (i9-14900T) | 192 GB ECC | 4 TB (EVO 990) | 8 TB + 32 TB (Micron 9400) | 1G + 2.5G, 2×25G (XXV710) |
| D (Lenovo P3 Ultra) | 8P + 8E cores (i9-14900T) | 192 GB ECC | 4 TB (EVO 990) | 8 TB + 32 TB (Micron 9400) | 1G + 2.5G, 2×25G (XXV710) |

The T640s are the heavy lifters — they can run the full VM workload including lab environments. The P3 Ultras are compact (3.9L) but still capable of running all critical infrastructure VMs if the T640s go down.

Power note: The T640s idle at around 220W each, so most of the time only one of them is powered on. The second T640 stays off until needed for extra capacity or maintenance failover. The P3 Ultras, idling at a fraction of that, run continuously.

Two Tiers of Service

Not everything in the cluster has the same recovery requirements. I run two tiers with very different RTO/RPO characteristics:

Tier 1: Critical Infrastructure (Sub-Second RTO and RPO)

These services are what the rest of my homelab and home network depend on. All clients use the cluster's VIPs for DNS, file shares, and home directories.

Key design point: stateless services (DNS, NFS, CIFS, Wireguard) run on all nodes simultaneously, not just the active one.

The VIPs simply direct client traffic to one of these identical services. When a node fails, the VIPs move to another node — which is already running the same services with the same data. Clients see a brief hiccup (sub-second), then continue as if nothing happened.

Tier 2: Virtual Machines (Fast RTO, Longer RPO)

VMs are a different story. They're large, they change frequently, and continuous replication would be expensive. Every node in the cluster has a copy of all important VMs; depending on the last sync, a given copy might be a few days old or nearly current. VMs can run on any node, but each VM runs on only one node at a time to avoid split-brain scenarios.

Recovery is manual: if a node fails, I decide which VMs need to be restarted on surviving nodes, and I start them from the local replica — even if it's a few days old. This deliberate manual step prevents two nodes from running the same VM simultaneously.
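For illustration, here's a minimal sketch of what that manual restart can look like with libvirt. The VM name, peer hostnames, and replica path are hypothetical and this is not my actual tooling; the remote check is just a crude guard against starting the same VM on two nodes.

#!/usr/bin/env python3
# restore_vm.py - start a VM from the local replica, but only after confirming
# that no reachable peer is already running it (the split-brain guard).
import subprocess
import sys

VM = "infra-dns01"                    # hypothetical VM name
PEERS = ["nodeA", "nodeB", "nodeC"]   # the other cluster nodes (hypothetical names)
XML = f"/shared/kvm0/{VM}/{VM}.xml"   # hypothetical path to the replicated domain XML

def running_on(host: str) -> bool:
    """True if the VM reports as running on a reachable peer."""
    result = subprocess.run(
        ["virsh", "-c", f"qemu+ssh://{host}/system", "domstate", VM],
        capture_output=True, text=True)
    return result.returncode == 0 and "running" in result.stdout

if any(running_on(peer) for peer in PEERS):
    sys.exit(f"{VM} appears to be running elsewhere - refusing to start it here.")

# Define the domain from the replicated XML, then boot it from the local replica.
subprocess.run(["virsh", "define", XML], check=True)
subprocess.run(["virsh", "start", VM], check=True)
print(f"{VM} started from the local replica on this node.")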

Design Principle: Separate Compute from Critical Data

The key insight here is that nothing inside a VM should require sub-second replication. If it does, I move that data out of the VM and onto NFS or CIFS — where it gets sub-second replication via lsyncd automatically.

This separation of concerns means a VM replica can safely be days old: anything that genuinely needs to stay current lives on the replicated NFS/CIFS shares, not inside the VM image.

This two-tier approach gives me the best of both worlds: instant failover with no data loss for critical data, and cost-effective cold standby for VM compute where I accept some staleness in exchange for simplicity.

Hardware Design Principles

Building a shared-nothing cluster with full replicas requires specific hardware considerations:

Storage: Every Node Holds Everything

Each node must have enough storage to hold a complete copy of everything important. The T640s can hold the full workload; the P3 Ultras hold all critical VMs but not the lab environments.

Network: Multiple High-Speed Links

Each node has multiple 10G/25G links, so continuous file replication, multi-terabyte VM copies, and client traffic on the VIPs don't all compete for the same pipe.

Compute: Sized for Worst-Case Failover

Each node must have enough CPU and RAM to run all critical VMs, even if every other node fails. The critical set includes the Active Directory, CA/PKI, and IDM VMs.

Only the T640s can also run the lab VMs (OpenShift clusters, test environments), but any node can run the critical infrastructure.
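As a sanity check, it's worth confirming that even the smallest node can absorb the whole critical set. The per-VM memory figures below are placeholders rather than my real allocations; the node capacities come from the hardware table above.

#!/usr/bin/env python3
# failover_sizing.py - can each node host every critical VM by itself?
CRITICAL_VMS_GB = {        # illustrative RAM allocations, not real numbers
    "active-directory": 8,
    "ca-pki": 4,
    "idm": 8,
}
NODE_RAM_GB = {"A": 1024, "B": 1024, "C": 192, "D": 192}
HOST_RESERVE_GB = 16       # headroom for the hypervisor and the stateless services

needed = sum(CRITICAL_VMS_GB.values()) + HOST_RESERVE_GB
for node, ram in NODE_RAM_GB.items():
    verdict = "OK" if ram >= needed else "TOO SMALL"
    print(f"Node {node}: {ram} GB available, {needed} GB needed -> {verdict}")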

Replication Strategy

Replication is the glue that makes the Revolving Cluster work. I use two different tools depending on the workload:

Real-Time Replication: lsyncd

For critical files (home directories, shared files, DNS zones), I use lsyncd — a daemon that watches for filesystem changes and triggers rsync immediately.

Important: lsyncd only runs on the master node (the one holding the VIPs). That single rule fixes the replication direction: always from the master to all other live nodes.
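lsyncd itself is configured in Lua, so rather than reproducing my config, here is a rough Python sketch of the pattern it implements: watch a tree for changes and immediately rsync it to every other live node. The paths, node names, and the watchdog library are illustrative choices, not how lsyncd works internally.

#!/usr/bin/env python3
# watch_and_sync.py - toy version of the lsyncd pattern: watch the filesystem
# and fan changes out to the other nodes with rsync. Only the master runs this.
import subprocess
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

SOURCE = "/shared/home/"               # replicated tree (illustrative path)
TARGETS = ["nodeB", "nodeC", "nodeD"]  # the other live nodes (illustrative names)

class SyncHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # Push the change to every target; rsync makes repeated runs idempotent.
        for host in TARGETS:
            subprocess.run(["rsync", "-a", "--delete", SOURCE, f"{host}:{SOURCE}"],
                           check=False)

observer = Observer()
observer.schedule(SyncHandler(), SOURCE, recursive=True)
observer.start()                       # watching happens on a background thread
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()

The real lsyncd aggregates events and is far more robust than a loop like this; the sketch only shows the shape of the idea.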

A Note on Service Group Scaling

Each node-to-node replication relationship requires a VCS service group. Since replication can flow in either direction (the active direction depends on which node holds the VIPs), you need one service group per unique node pair rather than one per direction.

The formula is N × (N-1) / 2 — the classic "N choose 2" combinations formula (also known as the handshake problem): 3 nodes need 3 service groups, 4 nodes need 6, and 5 nodes would already need 10.

This scales quadratically, so adding nodes increases complexity quickly. In the VCS screenshot above, you can see this in action: the service groups hasync00 through hasync05 represent the 6 node-pair relationships needed for my 4-node cluster.

These service groups only come online when both nodes in the pair are up and one of them is the current master (holds the VIPs).

Otherwise, the service group stays down. For example, if Node A has the VIPs and Node D is down, only A↔B and A↔C will come up — the three service groups involving Node D remain offline.
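To make both the pair count and the online condition concrete, here's a small Python sketch. The mapping of pairs to hasyncNN names is my guess at the ordering, purely for illustration.

#!/usr/bin/env python3
# service_group_pairs.py - enumerate the node-pair replication groups and decide
# which should be online for a given master and set of live nodes.
from itertools import combinations

NODES = ["A", "B", "C", "D"]
PAIRS = list(combinations(NODES, 2))   # N * (N-1) / 2 = 6 pairs for 4 nodes

def online(pair, master, live):
    """A pair's group runs only if the master is in the pair and both ends are up."""
    return master in pair and all(node in live for node in pair)

master, live = "A", {"A", "B", "C"}    # the example from the text: A is master, D is down
for i, pair in enumerate(PAIRS):
    state = "ONLINE" if online(pair, master, live) else "offline"
    print(f"hasync{i:02d}  {pair[0]}<->{pair[1]}  {state}")

Running it for that example marks only A↔B and A↔C as online, matching the behavior described above.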

Batch Replication: Custom Tooling for VMs

Rather than running rsync directly, I use a custom Python script (rsync_KVM_OS.py) that wraps rsync with VM-aware logic: clean shutdown and restart of the VMs being copied, VXFS snapshots for consistency, and per-VM selection of what gets replicated.

Tool Usage

# /shared/kvm0/scripts/rsync_KVM_OS.py --help
usage: rsync_KVM_OS.py [-h] [-c] [-d] [-f] [-p] [-s] [-t] [-u] [--host HOST] [-V] [vm_list ...]

Replicate KVM virtual machines to remote hypervisor

positional arguments:
  vm_list          List of VMs to replicate (default: all configured VMs)

optional arguments:
  -h, --help       show this help message and exit
  -c, --checksum   Force checksumming
  -d, --debug      Debug mode - show commands without executing
  -f, --force      Overwrite even if files are more recent on destination
  -p, --poweroff   Power off remote system when sync is done
  -s, --novxsnap   Don't use vxfs snapshots even if supported
  -t, --test       Don't copy, only perform a check test
  -u, --update     Only update if newer files
  --host HOST      Override destination host (default: auto-detect from script name)
  -V, --version    Show version information and exit

The script handles VXFS snapshots automatically — it creates a snapshot before syncing, copies from the snapshot for consistency, then discards the snapshot when done.

Minimizing Downtime with Snapshots

VXFS snapshots are key to reducing VM downtime during replication. The workflow is:

  1. Shut down the VMs to ensure disk consistency.
  2. Launch the sync script, which takes a snapshot of the cold VMs.
  3. Restart the VMs immediately and continue working.
  4. The copy continues in the background from the snapshot.

This means the targeted VMs are only down for a few minutes — the time it takes for all VMs in the copy batch to shut down cleanly plus the snapshot creation — rather than the hours it would take to copy terabytes of data with VMs offline. VMs not in the batch keep running.
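The sketch below shows the same flow in simplified form. It is not the real rsync_KVM_OS.py: the VM names, paths, and the snapshot helper commands are placeholders, since the actual VxFS snapshot handling depends on the volume layout.

#!/usr/bin/env python3
# replicate_batch.py - simplified sketch of the snapshot-then-copy workflow.
import subprocess
import time

BATCH = ["vm-ad", "vm-ipa"]         # hypothetical VMs in this copy batch
DEST = "nodeB"                      # hypothetical destination hypervisor
LIVE_PATH = "/shared/kvm0/"         # where the VM images live (illustrative)
SNAP_PATH = "/shared/kvm0-snap/"    # where the snapshot is mounted (illustrative)

def state(vm):
    out = subprocess.run(["virsh", "domstate", vm], capture_output=True, text=True)
    return out.stdout.strip()

# 1. Shut the batch down cleanly and wait until every VM is cold.
for vm in BATCH:
    subprocess.run(["virsh", "shutdown", vm], check=False)
while any(state(vm) == "running" for vm in BATCH):
    time.sleep(5)

# 2. Snapshot the cold images. On VxFS this is a snapshot mount; the exact
#    commands depend on the setup, so this is only a placeholder hook.
subprocess.run(["/usr/local/sbin/make-fs-snapshot", LIVE_PATH, SNAP_PATH], check=True)

# 3. Restart the VMs immediately - downtime ends here.
for vm in BATCH:
    subprocess.run(["virsh", "start", vm], check=True)

# 4. Copy from the frozen snapshot in the background, then drop it.
for vm in BATCH:
    subprocess.run(["rsync", "-a", "--inplace",
                    f"{SNAP_PATH}{vm}/", f"{DEST}:{LIVE_PATH}{vm}/"], check=True)
subprocess.run(["/usr/local/sbin/remove-fs-snapshot", SNAP_PATH], check=True)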

Scheduled Replication

For critical VMs (Active Directory, CA/PKI, IDM), I run scheduled weekly replications during a maintenance window. This ensures these VMs have reasonably fresh copies across all nodes without requiring continuous sync.

VIP Failover and Replication Direction

The magic that makes critical services highly available is the combination of VIPs and directional replication.

In my setup, VIP failover is managed by VCS (Veritas Cluster Server), but the same approach works with any HA software — Pacemaker, Keepalived, or even a simple scripted solution. The key is having something that monitors node health and moves VIPs automatically.
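To show how small that "simple scripted solution" can be, here is a deliberately naive sketch: a standby node pings the VIP and claims it locally after repeated failures. The interface name and address are made up, and a real setup (VCS, Pacemaker, Keepalived) adds the health checks and fencing this one lacks.

#!/usr/bin/env python3
# vip_watchdog.py - naive illustration of scripted VIP failover. A real cluster
# manager adds quorum, fencing, and service checks; this only shows the basic idea.
import subprocess
import time

VIP = "192.168.10.10/24"        # illustrative virtual IP
IFACE = "eno1"                  # illustrative interface name
CHECK_IP = VIP.split("/")[0]
FAILS_BEFORE_TAKEOVER = 5       # consecutive failed probes before claiming the VIP

def vip_alive() -> bool:
    """One ICMP probe against the VIP; return code 0 means something answered."""
    return subprocess.run(["ping", "-c", "1", "-W", "1", CHECK_IP],
                          capture_output=True).returncode == 0

fails = 0
while True:
    fails = 0 if vip_alive() else fails + 1
    if fails >= FAILS_BEFORE_TAKEOVER:
        # Claim the VIP locally. Without fencing, a partitioned master could still
        # hold the address, which is exactly why VCS handles this in my setup.
        subprocess.run(["ip", "addr", "add", VIP, "dev", IFACE], check=False)
        break
    time.sleep(2)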

How It Works

  1. One node holds the VIPs — this is the "master" node. All client traffic for DNS, NFS, SMB, Wireguard flows through these VIPs to the master, though all nodes are running these services.
  2. Only the master runs lsyncd — replication always flows outward from the master to all other live nodes.
  3. On failure, VIPs move instantly — another node takes over the VIPs and becomes the new master. Since it was already running the stateless services, clients barely notice the switch.
  4. The new master starts lsyncd — replication direction changes automatically, now flowing from the new master to the remaining nodes.

When a Failed Node Returns

This is where the design really shines. When a failed node comes back online:

  1. It rejoins the cluster as a standby node (not master) and starts its stateless services (DNS, NFS, CIFS, Wireguard).
  2. lsyncd on the current master detects the returning node and begins syncing any files that changed during the outage.
  3. Once caught up, the node is fully operational and ready to take over as master if another failure occurs.

There's no manual "resync" step — the returning node automatically receives everything it missed.

Startup Sequence

On boot, each node starts its stateless services (DNS, NFS, CIFS, Wireguard) immediately. Then the cluster software places the VIPs, the master starts lsyncd, and the freshly booted node receives whatever changed while it was down.

The initial sync completes quickly (seconds on a 10G network for incremental changes). After that, file replication runs continuously with sub-second latency. This is what enables the near-instant failover for critical services — every node is already serving, and their data is always current.

Why Not Ceph/Gluster/etc.?

Three reasons:

  1. Complexity. I don't want to manage quorum, fencing, and recovery logic.
  2. Overhead. Shared storage layers add latency and resource burn.
  3. Transparency. With rsync, I can literally ls my replication state.

For a homelab, "good enough and understandable" beats "perfect but opaque."

Benefits of This Approach

The benefits are the ones I designed for: sub-second failover with no data loss for critical services, fast manual recovery for VMs, no shared storage, quorum, or fencing to babysit, and replication state I can inspect with ordinary filesystem tools. The tradeoff — longer RPO for VMs — is one I'm willing to accept, especially since critical services have sub-second RPO.

Practical Notes

A few observations from running this setup:

Cost and Storage Efficiency

Let's be honest: this is not cost-effective in the traditional sense. With four nodes, I maintain four copies of all important VMs — plus another cold backup on the NAS. That's a lot of storage for redundancy.

Most of my large NVMe drives (the Micron 9400s) were sourced from eBay or purchased during exceptional one-time deals. Without those opportunities, this setup would be prohibitively expensive.

The Upside of Multiple Copies

Having multiple copies of VMs at different points in time has an unexpected benefit: easy recovery from corruption. If a VM gets corrupted (bad update, ransomware in a test, accidental deletion), I can restore from whichever replica is cleanest — whether that's yesterday's sync or last week's.

Custom Replication Tooling

I couldn't find an existing tool that fit my needs, so I wrote one from scratch in Python: rsync_KVM_OS.py. It handles VM selection, VXFS snapshot creation and cleanup, checksum/update/force modes, and can optionally power off the destination host once the sync is done.

Even with 10G networking and all these optimizations, copying a 6 TB VM still takes hours. There's no magic solution when the data is that big — you just have to wait.
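A quick back-of-the-envelope check shows why, assuming the optimistic case of a single fully saturated 10G link with no protocol or disk overhead:

#!/usr/bin/env python3
# transfer_time.py - lower bound for copying a 6 TB image over one 10G link.
SIZE_TB = 6
LINK_GBPS = 10

size_bits = SIZE_TB * 1e12 * 8            # 6 TB in bits (decimal terabytes)
seconds = size_bits / (LINK_GBPS * 1e9)   # ideal line rate, zero overhead
print(f"{SIZE_TB} TB over {LINK_GBPS} Gbit/s: at least {seconds / 3600:.1f} hours")

That works out to roughly 1.3 hours as an absolute floor; checksumming, filesystem overhead, and sharing the link push the real number well past that.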

Open Questions

Closing Thoughts

The Revolving Cluster isn't about enterprise HA. It's about homelab HA — with a twist: critical services get real HA (sub-second failover), while VMs get cost-effective cold standby.

The "revolving" nature comes from how replication direction follows the active node. Nodes can fail, return, and fail again — the cluster adapts automatically. The active node is always the source of truth, and standby nodes are always ready to take over.

It's a different way of thinking — instead of always-on everywhere, I aim for always-available services and always-recoverable VMs.

And in a homelab, that's often all you need.

If you're running a similar "shared-nothing" homelab, I'd love to hear how you balance RTO and RPO — and how you've solved the compute vs. critical data separation. Feel free to drop me a line.