WireGuard Plugin
The WireGuard plugin provides automated, multi-tenant mesh VPN networking for Umoo-managed devices. Each device group can be enrolled in a dedicated WireGuard network. The platform handles keypair generation, IP allocation, NAT traversal via STUN, and per-device config distribution — with no manual wg tooling required on the server side.
Table of Contents
- Architecture Overview
- IP Address Allocation
- Agent-Side Plugin
- Server-Side Plugin
- Bus Message Protocol
- ConnectRPC API
- Tenant Configuration
- Operational Notes
Architecture Overview
The plugin is split across two processes that communicate exclusively via the message bus:
┌────────────────────────────────────────────────────────────────────────┐
│ Backend (umoo server) │
│ │
│ ┌──────────────────────┐ ┌───────────────────────────────────┐ │
│ │ WireGuardService │ │ AdminService RPCs │ │
│ │ ─────────────────── │ │ ─────────────────────────────── │ │
│ │ • CreateMeshNetwork │◄────│ CreateWireGuardNetwork │ │
│ │ • IP allocation │ │ EnrollGroupInNetwork │ │
│ │ • Peer registration │ │ AddDeviceToNetwork │ │
│ │ • Config building │ │ SetPeerRole / AssignRelay │ │
│ │ • Distribution │ │ ListWireGuardPeers │ │
│ └──────────┬───────────┘ └───────────────────────────────────┘ │
│ │ NATS bus │
└─────────────┼──────────────────────────────────────────────────────────┘
│
╔═════════╧═════════╗
║ Message Bus ║
║ (NATS / local) ║
╚═════════╤═════════╝
│
┌─────────────┼──────────────────────────────────────────────────────────┐
│ Device Agent (umoo-agent) │ │
│ │ │
│ ┌──────────────────────────────────▼──────────────────────────────┐ │
│ │ WireGuardPlugin (agent) │ │
│ │ ────────────────────────────────────────────────────────── │ │
│ │ • Subscribes: cmd.plugin.wireguard.join │ │
│ │ cmd.plugin.wireguard.full_configs │ │
│ │ • Publishes: evt.plugin.wireguard.pubkey (key + endpoint) │ │
│ │ evt.plugin.wireguard.endpoint (STUN result) │ │
│ │ • Manages: wg0, wg1, … (one per enrolled network) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────┘

One WireGuard interface per network per device. If a device belongs to three group networks, it runs three independent wgN interfaces, each with its own keys, IP, and peer list.
IP Address Allocation
IPAllocator (internal/backend/plugins/wireguard/ip_allocator.go) manages per-network IP pools entirely in memory, with sequential allocation and explicit release.
How it works
CIDR "100.64.1.0/24"
├── .0 reserved (network address, skipped)
├── .1 → first Allocate() call
├── .2 → second
│ ...
├── .254 → 254th
└── .255 reserved (broadcast, skipped)

- Sequential, not random — the nextHost counter advances and wraps. This makes peer IPs predictable during development, but the counter resets on server restart.
- Restart caveat: allocators are in-memory only. On restart, PreloadAllocator is called per network from its stored subnet_cidr. Allocated IPs are not replayed from the DB — existing peers keep their allocated_ip column value, but the in-memory counter starts at .1 again. If a new peer joins a network that already has .1 taken, the allocator will hand out .1 again and the UpsertPeer call will conflict. Production fix: PreloadAllocator should pre-mark all existing peer IPs as allocated by querying wg_peers.
- Minimum subnet: /30 (2 usable hosts). The allocator rejects anything tighter.
- IPv6: not supported — only IPv4 CIDRs are accepted.
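The sequential-with-wrap behavior described above can be sketched in pure Go. This is an illustrative model, not the real IPAllocator API — the name newIPAllocator and the used-map bookkeeping are assumptions for the sketch:

```go
package main

import (
	"fmt"
	"net"
)

// ipAllocator is a minimal sketch of sequential in-memory allocation.
type ipAllocator struct {
	ipnet    *net.IPNet
	nextHost uint32          // advances sequentially, wraps past the last host
	used     map[uint32]bool // host offsets already handed out
}

func newIPAllocator(cidr string) (*ipAllocator, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	ones, bits := ipnet.Mask.Size()
	if bits != 32 {
		return nil, fmt.Errorf("only IPv4 CIDRs are supported")
	}
	if ones > 30 {
		return nil, fmt.Errorf("subnet %s too small: minimum is /30", cidr)
	}
	return &ipAllocator{ipnet: ipnet, nextHost: 1, used: map[uint32]bool{}}, nil
}

func (a *ipAllocator) Allocate() (net.IP, error) {
	ones, bits := a.ipnet.Mask.Size()
	hostCount := uint32(1)<<(bits-ones) - 2 // exclude network + broadcast
	base := toUint32(a.ipnet.IP)
	for tries := uint32(0); tries < hostCount; tries++ {
		h := a.nextHost
		a.nextHost++
		if a.nextHost > hostCount { // wrap back to .1
			a.nextHost = 1
		}
		if !a.used[h] {
			a.used[h] = true
			v := base + h
			return net.IPv4(byte(v>>24), byte(v>>16), byte(v>>8), byte(v)), nil
		}
	}
	return nil, fmt.Errorf("pool exhausted")
}

func toUint32(ip net.IP) uint32 {
	v4 := ip.To4()
	return uint32(v4[0])<<24 | uint32(v4[1])<<16 | uint32(v4[2])<<8 | uint32(v4[3])
}

func main() {
	a, _ := newIPAllocator("100.64.1.0/24")
	first, _ := a.Allocate()
	second, _ := a.Allocate()
	fmt.Println(first, second) // 100.64.1.1 100.64.1.2
}
```

Because the counter lives only in the struct, restarting the process recreates it at .1 — exactly the restart caveat above.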
Default CIDR
The recommended default is 100.64.0.0/10 (RFC 6598 Shared Address Space). This range is:
- Not routable on the public internet
- Not used by home routers (192.168.x) or cloud VPCs (10.x)
- Not the Docker bridge range (172.17.0.0/16)
Per-group networks should carve out /24 slices from within this range, e.g.:
Group A → 100.64.1.0/24 (254 devices)
Group B → 100.64.2.0/24
...
Group N → 100.64.255.0/24

Agent-Side Plugin
Source: internal/agent/plugins/wireguard/
The agent plugin manages a set of networkState entries (one per enrolled network), each mapped to a WireGuard interface (wg0, wg1, …). Interface names are assigned sequentially from an in-process counter — wg0 for the first network, wg1 for the second, and so on.
Platform Implementation (Go Libraries)
The WGCommandRunner interface abstracts all platform operations. The production implementation (nativeWGRunner) uses Go libraries instead of shelling out to wg/wg-quick:
| Operation | Old (CLI) | New (Go library) |
|---|---|---|
| Key generation | wg genkey / wg pubkey | wgtypes.GeneratePrivateKey() |
| Peer config / crypto | wg set <iface> peer ... | wgctrl.Client.ConfigureDevice() |
| Interface stats | wg show <iface> dump | wgctrl.Client.Device() |
| Interface creation (Linux) | wg-quick up | netlink.LinkAdd(&GenericLink{LinkType: "wireguard"}) |
| IP assignment (Linux) | ip addr add via wg-quick | netlink.AddrAdd() |
| Route management (Linux) | ip route add via wg-quick | netlink.RouteAdd() |
| Interface teardown (Linux) | wg-quick down | netlink.LinkDel() (auto-removes routes) |
| Interface creation (macOS) | wg-quick up | wireguard-go <iface> subprocess |
| IP assignment (macOS) | ifconfig via wg-quick | exec ifconfig |
| Route management (macOS) | route add via wg-quick | exec route -q -n add -inet |
| IP forwarding | sysctl | sysctl (unchanged — OS tool, not WireGuard-specific) |
| NAT masquerade (Linux) | iptables | iptables (unchanged) |
| NAT masquerade (macOS) | iptables | stub + warning (use pf manually) |
Build tags enforce platform separation:
- wg_runner_common.go — all platforms (wgctrl calls, key ops, INI parser)
- wg_runner_linux.go — //go:build linux (netlink)
- wg_runner_darwin.go — //go:build darwin (wireguard-go, ifconfig, route)
WireGuard Config Lifecycle
When the server pushes a full_configs command, the agent executes:
1. WriteConfig("/etc/wireguard/wg0.conf", content)
└── Parses INI content into parsedWGConfig
Caches it in nativeWGRunner.pending["wg0"]
(No file written to disk)
2. QuickDown("wg0") ← if interface was already running
└── netlink.LinkDel / kill wireguard-go PID
3. QuickUp("wg0")
└── [Linux] netlink.LinkAdd (wireguard type)
netlink.AddrAdd (10.0.0.5/24)
netlink.LinkSetUp
wgctrl.ConfigureDevice (private key, listen port, peers)
netlink.RouteAdd per peer AllowedIP
└── [macOS] wireguard-go wg0 (creates utun)
sleep 100ms
ifconfig wg0 <ip> <ip> netmask <mask> broadcast <bcast>
ifconfig wg0 up
wgctrl.ConfigureDevice
               route add per peer AllowedIP

Full-tunnel routes (0.0.0.0/0) are skipped with a warning. The server does not send these in normal mesh configs — only explicit peer /32 routes and egress subnets are distributed.
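Step 1 of the lifecycle parses the INI-style content before any interface work happens. A minimal sketch of that parsing step, assuming a simple key = value grammar with [Interface] and [Peer] sections (the parsedWGConfig shape here is illustrative, not the agent's actual struct):

```go
package main

import (
	"fmt"
	"strings"
)

// parsedWGConfig is an illustrative stand-in for the agent's cached config.
type parsedWGConfig struct {
	iface map[string]string   // keys from the [Interface] section
	peers []map[string]string // one map per [Peer] section
}

// parseWGConfig splits wg-quick-style INI content into sections.
func parseWGConfig(content string) parsedWGConfig {
	cfg := parsedWGConfig{iface: map[string]string{}}
	var cur map[string]string
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			continue // blank lines and comments
		case line == "[Interface]":
			cur = cfg.iface
		case line == "[Peer]":
			cur = map[string]string{}
			cfg.peers = append(cfg.peers, cur)
		default:
			if k, v, ok := strings.Cut(line, "="); ok && cur != nil {
				cur[strings.TrimSpace(k)] = strings.TrimSpace(v)
			}
		}
	}
	return cfg
}

func main() {
	cfg := parseWGConfig(`[Interface]
Address = 100.64.1.5/24
ListenPort = 51820

[Peer]
PublicKey = abc=
AllowedIPs = 100.64.1.2/32`)
	fmt.Println(cfg.iface["Address"], len(cfg.peers)) // 100.64.1.5/24 1
}
```

Caching the parsed form (rather than writing a file) is what lets QuickUp configure the interface directly through wgctrl and netlink.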
STUN NAT Traversal
After applying each network config, the plugin spawns a goroutine to probe STUN servers:
go p.probeAndReportSTUN(nc.NetworkID, nc.StunServers)

ProbeSTUN iterates the configured STUN servers (falling back to Google/Cloudflare STUN if none are given) and sends an RFC 5389 Binding Request over UDP. On success, it parses the XOR-MAPPED-ADDRESS attribute to get the device's public IP:port.
The result is published as evt.plugin.wireguard.pubkey with the endpoint included. The server then updates the peer's endpoint column and triggers a network-wide config redistribution so all peers get the new direct route to this device.
STUN is opportunistic: if all probes fail, the device enters relay-only mode and the endpoint field stays empty. Devices without endpoints can still communicate if a relay peer is assigned.
Server-Side Plugin
Source: internal/backend/plugins/wireguard/
Service Layer
WireGuardService is the central coordinator. Key responsibilities:
Network lifecycle:
- CreateMeshNetwork — validates CIDR, creates DB row, initializes in-memory IPAllocator
- DeleteNetwork — cascades DB delete, removes allocator from memory
- GetDomainNetwork / ListDomainNetworks — read-through to repository
Peer lifecycle:
- AddDeviceToNetwork — publishes cmd.plugin.wireguard.join to the device; the device generates a keypair and reports back via evt.plugin.wireguard.pubkey
- HandlePubkeyReport — idempotent registration: allocates an IP on the first call, updates the endpoint on subsequent calls, then calls DistributeToNetwork
- HandleEndpointReport — updates the endpoint only (peer already registered), then redistributes
- RemovePeer — deletes the DB row, releases the IP back to the allocator
Config building (BuildNetworkConfig):
For a given device in a network, builds a WireGuardNetworkConfig containing:
- Address: this device's VPN IP with prefix length (e.g., 100.64.1.5/24)
- ListenPort: from the network record (default 51820)
- StunServers: passed through to the device for endpoint discovery
- Peers: one entry per other enrolled device, with:
  - AllowedIPs: the peer's /32 mesh IP, plus egress subnets if the peer is an egress gateway
  - Endpoint: the direct IP:port if known, or the relay's endpoint if a relay assignment exists
  - Keepalive: 25 seconds when routing via relay or when the target is a relay node
  - ViaRelay: true when the direct endpoint is not available
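The per-peer decisions can be summarized in a short sketch. The shapes below mirror the field names in this document, not the actual proto or service code, and the endpoint/relay precedence is as described above:

```go
package main

import "fmt"

// PeerConfig mirrors the per-peer fields documented above (illustrative).
type PeerConfig struct {
	PublicKey  string
	AllowedIPs []string
	Endpoint   string
	Keepalive  int
	ViaRelay   bool
}

// peerState is a stand-in for what the server knows about a peer.
type peerState struct {
	pubkey        string
	meshIP        string
	endpoint      string // "" if STUN failed
	isRelay       bool
	egressSubnets []string
	relayEndpoint string // "" if no relay assigned
}

// buildPeerEntry applies the decisions: /32 mesh IP plus egress subnets,
// direct endpoint when known, relay fallback with keepalive otherwise.
func buildPeerEntry(p peerState) PeerConfig {
	out := PeerConfig{
		PublicKey:  p.pubkey,
		AllowedIPs: append([]string{p.meshIP + "/32"}, p.egressSubnets...),
	}
	switch {
	case p.endpoint != "": // direct path known
		out.Endpoint = p.endpoint
		if p.isRelay {
			out.Keepalive = 25 // keep tunnels to relay nodes warm
		}
	case p.relayEndpoint != "": // fall back to the assigned relay
		out.Endpoint = p.relayEndpoint
		out.ViaRelay = true
		out.Keepalive = 25
	}
	return out
}

func main() {
	direct := buildPeerEntry(peerState{pubkey: "k1", meshIP: "100.64.1.2", endpoint: "203.0.113.10:51820"})
	relayed := buildPeerEntry(peerState{pubkey: "k2", meshIP: "100.64.1.3", relayEndpoint: "203.0.113.99:51820"})
	fmt.Println(direct.Endpoint, direct.Keepalive, relayed.ViaRelay, relayed.Keepalive)
}
```

A peer with neither a direct endpoint nor a relay assignment gets an empty Endpoint, which matches relay-only mode as described in the STUN section.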
Config Distribution
DistributeToDevice and DistributeToNetwork are the two distribution entry points.
DistributeToNetwork(networkID)
└── ListPeers(networkID)
└── for each peer → DistributeToDevice(deviceID)
DistributeToDevice(deviceID)
└── ListNetworksByDevice(deviceID) ← all networks this device belongs to
└── for each network → BuildNetworkConfig(network, deviceID)
└── publish cmd.plugin.wireguard.full_configs
topic: "device/{deviceID}/cmd.plugin.wireguard.full_configs"
               payload: { networks: [...] }

Full config, always. The server never sends incremental peer diffs. Every config push sends all networks and all peers. This keeps the agent stateless with respect to config — it always has the full picture.
DistributeToNetwork is called automatically after any state change that affects connectivity: peer registration, endpoint update, peer role change, relay assignment/removal.
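The fan-out above can be sketched as follows. The topic format comes from the Bus Message Protocol section; the lookup maps are stubs standing in for the repository calls (ListPeers, ListNetworksByDevice), and BuildNetworkConfig is reduced to a placeholder:

```go
package main

import "fmt"

// distributeToDevice emits ONE publish per device, carrying every network
// the device belongs to — always a full config, never a diff.
func distributeToDevice(deviceID string, networksByDevice map[string][]string) (string, map[string]any) {
	topic := fmt.Sprintf("device/%s/cmd.plugin.wireguard.full_configs", deviceID)
	var nets []map[string]any
	for _, netID := range networksByDevice[deviceID] {
		// stand-in for BuildNetworkConfig(network, deviceID)
		nets = append(nets, map[string]any{"network_id": netID})
	}
	return topic, map[string]any{"networks": nets}
}

// distributeToNetwork fans out to every peer of the network.
func distributeToNetwork(networkID string, peers, networksByDevice map[string][]string) []string {
	var topics []string
	for _, dev := range peers[networkID] {
		topic, _ := distributeToDevice(dev, networksByDevice)
		topics = append(topics, topic)
	}
	return topics
}

func main() {
	peers := map[string][]string{"net-1": {"dev-a", "dev-b"}}
	nets := map[string][]string{"dev-a": {"net-1"}, "dev-b": {"net-1", "net-2"}}
	fmt.Println(distributeToNetwork("net-1", peers, nets))
}
```

Note that dev-b, enrolled in two networks, still receives a single publish containing both — this is what keeps the agent stateless with respect to config.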
Relay System
WireGuard requires both peers to know each other's public IP to establish a direct connection. Devices behind strict NAT (symmetric NAT, CGNAT without port mapping) cannot be reached directly.
Peer relay (within a network):
- Designate a device as a relay: SetPeerRole(isRelay=true). The device must have a reachable public endpoint.
- Assign the relay to a NAT-restricted peer: AssignRelay(networkID, peerDeviceID, relayDeviceID).
- On next config distribution, all devices see:
  - The relay peer with PersistentKeepalive = 25
  - The NAT-restricted peer with ViaRelay = true and the relay's endpoint
The NAT-restricted device cannot receive inbound connections, but it can establish outbound ones. With PersistentKeepalive set on both sides, the relay peer maintains a persistent tunnel that the restricted device can use.
Platform relay (cross-tenant, super_admin managed):
Platform relays (wg_platform_relays) are infrastructure-level relay servers that can serve multiple tenants. They are registered via CreatePlatformRelay (super_admin only). Assignment to tenant networks (AssignPlatformRelayToNetwork) currently returns CodeUnimplemented and is a planned feature.
Egress Gateway
An egress gateway peer routes traffic destined for external subnets (e.g., on-premises networks) through a designated device.
- SetPeerRole(isEgress=true, egressSubnets=["192.168.10.0/24"]) on the gateway device.
- On next config distribution, all other peers get an AllowedIPs entry for 192.168.10.0/24 pointing at the gateway's mesh IP.
- The gateway device gets EnableIPForwarding + ConfigureMasquerade applied on its WireGuard interface by the agent plugin.
Bus Message Protocol
All WireGuard plugin messages use the following topic structure:
Commands (server → device)
| Topic | Direction | Payload |
|---|---|---|
device/{deviceID}/cmd.plugin.wireguard.join | server → device | {"network_id": "<uuid>"} |
device/{deviceID}/cmd.plugin.wireguard.full_configs | server → device | See below |
full_configs payload:
{
"networks": [
{
"network_id": "uuid",
"address": "100.64.1.5/24",
"listen_port": 51820,
"stun_servers": ["stun.l.google.com:19302"],
"is_relay": false,
"is_egress": false,
"egress_subnets": [],
"peers": [
{
"public_key": "base64...",
"allowed_ips": ["100.64.1.2/32"],
"endpoint": "203.0.113.10:51820",
"keepalive": 0
},
{
"public_key": "base64...",
"allowed_ips": ["100.64.1.3/32"],
"via_relay": true,
"endpoint": "203.0.113.99:51820",
"keepalive": 25
}
]
}
]
}

Events (device → server)
| Topic | Direction | Payload |
|---|---|---|
evt.plugin.wireguard.pubkey | device → server | {"network_id": "uuid", "public_key": "base64...", "endpoint": "1.2.3.4:51820"} |
evt.plugin.wireguard.endpoint | device → server | {"network_id": "uuid", "endpoint": "1.2.3.4:51820"} |
The server subscribes to evt.plugin.wireguard.pubkey in WireGuardServerPlugin.Register() and routes it to HandlePubkeyReport. The endpoint field is optional — it is absent if STUN failed.
ConnectRPC API
All RPCs require JWT auth + X-Tenant-ID header unless noted.
Network Management
| RPC | Auth Required | Description |
|---|---|---|
CreateWireGuardNetwork | tenant_admin+ | Create a standalone mesh network |
GetWireGuardNetwork | JWT | Get network by ID |
ListWireGuardNetworks | JWT | List all networks for tenant |
DeleteWireGuardNetwork | tenant_admin+ | Delete network and all peers |
EnrollGroupInNetwork | tenant_admin+ | Create network + enroll all current group devices atomically |
CreateWireGuardNetwork request:
{
"tenant_id": "uuid",
"name": "production-mesh",
"subnet_cidr": "100.64.1.0/24",
"group_id": "uuid", // optional
"listen_port": 51820 // 0 → default 51820
}

EnrollGroupInNetwork is the recommended path for group-wide mesh setup. It:
- Calls CreateMeshNetwork with the group ID
- Fetches all current group members (up to 1000)
- Calls AddDeviceToNetwork for each, which publishes the join command
Devices that join the group after enrollment are handled separately via AddDeviceToNetwork.
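Since the Tenant Configuration section defines a per-tenant ip_range pool that /24 slices are carved from, a server could plausibly validate that a requested subnet_cidr falls inside that pool. The helper below (subnetWithin is a hypothetical name, not part of the actual service) sketches such a check:

```go
package main

import (
	"fmt"
	"net"
)

// subnetWithin reports whether a proposed per-group slice sits entirely
// inside the tenant pool — a validation CreateWireGuardNetwork could run.
// Hypothetical helper; the real server's validation may differ.
func subnetWithin(pool, slice string) bool {
	_, poolNet, err := net.ParseCIDR(pool)
	if err != nil {
		return false
	}
	sliceIP, sliceNet, err := net.ParseCIDR(slice)
	if err != nil {
		return false
	}
	poolOnes, _ := poolNet.Mask.Size()
	sliceOnes, _ := sliceNet.Mask.Size()
	// the slice must start inside the pool and be at least as narrow
	return poolNet.Contains(sliceIP) && sliceOnes >= poolOnes
}

func main() {
	fmt.Println(subnetWithin("100.64.0.0/10", "100.64.1.0/24"))  // true
	fmt.Println(subnetWithin("100.64.0.0/10", "192.168.1.0/24")) // false
}
```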
Peer Management
| RPC | Auth Required | Description |
|---|---|---|
AddDeviceToNetwork | operator+ | Publish join command to a device |
RemoveDeviceFromNetwork | operator+ | Remove peer, release IP |
ListWireGuardPeers | JWT | List all peers in a network |
SetPeerRole | operator+ | Set relay / egress flags and egress subnets |
AssignRelay | operator+ | Assign a relay peer to a NAT-restricted peer |
RemoveRelayAssignment | operator+ | Remove relay assignment |
SetPeerRole request:
{
"tenant_id": "uuid",
"network_id": "uuid",
"device_id": "uuid",
"is_relay": true,
"is_egress_gateway": false,
"egress_subnets": []
}

Changing any role flag triggers an immediate DistributeToNetwork.
Platform Relay Management (super_admin only)
| RPC | Description |
|---|---|
CreatePlatformRelay | Register a new infrastructure relay server |
ListPlatformRelays | List all platform relays |
DeletePlatformRelay | Remove a platform relay |
AssignPlatformRelayToNetwork | (not yet implemented) |
RemovePlatformRelayFromNetwork | (not yet implemented) |
Tenant Configuration
The TenantConfig.WireGuardConfig field (TenantWireGuardConfig proto message) exposes per-tenant defaults:
| Field | Type | Default | Description |
|---|---|---|---|
mesh_enabled | bool | false | Whether WireGuard mesh is enabled for this tenant |
ip_range | string | 100.64.0.0/10 | Default CIDR pool to allocate per-group /24 slices from |
default_listen_port | int32 | 51820 | Default UDP listen port for new networks |
default_dns | string | — | Optional DNS server pushed to devices |
default_keepalive | int32 | 0 | PersistentKeepalive for all peers (0 = disabled unless via relay) |
nat_traversal_enabled | bool | true | Whether STUN probing is enabled |
stun_servers | []string | Google/Cloudflare STUN | STUN server list sent with each full_configs push |
These settings are stored in the tenants.settings JSONB column and updated via UpdateTenantConfig. They are not yet automatically applied to new network creation — the operator must pass explicit values in CreateWireGuardNetwork today.
Operational Notes
Known Limitations
IP allocator is not crash-safe. On server restart, the in-memory allocator starts fresh from .1. Existing peers retain their allocated IPs in the DB, but the counter does not know which addresses are already taken. If a new peer joins before the allocator encounters a collision, it will hand out a duplicate. Fix: preload all existing peer IPs per network into the allocator at startup.
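The proposed fix is mechanical: at startup, translate each stored peer IP into its host offset within the network's subnet and mark it used before serving new Allocate() calls. A sketch under those assumptions (markUsed is an illustrative name, not the real PreloadAllocator):

```go
package main

import (
	"fmt"
	"net"
)

// markUsed derives each stored peer IP's host offset within the subnet
// and records it as allocated, so a sequential counter can skip it.
func markUsed(cidr string, peerIPs []string) (map[uint32]bool, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	base := toUint32(ipnet.IP)
	used := map[uint32]bool{}
	for _, s := range peerIPs {
		ip := net.ParseIP(s)
		if ip == nil || !ipnet.Contains(ip) {
			return nil, fmt.Errorf("peer IP %q outside %s", s, cidr)
		}
		used[toUint32(ip)-base] = true
	}
	return used, nil
}

func toUint32(ip net.IP) uint32 {
	v4 := ip.To4()
	return uint32(v4[0])<<24 | uint32(v4[1])<<16 | uint32(v4[2])<<8 | uint32(v4[3])
}

func main() {
	// Offsets 1 and 7 come from wg_peers rows; 2 stays free.
	used, _ := markUsed("100.64.1.0/24", []string{"100.64.1.1", "100.64.1.7"})
	fmt.Println(used[1], used[7], used[2]) // true true false
}
```

Feeding this map into the allocator at PreloadAllocator time would make restarts safe without changing the sequential allocation strategy.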
full_configs is fire-and-forget. The server publishes to the bus and does not wait for an ACK from the device. If the device is offline, the message is dropped (NATS at-most-once). When the device reconnects and the state machine reaches Syncing, it does not automatically re-request mesh config — the server must re-distribute. This is a gap: device reconnection should trigger DistributeToDevice.
No handshake tracking from agent. The last_handshake_at column exists in the DB but is never written by the current agent. The agent plugin reads handshake times via ShowDump for metrics export only; it does not report them back to the server.
macOS masquerade not implemented. ConfigureMasquerade is a no-op on Darwin and logs a warning. Operators running egress gateways on macOS must configure pf manually.
Full-tunnel routes skipped. 0.0.0.0/0 in AllowedIPs is silently dropped by the agent with a warning. The server does not send it in normal configs, but custom configurations may hit this.
Recommended Deployment Pattern
1. Create a device group (e.g., "production-nodes")
2. Call EnrollGroupInNetwork:
- name: "production-mesh"
- subnet_cidr: "100.64.1.0/24"
- group_id: <production-nodes UUID>
3. Designate one device as a relay (ideally a cloud VM with a static IP):
SetPeerRole(is_relay=true) on that device
4. For any device that fails STUN (symmetric NAT):
AssignRelay(peerDeviceID=<nat device>, relayDeviceID=<relay device>)
5. For on-premises subnet access, designate a gateway device:
SetPeerRole(is_egress_gateway=true, egress_subnets=["192.168.0.0/16"])

Adding Devices After Enrollment
AddDeviceToNetwork(network_id, device_id)

This publishes cmd.plugin.wireguard.join to the device. The device generates a keypair and publishes evt.plugin.wireguard.pubkey. The server registers the peer, allocates an IP, and redistributes full_configs to all peers in the network.
Monitoring
The agent's CollectMetrics method (called by the telemetry plugin) exports per-peer WireGuard stats:
| Metric | Description |
|---|---|
wg_peer_bytes_rx | Bytes received from this peer |
wg_peer_bytes_tx | Bytes sent to this peer |
wg_peer_last_handshake | Unix timestamp of last successful handshake |
Labels: network_id, peer_pubkey.
These metrics are collected via wgctrl.Client.Device() (no root required on Linux with CAP_NET_ADMIN; the WireGuard kernel module exposes stats via generic netlink).