WireGuard Plugin

The WireGuard plugin provides automated, multi-tenant mesh VPN networking for Umoo-managed devices. Each device group can be enrolled in a dedicated WireGuard network. The platform handles keypair generation, IP allocation, NAT traversal via STUN, and per-device config distribution — with no manual wg tooling required on the server side.


Table of Contents

  1. Architecture Overview
  2. IP Address Allocation
  3. Agent-Side Plugin
  4. Server-Side Plugin
  5. Bus Message Protocol
  6. ConnectRPC API
  7. Tenant Configuration
  8. Operational Notes

Architecture Overview

The plugin is split across two processes that communicate exclusively via the message bus:

┌────────────────────────────────────────────────────────────────────────┐
│  Backend (umoo server)                                              │
│                                                                        │
│  ┌──────────────────────┐     ┌───────────────────────────────────┐   │
│  │  WireGuardService    │     │  AdminService RPCs                │   │
│  │  ─────────────────── │     │  ─────────────────────────────── │   │
│  │  • CreateMeshNetwork │◄────│  CreateWireGuardNetwork           │   │
│  │  • IP allocation     │     │  EnrollGroupInNetwork             │   │
│  │  • Peer registration │     │  AddDeviceToNetwork               │   │
│  │  • Config building   │     │  SetPeerRole / AssignRelay        │   │
│  │  • Distribution      │     │  ListWireGuardPeers               │   │
│  └──────────┬───────────┘     └───────────────────────────────────┘   │
│             │ NATS bus                                                 │
└─────────────┼──────────────────────────────────────────────────────────┘

    ╔═════════╧═════════╗
    ║   Message Bus      ║
    ║  (NATS / local)    ║
    ╚═════════╤═════════╝

┌─────────────┼──────────────────────────────────────────────────────────┐
│  Device Agent (umoo-agent)       │                                  │
│                                     │                                  │
│  ┌──────────────────────────────────▼──────────────────────────────┐   │
│  │  WireGuardPlugin (agent)                                        │   │
│  │  ──────────────────────────────────────────────────────────     │   │
│  │  • Subscribes: cmd.plugin.wireguard.join                        │   │
│  │               cmd.plugin.wireguard.full_configs                 │   │
│  │  • Publishes:  evt.plugin.wireguard.pubkey (key + endpoint)     │   │
│  │               evt.plugin.wireguard.endpoint (STUN result)       │   │
│  │  • Manages:    wg0, wg1, … (one per enrolled network)           │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────────────────────┘

One WireGuard interface per network per device. If a device belongs to three group networks, it runs three independent wgN interfaces, each with its own keys, IP, and peer list.


IP Address Allocation

IPAllocator (internal/backend/plugins/wireguard/ip_allocator.go) manages per-network IP pools entirely in memory, with sequential allocation and explicit release.

How it works

CIDR "100.64.1.0/24"
  ├── .0   reserved (network address, skipped)
  ├── .1   → first Allocate() call
  ├── .2   → second
  │   ...
  ├── .254 → 254th
  └── .255 reserved (broadcast, skipped)
  • Sequential, not random: the nextHost counter advances and wraps. This makes peer IPs predictable during development, but the counter resets on server restart.
  • Restart caveat: allocators are in-memory only. On restart, PreloadAllocator is called per network from its stored subnet_cidr. Allocated IPs are not replayed from the DB — existing peers keep their allocated_ip column value, but the in-memory counter starts at .1 again. If a new peer joins a network that already has .1 taken, the allocator will hand out .1 again and the UpsertPeer call will conflict. Production fix: PreloadAllocator should pre-mark all existing peer IPs as allocated by querying wg_peers.
  • Minimum subnet: /30 (2 usable hosts). The allocator rejects anything tighter.
  • IPv6: not supported — only IPv4 CIDRs accepted.

Default CIDR

The recommended default is 100.64.0.0/10 (RFC 6598 Shared Address Space). This range is:

  • Not routable on the public internet
  • Not used by home routers (192.168.x) or cloud VPCs (10.x)
  • Not the Docker bridge range (172.17.0.0/16)

Per-group networks should carve out /24 slices from within this range, e.g.:

Group A → 100.64.1.0/24   (254 devices)
Group B → 100.64.2.0/24
...
Group N → 100.64.255.0/24
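
A helper along these lines can compute the nth /24 slice of the tenant pool (an illustrative sketch, not part of the actual codebase):

```go
package main

import (
	"fmt"
	"net"
)

// carve24 returns the nth /24 slice of an IPv4 base pool such as
// 100.64.0.0/10. Illustrative helper, not part of the actual codebase.
func carve24(baseCIDR string, n int) (string, error) {
	_, ipnet, err := net.ParseCIDR(baseCIDR)
	if err != nil {
		return "", err
	}
	ones, bits := ipnet.Mask.Size()
	if bits != 32 || ones > 24 {
		return "", fmt.Errorf("base must be an IPv4 network of /24 or larger")
	}
	if n < 0 || n >= 1<<(24-ones) {
		return "", fmt.Errorf("slice %d out of range for %s", n, baseCIDR)
	}
	b := ipnet.IP.To4()
	v := uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
	v += uint32(n) << 8 // each /24 spans 256 addresses
	return fmt.Sprintf("%d.%d.%d.0/24", byte(v>>24), byte(v>>16), byte(v>>8)), nil
}

func main() {
	for n := 1; n <= 2; n++ {
		cidr, _ := carve24("100.64.0.0/10", n)
		fmt.Println(cidr)
	}
}
```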

Agent-Side Plugin

Source: internal/agent/plugins/wireguard/

The agent plugin manages a set of networkState entries (one per enrolled network), each mapped to a WireGuard interface (wg0, wg1, …). Interface names are assigned sequentially from an in-process counter — wg0 for the first network, wg1 for the second, and so on.

Platform Implementation (Go Libraries)

The WGCommandRunner interface abstracts all platform operations. The production implementation (nativeWGRunner) uses Go libraries instead of shelling out to wg/wg-quick:

| Operation | Old (CLI) | New (Go library) |
|---|---|---|
| Key generation | wg genkey / wg pubkey | wgtypes.GeneratePrivateKey() |
| Peer config / crypto | wg set <iface> peer ... | wgctrl.Client.ConfigureDevice() |
| Interface stats | wg show <iface> dump | wgctrl.Client.Device() |
| Interface creation (Linux) | wg-quick up | netlink.LinkAdd(&GenericLink{LinkType: "wireguard"}) |
| IP assignment (Linux) | ip addr add via wg-quick | netlink.AddrAdd() |
| Route management (Linux) | ip route add via wg-quick | netlink.RouteAdd() |
| Interface teardown (Linux) | wg-quick down | netlink.LinkDel() (auto-removes routes) |
| Interface creation (macOS) | wg-quick up | wireguard-go <iface> subprocess |
| IP assignment (macOS) | ifconfig via wg-quick | exec ifconfig |
| Route management (macOS) | route add via wg-quick | exec route -q -n add -inet |
| IP forwarding | sysctl | sysctl (unchanged — OS tool, not WireGuard-specific) |
| NAT masquerade (Linux) | iptables | iptables (unchanged) |
| NAT masquerade (macOS) | iptables | stub + warning (use pf manually) |

Build tags enforce platform separation:

  • wg_runner_common.go — all platforms (wgctrl calls, key ops, INI parser)
  • wg_runner_linux.go — //go:build linux (netlink)
  • wg_runner_darwin.go — //go:build darwin (wireguard-go, ifconfig, route)

WireGuard Config Lifecycle

When the server pushes a full_configs command, the agent executes:

1. WriteConfig("/etc/wireguard/wg0.conf", content)
   └── Parses INI content into parsedWGConfig
       Caches it in nativeWGRunner.pending["wg0"]
       (No file written to disk)

2. QuickDown("wg0")   ← if interface was already running
   └── netlink.LinkDel / kill wireguard-go PID

3. QuickUp("wg0")
   └── [Linux]  netlink.LinkAdd (wireguard type)
               netlink.AddrAdd (10.0.0.5/24)
               netlink.LinkSetUp
               wgctrl.ConfigureDevice (private key, listen port, peers)
               netlink.RouteAdd per peer AllowedIP
   └── [macOS]  wireguard-go wg0 (creates utun)
               sleep 100ms
               ifconfig wg0 <ip> <ip> netmask <mask> broadcast <bcast>
               ifconfig wg0 up
               wgctrl.ConfigureDevice
               route add per peer AllowedIP

Full-tunnel routes (0.0.0.0/0) are skipped with a warning. The server does not send these in normal mesh configs — only explicit peer /32 routes and egress subnets are distributed.
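The in-memory INI-parsing step in WriteConfig can be sketched roughly like this. The types are illustrative stand-ins; the real parsedWGConfig may carry different fields.

```go
package main

import (
	"fmt"
	"strings"
)

// wgPeer and wgConfig are illustrative stand-ins for parsedWGConfig.
type wgPeer struct {
	PublicKey  string
	AllowedIPs []string
	Endpoint   string
}

type wgConfig struct {
	PrivateKey string
	Address    string
	ListenPort string
	Peers      []wgPeer
}

// parseWGConfig parses wg-quick-style INI content in memory, mirroring
// the "no file written to disk" behavior described above.
func parseWGConfig(content string) wgConfig {
	var cfg wgConfig
	section := ""
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "[") {
			section = strings.Trim(line, "[]")
			if section == "Peer" {
				cfg.Peers = append(cfg.Peers, wgPeer{})
			}
			continue
		}
		// Cut on the first '=' so base64 keys keep their trailing padding.
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch {
		case section == "Interface" && key == "PrivateKey":
			cfg.PrivateKey = val
		case section == "Interface" && key == "Address":
			cfg.Address = val
		case section == "Interface" && key == "ListenPort":
			cfg.ListenPort = val
		case section == "Peer" && len(cfg.Peers) > 0:
			p := &cfg.Peers[len(cfg.Peers)-1]
			switch key {
			case "PublicKey":
				p.PublicKey = val
			case "AllowedIPs":
				p.AllowedIPs = strings.Split(strings.ReplaceAll(val, " ", ""), ",")
			case "Endpoint":
				p.Endpoint = val
			}
		}
	}
	return cfg
}

func main() {
	cfg := parseWGConfig(`[Interface]
PrivateKey = aBase64Key=
Address = 100.64.1.5/24
ListenPort = 51820

[Peer]
PublicKey = peerKey=
AllowedIPs = 100.64.1.2/32
Endpoint = 203.0.113.10:51820
`)
	fmt.Println(cfg.Address, len(cfg.Peers))
}
```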

STUN NAT Traversal

After applying each network config, the plugin spawns a goroutine to probe STUN servers:

```go
go p.probeAndReportSTUN(nc.NetworkID, nc.StunServers)
```

ProbeSTUN iterates the configured STUN servers (falling back to Google/Cloudflare STUN if none given) and sends an RFC 5389 Binding Request over UDP. On success, it parses the XOR-MAPPED-ADDRESS attribute to get the device's public IP:port.

The result is published as evt.plugin.wireguard.pubkey with the endpoint included. The server then updates the peer's endpoint column and triggers a network-wide config redistribution so all peers get the new direct route to this device.

STUN is opportunistic: if all probes fail, the device enters relay-only mode and the endpoint field stays empty. Devices without endpoints can still communicate if a relay peer is assigned.


Server-Side Plugin

Source: internal/backend/plugins/wireguard/

Service Layer

WireGuardService is the central coordinator. Key responsibilities:

Network lifecycle:

  • CreateMeshNetwork — validates CIDR, creates DB row, initializes in-memory IPAllocator
  • DeleteNetwork — cascades DB delete, removes allocator from memory
  • GetDomainNetwork / ListDomainNetworks — read-through to repository

Peer lifecycle:

  • AddDeviceToNetwork — publishes cmd.plugin.wireguard.join to the device; the device generates a keypair and reports back via evt.plugin.wireguard.pubkey
  • HandlePubkeyReport — idempotent registration: allocates IP on first call, updates endpoint on subsequent calls, then calls DistributeToNetwork
  • HandleEndpointReport — updates endpoint only (peer already registered), redistributes
  • RemovePeer — deletes DB row, releases IP back to allocator

Config building (BuildNetworkConfig):

For a given device in a network, builds a WireGuardNetworkConfig containing:

  • Address: this device's VPN IP with prefix length (e.g., 100.64.1.5/24)
  • ListenPort: from the network record (default 51820)
  • StunServers: passed through to the device for endpoint discovery
  • Peers: one entry per other enrolled device, with:
    • AllowedIPs: peer's /32 mesh IP, plus egress subnets if the peer is an egress gateway
    • Endpoint: direct IP:port if known, or relay's endpoint if a relay assignment exists
    • Keepalive: 25 seconds when routing via relay or when the target is a relay node
    • ViaRelay: true when the direct endpoint is not available
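
The per-peer rules above can be sketched as follows, with illustrative types standing in for the real DB rows and proto messages:

```go
package main

import "fmt"

// peerState and peerEntry are illustrative stand-ins for the types
// the real BuildNetworkConfig operates on.
type peerState struct {
	PublicKey     string
	MeshIP        string // e.g. "100.64.1.2"
	Endpoint      string // "" when STUN failed
	IsRelay       bool
	IsEgress      bool
	EgressSubnets []string
}

type peerEntry struct {
	PublicKey  string
	AllowedIPs []string
	Endpoint   string
	Keepalive  int
	ViaRelay   bool
}

// buildPeerEntry builds the entry another device sees for target.
// relay is the relay assigned to target, or nil if none.
func buildPeerEntry(target peerState, relay *peerState) peerEntry {
	e := peerEntry{
		PublicKey:  target.PublicKey,
		AllowedIPs: []string{target.MeshIP + "/32"},
		Endpoint:   target.Endpoint,
	}
	if target.IsEgress {
		// Egress gateways also attract traffic for their subnets.
		e.AllowedIPs = append(e.AllowedIPs, target.EgressSubnets...)
	}
	if target.Endpoint == "" && relay != nil {
		// No direct endpoint: route via the assigned relay.
		e.Endpoint = relay.Endpoint
		e.ViaRelay = true
	}
	if e.ViaRelay || target.IsRelay {
		e.Keepalive = 25 // keep the NAT mapping alive
	}
	return e
}

func main() {
	relay := peerState{PublicKey: "r", MeshIP: "100.64.1.9", Endpoint: "203.0.113.99:51820", IsRelay: true}
	natted := peerState{PublicKey: "n", MeshIP: "100.64.1.3"}
	fmt.Printf("%+v\n", buildPeerEntry(natted, &relay))
}
```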

Config Distribution

DistributeToDevice and DistributeToNetwork are the two distribution entry points.

DistributeToNetwork(networkID)
  └── ListPeers(networkID)
  └── for each peer → DistributeToDevice(deviceID)

DistributeToDevice(deviceID)
  └── ListNetworksByDevice(deviceID)     ← all networks this device belongs to
  └── for each network → BuildNetworkConfig(network, deviceID)
  └── publish cmd.plugin.wireguard.full_configs
      topic: "device/{deviceID}/cmd.plugin.wireguard.full_configs"
      payload: { networks: [...] }

Full config, always. The server never sends incremental peer diffs. Every config push sends all networks and all peers. This keeps the agent stateless with respect to config — it always has the full picture.
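
The fan-out can be sketched with function-valued stand-ins for the repository, config builder, and NATS publisher (all names here are illustrative):

```go
package main

import "fmt"

// distributor sketches the fan-out; the func fields stand in for the
// real repository, BuildNetworkConfig, and bus publisher.
type distributor struct {
	listPeers            func(networkID string) []string // device IDs in a network
	listNetworksByDevice func(deviceID string) []string  // networks a device belongs to
	build                func(networkID, deviceID string) any
	publish              func(topic string, payload map[string]any)
}

// DistributeToDevice always sends the full picture: every network the
// device belongs to, each with its complete peer list.
func (d *distributor) DistributeToDevice(deviceID string) {
	var networks []any
	for _, nid := range d.listNetworksByDevice(deviceID) {
		networks = append(networks, d.build(nid, deviceID))
	}
	topic := fmt.Sprintf("device/%s/cmd.plugin.wireguard.full_configs", deviceID)
	d.publish(topic, map[string]any{"networks": networks})
}

// DistributeToNetwork fans out a full push to every enrolled device.
func (d *distributor) DistributeToNetwork(networkID string) {
	for _, deviceID := range d.listPeers(networkID) {
		d.DistributeToDevice(deviceID)
	}
}

func main() {
	d := &distributor{
		listPeers:            func(string) []string { return []string{"d1", "d2"} },
		listNetworksByDevice: func(string) []string { return []string{"n1"} },
		build:                func(nid, did string) any { return map[string]any{"network_id": nid} },
		publish:              func(topic string, _ map[string]any) { fmt.Println(topic) },
	}
	d.DistributeToNetwork("n1")
}
```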

DistributeToNetwork is called automatically after any state change that affects connectivity: peer registration, endpoint update, peer role change, relay assignment/removal.

Relay System

WireGuard requires both peers to know each other's public IP to establish a direct connection. Devices behind strict NAT (symmetric NAT, CGNAT without port mapping) cannot be reached directly.

Peer relay (within a network):

  1. Designate a device as a relay: SetPeerRole(isRelay=true). The device must have a reachable public endpoint.
  2. Assign the relay to a NAT-restricted peer: AssignRelay(networkID, peerDeviceID, relayDeviceID).
  3. On next config distribution, all devices see:
    • The relay peer with a PersistentKeepalive = 25
    • The NAT-restricted peer with ViaRelay = true and the relay's endpoint

The NAT-restricted device cannot receive inbound connections, but it can establish outbound ones. With PersistentKeepalive set on both sides, the relay peer maintains a persistent tunnel that the restricted device can use.

Platform relay (cross-tenant, super_admin managed):

Platform relays (wg_platform_relays) are infrastructure-level relay servers that can serve multiple tenants. They are registered via CreatePlatformRelay (super_admin only) and assigned to tenant networks. Assignment to networks (AssignPlatformRelayToNetwork) is currently marked CodeUnimplemented and is a planned feature.

Egress Gateway

An egress gateway peer routes traffic destined for external subnets (e.g., on-premises networks) through a designated device.

  1. SetPeerRole(isEgress=true, egressSubnets=["192.168.10.0/24"]) on the gateway device.
  2. On next config distribution, all other peers get an AllowedIPs entry for 192.168.10.0/24 pointing at the gateway's mesh IP.
  3. The gateway device gets EnableIPForwarding + ConfigureMasquerade applied on its WireGuard interface by the agent plugin.

Bus Message Protocol

All WireGuard plugin messages use the following topic structure:

Commands (server → device)

| Topic | Direction | Payload |
|---|---|---|
| device/{deviceID}/cmd.plugin.wireguard.join | server → device | {"network_id": "<uuid>"} |
| device/{deviceID}/cmd.plugin.wireguard.full_configs | server → device | See below |

full_configs payload:

```json
{
  "networks": [
    {
      "network_id":  "uuid",
      "address":     "100.64.1.5/24",
      "listen_port": 51820,
      "stun_servers": ["stun.l.google.com:19302"],
      "is_relay":    false,
      "is_egress":   false,
      "egress_subnets": [],
      "peers": [
        {
          "public_key":  "base64...",
          "allowed_ips": ["100.64.1.2/32"],
          "endpoint":    "203.0.113.10:51820",
          "keepalive":   0
        },
        {
          "public_key":  "base64...",
          "allowed_ips": ["100.64.1.3/32"],
          "via_relay":   true,
          "endpoint":    "203.0.113.99:51820",
          "keepalive":   25
        }
      ]
    }
  ]
}
```

Events (device → server)

| Topic | Direction | Payload |
|---|---|---|
| evt.plugin.wireguard.pubkey | device → server | {"network_id": "uuid", "public_key": "base64...", "endpoint": "1.2.3.4:51820"} |
| evt.plugin.wireguard.endpoint | device → server | {"network_id": "uuid", "endpoint": "1.2.3.4:51820"} |

The server subscribes to evt.plugin.wireguard.pubkey in WireGuardServerPlugin.Register() and routes it to HandlePubkeyReport. The endpoint field is optional — it is absent if STUN failed.


ConnectRPC API

All RPCs require JWT auth + X-Tenant-ID header unless noted.

Network Management

| RPC | Auth Required | Description |
|---|---|---|
| CreateWireGuardNetwork | tenant_admin+ | Create a standalone mesh network |
| GetWireGuardNetwork | JWT | Get network by ID |
| ListWireGuardNetworks | JWT | List all networks for tenant |
| DeleteWireGuardNetwork | tenant_admin+ | Delete network and all peers |
| EnrollGroupInNetwork | tenant_admin+ | Create network + enroll all current group devices atomically |

CreateWireGuardNetwork request:

```json
{
  "tenant_id":   "uuid",
  "name":        "production-mesh",
  "subnet_cidr": "100.64.1.0/24",
  "group_id":    "uuid",       // optional
  "listen_port": 51820         // 0 → default 51820
}
```

EnrollGroupInNetwork is the recommended path for group-wide mesh setup. It:

  1. Calls CreateMeshNetwork with the group ID
  2. Fetches all current group members (up to 1000)
  3. Calls AddDeviceToNetwork for each, which publishes the join command

Devices that join the group after enrollment are handled separately via AddDeviceToNetwork.

Peer Management

| RPC | Auth Required | Description |
|---|---|---|
| AddDeviceToNetwork | operator+ | Publish join command to a device |
| RemoveDeviceFromNetwork | operator+ | Remove peer, release IP |
| ListWireGuardPeers | JWT | List all peers in a network |
| SetPeerRole | operator+ | Set relay / egress flags and egress subnets |
| AssignRelay | operator+ | Assign a relay peer to a NAT-restricted peer |
| RemoveRelayAssignment | operator+ | Remove relay assignment |

SetPeerRole request:

```json
{
  "tenant_id":        "uuid",
  "network_id":       "uuid",
  "device_id":        "uuid",
  "is_relay":         true,
  "is_egress_gateway": false,
  "egress_subnets":   []
}
```

Changing any role flag triggers immediate DistributeToNetwork.

Platform Relay Management (super_admin only)

| RPC | Description |
|---|---|
| CreatePlatformRelay | Register a new infrastructure relay server |
| ListPlatformRelays | List all platform relays |
| DeletePlatformRelay | Remove a platform relay |
| AssignPlatformRelayToNetwork | (not yet implemented) |
| RemovePlatformRelayFromNetwork | (not yet implemented) |

Tenant Configuration

The TenantConfig.WireGuardConfig field (TenantWireGuardConfig proto message) exposes per-tenant defaults:

| Field | Type | Default | Description |
|---|---|---|---|
| mesh_enabled | bool | false | Whether WireGuard mesh is enabled for this tenant |
| ip_range | string | 100.64.0.0/10 | Default CIDR pool to allocate per-group /24 slices from |
| default_listen_port | int32 | 51820 | Default UDP listen port for new networks |
| default_dns | string | (empty) | Optional DNS server pushed to devices |
| default_keepalive | int32 | 0 | PersistentKeepalive for all peers (0 = disabled unless via relay) |
| nat_traversal_enabled | bool | true | Whether STUN probing is enabled |
| stun_servers | []string | Google/Cloudflare STUN | STUN server list sent with each full_configs push |

These settings are stored in the tenants.settings JSONB column and updated via UpdateTenantConfig. They are not yet automatically applied to new network creation — the operator must pass explicit values in CreateWireGuardNetwork today.


Operational Notes

Known Limitations

IP allocator is not crash-safe. On server restart, the in-memory allocator starts fresh from .1. Existing peers retain their allocated IPs in the DB, but the counter does not know which addresses are already taken. If a new peer joins before the allocator encounters a collision, it will hand out a duplicate. Fix: preload all existing peer IPs per network into the allocator at startup.

full_configs is fire-and-forget. The server publishes to the bus and does not wait for an ACK from the device. If the device is offline, the message is dropped (NATS at-most-once). When the device reconnects and the state machine reaches Syncing, it does not automatically re-request mesh config — the server must re-distribute. This is a gap: device reconnection should trigger DistributeToDevice.

No handshake tracking from agent. The last_handshake_at column exists in the DB but is never written by the current agent. The agent plugin reads handshake times via ShowDump for metrics export only; it does not report them back to the server.

macOS masquerade not implemented. ConfigureMasquerade is a no-op on Darwin and logs a warning. Operators running egress gateways on macOS must configure pf manually.

Full-tunnel routes skipped. 0.0.0.0/0 in AllowedIPs is silently dropped by the agent with a warning. The server does not send it in normal configs, but custom configurations may hit this.

Typical Setup Flow

1. Create a device group (e.g., "production-nodes")
2. Call EnrollGroupInNetwork:
   - name: "production-mesh"
   - subnet_cidr: "100.64.1.0/24"
   - group_id: <production-nodes UUID>
3. Designate one device as a relay (ideally a cloud VM with a static IP):
   SetPeerRole(is_relay=true) on that device
4. For any device that fails STUN (symmetric NAT):
   AssignRelay(peerDeviceID=<nat device>, relayDeviceID=<relay device>)
5. For on-premises subnet access, designate a gateway device:
   SetPeerRole(is_egress_gateway=true, egress_subnets=["192.168.0.0/16"])

Adding Devices After Enrollment

AddDeviceToNetwork(network_id, device_id)

This publishes cmd.plugin.wireguard.join to the device. The device generates a keypair and publishes evt.plugin.wireguard.pubkey. The server registers the peer, allocates an IP, and redistributes full_configs to all peers in the network.

Monitoring

The agent's CollectMetrics method (called by the telemetry plugin) exports per-peer WireGuard stats:

| Metric | Description |
|---|---|
| wg_peer_bytes_rx | Bytes received from this peer |
| wg_peer_bytes_tx | Bytes sent to this peer |
| wg_peer_last_handshake | Unix timestamp of last successful handshake |

Labels: network_id, peer_pubkey.

These metrics are collected via wgctrl.Client.Device() (no root required on Linux with CAP_NET_ADMIN; the WireGuard kernel module exposes stats via generic netlink).
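
The shape of the exported metrics can be illustrated with a small formatter. The peerStats type here is a hypothetical stand-in for the per-peer data read from wgctrl.Client.Device(); the real agent works with wgtypes.Peer values.

```go
package main

import "fmt"

// peerStats is a hypothetical stand-in for the per-peer data returned
// by wgctrl.Client.Device().
type peerStats struct {
	NetworkID     string
	PeerPubkey    string
	RxBytes       int64
	TxBytes       int64
	LastHandshake int64 // Unix seconds, 0 = never
}

// formatMetrics renders the three metrics with their labels in a
// Prometheus-style exposition format.
func formatMetrics(stats []peerStats) []string {
	var lines []string
	for _, s := range stats {
		labels := fmt.Sprintf(`{network_id=%q,peer_pubkey=%q}`, s.NetworkID, s.PeerPubkey)
		lines = append(lines,
			fmt.Sprintf("wg_peer_bytes_rx%s %d", labels, s.RxBytes),
			fmt.Sprintf("wg_peer_bytes_tx%s %d", labels, s.TxBytes),
			fmt.Sprintf("wg_peer_last_handshake%s %d", labels, s.LastHandshake),
		)
	}
	return lines
}

func main() {
	stats := []peerStats{{NetworkID: "n1", PeerPubkey: "abc", RxBytes: 1024, TxBytes: 2048, LastHandshake: 1700000000}}
	for _, l := range formatMetrics(stats) {
		fmt.Println(l)
	}
}
```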

Umoo — IoT Device Management Platform