WireGuard Plugin

The WireGuard plugin provides automated, multi-tenant mesh VPN networking for Umoo-managed devices. Each device group can be enrolled in a dedicated WireGuard network. The platform handles keypair generation, IP allocation, NAT traversal via STUN, and per-device config distribution — with no manual wg tooling required on the server side.


Table of Contents

  1. Architecture Overview
  2. IP Address Allocation
  3. Agent-Side Plugin
  4. Server-Side Plugin
  5. Bus Message Protocol
  6. ConnectRPC API
  7. Tenant Configuration
  8. Operational Notes

Architecture Overview

The plugin is split across two processes that communicate exclusively via the message bus:

┌────────────────────────────────────────────────────────────────────────┐
│  Backend (umoo server)                                              │
│                                                                        │
│  ┌──────────────────────┐     ┌───────────────────────────────────┐   │
│  │  WireGuardService    │     │  AdminService RPCs                │   │
│  │  ─────────────────── │     │  ─────────────────────────────── │   │
│  │  • CreateMeshNetwork │◄────│  CreateWireGuardNetwork           │   │
│  │  • IP allocation     │     │  EnrollGroupInNetwork             │   │
│  │  • Peer registration │     │  AddDeviceToNetwork               │   │
│  │  • Config building   │     │  SetPeerRole / AssignRelay        │   │
│  │  • Distribution      │     │  ListWireGuardPeers               │   │
│  └──────────┬───────────┘     └───────────────────────────────────┘   │
│             │ NATS bus                                                 │
└─────────────┼──────────────────────────────────────────────────────────┘

    ╔═════════╧═════════╗
    ║   Message Bus      ║
    ║  (NATS / local)    ║
    ╚═════════╤═════════╝

┌─────────────┼──────────────────────────────────────────────────────────┐
│  Device Agent (umoo-agent)       │                                  │
│                                     │                                  │
│  ┌──────────────────────────────────▼──────────────────────────────┐   │
│  │  WireGuardPlugin (agent)                                        │   │
│  │  ──────────────────────────────────────────────────────────     │   │
│  │  • Subscribes: cmd.plugin.wireguard.join                        │   │
│  │               cmd.plugin.wireguard.full_configs                 │   │
│  │  • Publishes:  evt.plugin.wireguard.pubkey (key + endpoint)     │   │
│  │               evt.plugin.wireguard.endpoint (STUN result)       │   │
│  │  • Manages:    wg0, wg1, … (one per enrolled network)           │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────────────────────┘

One WireGuard interface per network per device. If a device belongs to three group networks, it runs three independent wgN interfaces, each with its own keys, IP, and peer list.


IP Address Allocation

IPAllocator (internal/backend/plugins/wireguard/ip_allocator.go) manages per-network IP pools entirely in memory, with sequential allocation and explicit release.

How it works

CIDR "100.64.1.0/24"
  ├── .0   reserved (network address, skipped)
  ├── .1   → first Allocate() call
  ├── .2   → second
  │   ...
  ├── .254 → 254th
  └── .255 reserved (broadcast, skipped)
  • Sequential, not random: the nextHost counter advances and wraps. This makes peer IPs predictable during development, but the counter resets on server restart.
  • Restart caveat: allocators are in-memory only. On restart, PreloadAllocator is called per network from its stored subnet_cidr. Allocated IPs are not replayed from the DB — existing peers keep their allocated_ip column value, but the in-memory counter starts at .1 again. If a new peer joins a network that already has .1 taken, the allocator will hand out .1 again and the UpsertPeer call will conflict. Production fix: PreloadAllocator should pre-mark all existing peer IPs as allocated by querying wg_peers.
  • Minimum subnet: /30 (2 usable hosts). The allocator rejects anything tighter.
  • IPv6: not supported — only IPv4 CIDRs accepted.

Default CIDR

The recommended default is 100.64.0.0/10 (RFC 6598 Shared Address Space). This range is:

  • Not routable on the public internet
  • Not used by home routers (192.168.x) or cloud VPCs (10.x)
  • Not the Docker bridge range (172.17.0.0/16)

Per-group networks should carve out /24 slices from within this range, e.g.:

Group A → 100.64.1.0/24   (254 devices)
Group B → 100.64.2.0/24
...
Group N → 100.64.255.0/24
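
A helper along these lines can compute the nth /24 slice of the tenant pool (an illustrative sketch, not part of the actual codebase):

```go
package main

import (
	"fmt"
	"net"
)

// carve24 returns the nth /24 slice of an IPv4 base pool such as
// 100.64.0.0/10. Illustrative helper, not part of the actual codebase.
func carve24(baseCIDR string, n int) (string, error) {
	_, ipnet, err := net.ParseCIDR(baseCIDR)
	if err != nil {
		return "", err
	}
	ones, bits := ipnet.Mask.Size()
	if bits != 32 || ones > 24 {
		return "", fmt.Errorf("base must be an IPv4 network of /24 or larger")
	}
	if n < 0 || n >= 1<<(24-ones) {
		return "", fmt.Errorf("slice %d out of range for %s", n, baseCIDR)
	}
	b := ipnet.IP.To4()
	v := uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
	v += uint32(n) << 8 // each /24 spans 256 addresses
	return fmt.Sprintf("%d.%d.%d.0/24", byte(v>>24), byte(v>>16), byte(v>>8)), nil
}

func main() {
	for n := 1; n <= 2; n++ {
		cidr, _ := carve24("100.64.0.0/10", n)
		fmt.Println(cidr)
	}
}
```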

Agent-Side Plugin

Source: internal/agent/plugins/wireguard/

The agent plugin manages a set of networkState entries (one per enrolled network), each mapped to a WireGuard interface (wg0, wg1, …). Interface names are assigned sequentially from an in-process counter — wg0 for the first network, wg1 for the second, and so on.

Platform Implementation (Go Libraries)

The WGCommandRunner interface abstracts all platform operations. The production implementation (nativeWGRunner) uses Go libraries instead of shelling out to wg/wg-quick:

| Operation | Old (CLI) | New (Go library) |
|---|---|---|
| Key generation | wg genkey / wg pubkey | wgtypes.GeneratePrivateKey() |
| Peer config / crypto | wg set <iface> peer ... | wgctrl.Client.ConfigureDevice() |
| Interface stats | wg show <iface> dump | wgctrl.Client.Device() |
| Interface creation (Linux) | wg-quick up | netlink.LinkAdd(&GenericLink{LinkType: "wireguard"}) |
| IP assignment (Linux) | ip addr add via wg-quick | netlink.AddrAdd() |
| Route management (Linux) | ip route add via wg-quick | netlink.RouteAdd() |
| Interface teardown (Linux) | wg-quick down | netlink.LinkDel() (auto-removes routes) |
| Interface creation (macOS) | wg-quick up | wireguard-go <iface> subprocess |
| IP assignment (macOS) | ifconfig via wg-quick | exec ifconfig |
| Route management (macOS) | route add via wg-quick | exec route -q -n add -inet |
| IP forwarding | sysctl | sysctl (unchanged — OS tool, not WireGuard-specific) |
| NAT masquerade (Linux) | iptables | iptables (unchanged) |
| NAT masquerade (macOS) | iptables | stub + warning (use pf manually) |

Build tags enforce platform separation:

  • wg_runner_common.go — all platforms (wgctrl calls, key ops, INI parser)
  • wg_runner_linux.go — //go:build linux (netlink)
  • wg_runner_darwin.go — //go:build darwin (wireguard-go, ifconfig, route)

WireGuard Config Lifecycle

When the server pushes a full_configs command, the agent executes:

1. WriteConfig("/etc/wireguard/wg0.conf", content)
   └── Parses INI content into parsedWGConfig
       Caches it in nativeWGRunner.pending["wg0"]
       (No file written to disk)

2. QuickDown("wg0")   ← if interface was already running
   └── netlink.LinkDel / kill wireguard-go PID

3. QuickUp("wg0")
   └── [Linux]  netlink.LinkAdd (wireguard type)
               netlink.AddrAdd (10.0.0.5/24)
               netlink.LinkSetUp
               wgctrl.ConfigureDevice (private key, listen port, peers)
               netlink.RouteAdd per peer AllowedIP
   └── [macOS]  wireguard-go wg0 (creates utun)
               sleep 100ms
               ifconfig wg0 <ip> <ip> netmask <mask> broadcast <bcast>
               ifconfig wg0 up
               wgctrl.ConfigureDevice
               route add per peer AllowedIP

Full-tunnel routes (0.0.0.0/0) are skipped with a warning. The server does not send these in normal mesh configs — only explicit peer /32 routes and egress subnets are distributed.
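The in-memory INI-parsing step in WriteConfig can be sketched roughly like this. The types are illustrative stand-ins; the real parsedWGConfig may carry different fields.

```go
package main

import (
	"fmt"
	"strings"
)

// wgPeer and wgConfig are illustrative stand-ins for parsedWGConfig.
type wgPeer struct {
	PublicKey  string
	AllowedIPs []string
	Endpoint   string
}

type wgConfig struct {
	PrivateKey string
	Address    string
	ListenPort string
	Peers      []wgPeer
}

// parseWGConfig parses wg-quick-style INI content in memory, mirroring
// the "no file written to disk" behavior described above.
func parseWGConfig(content string) wgConfig {
	var cfg wgConfig
	section := ""
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "[") {
			section = strings.Trim(line, "[]")
			if section == "Peer" {
				cfg.Peers = append(cfg.Peers, wgPeer{})
			}
			continue
		}
		// Cut on the first '=' so base64 keys keep their trailing padding.
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch {
		case section == "Interface" && key == "PrivateKey":
			cfg.PrivateKey = val
		case section == "Interface" && key == "Address":
			cfg.Address = val
		case section == "Interface" && key == "ListenPort":
			cfg.ListenPort = val
		case section == "Peer" && len(cfg.Peers) > 0:
			p := &cfg.Peers[len(cfg.Peers)-1]
			switch key {
			case "PublicKey":
				p.PublicKey = val
			case "AllowedIPs":
				p.AllowedIPs = strings.Split(strings.ReplaceAll(val, " ", ""), ",")
			case "Endpoint":
				p.Endpoint = val
			}
		}
	}
	return cfg
}

func main() {
	cfg := parseWGConfig(`[Interface]
PrivateKey = aBase64Key=
Address = 100.64.1.5/24
ListenPort = 51820

[Peer]
PublicKey = peerKey=
AllowedIPs = 100.64.1.2/32
Endpoint = 203.0.113.10:51820
`)
	fmt.Println(cfg.Address, len(cfg.Peers))
}
```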

STUN NAT Traversal

After applying each network config, the plugin spawns a goroutine to probe STUN servers:

```go
go p.probeAndReportSTUN(nc.NetworkID, nc.StunServers)
```

ProbeSTUN iterates the configured STUN servers (falling back to Google/Cloudflare STUN if none given) and sends an RFC 5389 Binding Request over UDP. On success, it parses the XOR-MAPPED-ADDRESS attribute to get the device's public IP:port.

The result is published as evt.plugin.wireguard.pubkey with the endpoint included. The server then updates the peer's endpoint column and triggers a network-wide config redistribution so all peers get the new direct route to this device.

STUN is opportunistic: if all probes fail, the device enters relay-only mode and the endpoint field stays empty. Devices without endpoints can still communicate if a relay peer is assigned.


Server-Side Plugin

Source: internal/backend/plugins/wireguard/

Service Layer

WireGuardService is the central coordinator. Key responsibilities:

Network lifecycle:

  • CreateMeshNetwork — validates CIDR, creates DB row, initializes in-memory IPAllocator
  • DeleteNetwork — cascades DB delete, removes allocator from memory
  • GetDomainNetwork / ListDomainNetworks — read-through to repository

Peer lifecycle:

  • AddDeviceToNetwork — publishes cmd.plugin.wireguard.join to the device; the device generates a keypair and reports back via evt.plugin.wireguard.pubkey
  • HandlePubkeyReport — idempotent registration: allocates IP on first call, updates endpoint on subsequent calls, then calls DistributeToNetwork
  • HandleEndpointReport — updates endpoint only (peer already registered), redistributes
  • RemovePeer — deletes DB row, releases IP back to allocator

Config building (BuildNetworkConfig):

For a given device in a network, builds a WireGuardNetworkConfig containing:

  • Address: this device's VPN IP with prefix length (e.g., 100.64.1.5/24)
  • ListenPort: from the network record (default 51820)
  • StunServers: passed through to the device for endpoint discovery
  • Peers: one entry per other enrolled device, with:
    • AllowedIPs: peer's /32 mesh IP, plus egress subnets if the peer is an egress gateway
    • Endpoint: direct IP:port if known, or relay's endpoint if a relay assignment exists
    • Keepalive: 25 seconds when routing via relay or when the target is a relay node
    • ViaRelay: true when the direct endpoint is not available
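
The per-peer rules above can be sketched as follows, with illustrative types standing in for the real DB rows and proto messages:

```go
package main

import "fmt"

// peerState and peerEntry are illustrative stand-ins for the types
// the real BuildNetworkConfig operates on.
type peerState struct {
	PublicKey     string
	MeshIP        string // e.g. "100.64.1.2"
	Endpoint      string // "" when STUN failed
	IsRelay       bool
	IsEgress      bool
	EgressSubnets []string
}

type peerEntry struct {
	PublicKey  string
	AllowedIPs []string
	Endpoint   string
	Keepalive  int
	ViaRelay   bool
}

// buildPeerEntry builds the entry another device sees for target.
// relay is the relay assigned to target, or nil if none.
func buildPeerEntry(target peerState, relay *peerState) peerEntry {
	e := peerEntry{
		PublicKey:  target.PublicKey,
		AllowedIPs: []string{target.MeshIP + "/32"},
		Endpoint:   target.Endpoint,
	}
	if target.IsEgress {
		// Egress gateways also attract traffic for their subnets.
		e.AllowedIPs = append(e.AllowedIPs, target.EgressSubnets...)
	}
	if target.Endpoint == "" && relay != nil {
		// No direct endpoint: route via the assigned relay.
		e.Endpoint = relay.Endpoint
		e.ViaRelay = true
	}
	if e.ViaRelay || target.IsRelay {
		e.Keepalive = 25 // keep the NAT mapping alive
	}
	return e
}

func main() {
	relay := peerState{PublicKey: "r", MeshIP: "100.64.1.9", Endpoint: "203.0.113.99:51820", IsRelay: true}
	natted := peerState{PublicKey: "n", MeshIP: "100.64.1.3"}
	fmt.Printf("%+v\n", buildPeerEntry(natted, &relay))
}
```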

Config Distribution

DistributeToDevice and DistributeToNetwork are the two distribution entry points.

DistributeToNetwork(networkID)
  └── ListPeers(networkID)
  └── for each peer → DistributeToDevice(deviceID)

DistributeToDevice(deviceID)
  └── ListNetworksByDevice(deviceID)     ← all networks this device belongs to
  └── for each network → BuildNetworkConfig(network, deviceID)
  └── publish cmd.plugin.wireguard.full_configs
      topic: "device/{deviceID}/cmd.plugin.wireguard.full_configs"
      payload: { networks: [...] }

Full config, always. The server never sends incremental peer diffs. Every config push sends all networks and all peers. This keeps the agent stateless with respect to config — it always has the full picture.
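
The fan-out can be sketched with function-valued stand-ins for the repository, config builder, and NATS publisher (all names here are illustrative):

```go
package main

import "fmt"

// distributor sketches the fan-out; the func fields stand in for the
// real repository, BuildNetworkConfig, and bus publisher.
type distributor struct {
	listPeers            func(networkID string) []string // device IDs in a network
	listNetworksByDevice func(deviceID string) []string  // networks a device belongs to
	build                func(networkID, deviceID string) any
	publish              func(topic string, payload map[string]any)
}

// DistributeToDevice always sends the full picture: every network the
// device belongs to, each with its complete peer list.
func (d *distributor) DistributeToDevice(deviceID string) {
	var networks []any
	for _, nid := range d.listNetworksByDevice(deviceID) {
		networks = append(networks, d.build(nid, deviceID))
	}
	topic := fmt.Sprintf("device/%s/cmd.plugin.wireguard.full_configs", deviceID)
	d.publish(topic, map[string]any{"networks": networks})
}

// DistributeToNetwork fans out a full push to every enrolled device.
func (d *distributor) DistributeToNetwork(networkID string) {
	for _, deviceID := range d.listPeers(networkID) {
		d.DistributeToDevice(deviceID)
	}
}

func main() {
	d := &distributor{
		listPeers:            func(string) []string { return []string{"d1", "d2"} },
		listNetworksByDevice: func(string) []string { return []string{"n1"} },
		build:                func(nid, did string) any { return map[string]any{"network_id": nid} },
		publish:              func(topic string, _ map[string]any) { fmt.Println(topic) },
	}
	d.DistributeToNetwork("n1")
}
```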

DistributeToNetwork is called automatically after any state change that affects connectivity: peer registration, endpoint update, peer role change, relay assignment/removal.

Relay System

WireGuard requires both peers to know each other's public IP to establish a direct connection. Devices behind strict NAT (symmetric NAT, CGNAT without port mapping) cannot be reached directly.

Peer relay (within a network):

  1. Designate a device as a relay: SetPeerRole(isRelay=true). The device must have a reachable public endpoint.
  2. Assign the relay to a NAT-restricted peer: AssignRelay(networkID, peerDeviceID, relayDeviceID).
  3. On next config distribution, all devices see:
    • The relay peer with a PersistentKeepalive = 25
    • The NAT-restricted peer with ViaRelay = true and the relay's endpoint

The NAT-restricted device cannot receive inbound connections, but it can establish outbound ones. With PersistentKeepalive set on both sides, the relay peer maintains a persistent tunnel that the restricted device can use.

Platform relay (cross-tenant, super_admin managed):

Platform relays (wg_platform_relays) are infrastructure-level relay servers that can serve multiple tenants. They are registered via CreatePlatformRelay (super_admin only) and assigned to tenant networks. Assignment to networks (AssignPlatformRelayToNetwork) is currently marked CodeUnimplemented and is a planned feature.

Egress Gateway

An egress gateway peer routes traffic destined for external subnets (e.g., on-premises networks) through a designated device.

  1. SetPeerRole(isEgress=true, egressSubnets=["192.168.10.0/24"]) on the gateway device.
  2. On next config distribution, all other peers get an AllowedIPs entry for 192.168.10.0/24 pointing at the gateway's mesh IP.
  3. The gateway device gets EnableIPForwarding + ConfigureMasquerade applied on its WireGuard interface by the agent plugin.

Bus Message Protocol

All WireGuard plugin messages use the following topic structure:

Commands (server → device)

| Topic | Direction | Payload |
|---|---|---|
| device/{deviceID}/cmd.plugin.wireguard.join | server → device | {"network_id": "<uuid>"} |
| device/{deviceID}/cmd.plugin.wireguard.full_configs | server → device | See below |

full_configs payload:

```json
{
  "networks": [
    {
      "network_id":  "uuid",
      "address":     "100.64.1.5/24",
      "listen_port": 51820,
      "stun_servers": ["stun.l.google.com:19302"],
      "is_relay":    false,
      "is_egress":   false,
      "egress_subnets": [],
      "peers": [
        {
          "public_key":  "base64...",
          "allowed_ips": ["100.64.1.2/32"],
          "endpoint":    "203.0.113.10:51820",
          "keepalive":   0
        },
        {
          "public_key":  "base64...",
          "allowed_ips": ["100.64.1.3/32"],
          "via_relay":   true,
          "endpoint":    "203.0.113.99:51820",
          "keepalive":   25
        }
      ]
    }
  ]
}
```

Events (device → server)

| Topic | Direction | Payload |
|---|---|---|
| evt.plugin.wireguard.pubkey | device → server | {"network_id": "uuid", "public_key": "base64...", "endpoint": "1.2.3.4:51820"} |
| evt.plugin.wireguard.endpoint | device → server | {"network_id": "uuid", "endpoint": "1.2.3.4:51820"} |

The server subscribes to evt.plugin.wireguard.pubkey in WireGuardServerPlugin.Register() and routes it to HandlePubkeyReport. The endpoint field is optional — it is absent if STUN failed.


ConnectRPC API

All RPCs require JWT auth + X-Tenant-ID header unless noted.

Network Management

| RPC | Auth Required | Description |
|---|---|---|
| CreateWireGuardNetwork | tenant_admin+ | Create a standalone mesh network |
| GetWireGuardNetwork | JWT | Get network by ID |
| ListWireGuardNetworks | JWT | List all networks for tenant |
| DeleteWireGuardNetwork | tenant_admin+ | Delete network and all peers |
| EnrollGroupInNetwork | tenant_admin+ | Create network + enroll all current group devices atomically |

CreateWireGuardNetwork request:

```json
{
  "tenant_id":   "uuid",
  "name":        "production-mesh",
  "subnet_cidr": "100.64.1.0/24",
  "group_id":    "uuid",       // optional
  "listen_port": 51820         // 0 → default 51820
}
```

EnrollGroupInNetwork is the recommended path for group-wide mesh setup. It:

  1. Calls CreateMeshNetwork with the group ID
  2. Fetches all current group members (up to 1000)
  3. Calls AddDeviceToNetwork for each, which publishes the join command

Devices that join the group after enrollment are handled separately via AddDeviceToNetwork.

Peer Management

| RPC | Auth Required | Description |
|---|---|---|
| AddDeviceToNetwork | operator+ | Publish join command to a device |
| RemoveDeviceFromNetwork | operator+ | Remove peer, release IP |
| ListWireGuardPeers | JWT | List all peers in a network |
| SetPeerRole | operator+ | Set relay / egress flags and egress subnets |
| AssignRelay | operator+ | Assign a relay peer to a NAT-restricted peer |
| RemoveRelayAssignment | operator+ | Remove relay assignment |

SetPeerRole request:

```json
{
  "tenant_id":        "uuid",
  "network_id":       "uuid",
  "device_id":        "uuid",
  "is_relay":         true,
  "is_egress_gateway": false,
  "egress_subnets":   []
}
```

Changing any role flag triggers immediate DistributeToNetwork.

Platform Relay Management (super_admin only)

| RPC | Description |
|---|---|
| CreatePlatformRelay | Register a new infrastructure relay server |
| ListPlatformRelays | List all platform relays |
| DeletePlatformRelay | Remove a platform relay |
| AssignPlatformRelayToNetwork | (not yet implemented) |
| RemovePlatformRelayFromNetwork | (not yet implemented) |

Tenant Configuration

The TenantConfig.WireGuardConfig field (TenantWireGuardConfig proto message) exposes per-tenant defaults:

| Field | Type | Default | Description |
|---|---|---|---|
| mesh_enabled | bool | false | Whether WireGuard mesh is enabled for this tenant |
| ip_range | string | 100.64.0.0/10 | Default CIDR pool to allocate per-group /24 slices from |
| default_listen_port | int32 | 51820 | Default UDP listen port for new networks |
| default_dns | string | (empty) | Optional DNS server pushed to devices |
| default_keepalive | int32 | 0 | PersistentKeepalive for all peers (0 = disabled unless via relay) |
| nat_traversal_enabled | bool | true | Whether STUN probing is enabled |
| stun_servers | []string | Google/Cloudflare STUN | STUN server list sent with each full_configs push |

These settings are stored in the tenants.settings JSONB column and updated via UpdateTenantConfig. They are not yet automatically applied to new network creation — the operator must pass explicit values in CreateWireGuardNetwork today.


Operational Notes

Known Limitations

IP allocator is not crash-safe. On server restart, the in-memory allocator starts fresh from .1. Existing peers retain their allocated IPs in the DB, but the counter does not know which addresses are already taken. If a new peer joins before the allocator encounters a collision, it will hand out a duplicate. Fix: preload all existing peer IPs per network into the allocator at startup.

full_configs is fire-and-forget. The server publishes to the bus and does not wait for an ACK from the device. If the device is offline, the message is dropped (NATS at-most-once). When the device reconnects and the state machine reaches Syncing, it does not automatically re-request mesh config — the server must re-distribute. This is a gap: device reconnection should trigger DistributeToDevice.

No handshake tracking from agent. The last_handshake_at column exists in the DB but is never written by the current agent. The agent plugin reads handshake times via ShowDump for metrics export only; it does not report them back to the server.

macOS masquerade not implemented. ConfigureMasquerade is a no-op on Darwin and logs a warning. Operators running egress gateways on macOS must configure pf manually.

Full-tunnel routes skipped. 0.0.0.0/0 in AllowedIPs is silently dropped by the agent with a warning. The server does not send it in normal configs, but custom configurations may hit this.

Typical Setup Flow

1. Create a device group (e.g., "production-nodes")
2. Call EnrollGroupInNetwork:
   - name: "production-mesh"
   - subnet_cidr: "100.64.1.0/24"
   - group_id: <production-nodes UUID>
3. Designate one device as a relay (ideally a cloud VM with a static IP):
   SetPeerRole(is_relay=true) on that device
4. For any device that fails STUN (symmetric NAT):
   AssignRelay(peerDeviceID=<nat device>, relayDeviceID=<relay device>)
5. For on-premises subnet access, designate a gateway device:
   SetPeerRole(is_egress_gateway=true, egress_subnets=["192.168.0.0/16"])

Adding Devices After Enrollment

AddDeviceToNetwork(network_id, device_id)

This publishes cmd.plugin.wireguard.join to the device. The device generates a keypair and publishes evt.plugin.wireguard.pubkey. The server registers the peer, allocates an IP, and redistributes full_configs to all peers in the network.

Monitoring

The agent's CollectMetrics method (called by the telemetry plugin) exports per-peer WireGuard stats:

| Metric | Description |
|---|---|
| wg_peer_bytes_rx | Bytes received from this peer |
| wg_peer_bytes_tx | Bytes sent to this peer |
| wg_peer_last_handshake | Unix timestamp of last successful handshake |

Labels: network_id, peer_pubkey.

These metrics are collected via wgctrl.Client.Device() (no root required on Linux with CAP_NET_ADMIN; the WireGuard kernel module exposes stats via generic netlink).
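
The shape of the exported metrics can be illustrated with a small formatter. The peerStats type here is a hypothetical stand-in for the per-peer data read from wgctrl.Client.Device(); the real agent works with wgtypes.Peer values.

```go
package main

import "fmt"

// peerStats is a hypothetical stand-in for the per-peer data returned
// by wgctrl.Client.Device().
type peerStats struct {
	NetworkID     string
	PeerPubkey    string
	RxBytes       int64
	TxBytes       int64
	LastHandshake int64 // Unix seconds, 0 = never
}

// formatMetrics renders the three metrics with their labels in a
// Prometheus-style exposition format.
func formatMetrics(stats []peerStats) []string {
	var lines []string
	for _, s := range stats {
		labels := fmt.Sprintf(`{network_id=%q,peer_pubkey=%q}`, s.NetworkID, s.PeerPubkey)
		lines = append(lines,
			fmt.Sprintf("wg_peer_bytes_rx%s %d", labels, s.RxBytes),
			fmt.Sprintf("wg_peer_bytes_tx%s %d", labels, s.TxBytes),
			fmt.Sprintf("wg_peer_last_handshake%s %d", labels, s.LastHandshake),
		)
	}
	return lines
}

func main() {
	stats := []peerStats{{NetworkID: "n1", PeerPubkey: "abc", RxBytes: 1024, TxBytes: 2048, LastHandshake: 1700000000}}
	for _, l := range formatMetrics(stats) {
		fmt.Println(l)
	}
}
```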

Umoo — IoT Device Management Platform