Why your uptime monitor says your WireGuard server is up (when it's actually broken)

This is the single most common failure mode I see when teams set up VPN monitoring for the first time. Their uptime dashboard shows a healthy server. Users are in Slack reporting broken tunnels. The monitor keeps saying green. This post is about why that happens, and what to do about it.

If you run WireGuard in production and your monitoring setup ends at a TCP or UDP port check, you have this bug. You just haven't been bitten by it yet.

The short version

WireGuard is an authenticated UDP protocol designed to be silent toward unauthenticated senders. The server accepts packets on a single UDP port, 51820 by default, and responds only to packets that carry a valid Noise handshake message built against its Curve25519 static key. If a packet doesn't authenticate, the server drops it silently. No TCP reset, no ICMP unreachable, no log line (unless you've turned on debug logging).

That means a UDP port probe, the kind almost every generic uptime monitor does, is useless as a health check. The WireGuard kernel module will receive your probe, attempt to decrypt it as a handshake, fail, and drop it on the floor. From the outside, it looks like the port is open. From the inside, nothing useful happened.
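To make that concrete, here's a minimal sketch of what a generic monitor's UDP port check boils down to (host and port below are placeholders):

```python
import socket

def udp_probe(host, port, timeout=2.0):
    """What a generic uptime monitor's UDP check amounts to:
    send some bytes, wait briefly for any reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        # 32 bytes of garbage: not a valid handshake_initiation,
        # so a WireGuard server drops it without replying.
        sock.sendto(b"\x00" * 32, (host, port))
        sock.recvfrom(1024)
        return "reply"
    except socket.timeout:
        # Most monitors map this outcome to "port open" -- green.
        return "no response"
    finally:
        sock.close()

# udp_probe("203.0.113.1", 51820) returns "no response" whether the
# server is healthy, misconfigured, or silently rejecting your fleet.
```

The timeout branch is the whole problem: "no response" is the result in every scenario, healthy or broken.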

Meanwhile, a real WireGuard client might be getting the exact same silent treatment. A key rotation that wasn't propagated, an expired preshared key, a routing misconfig, a DDoS rate-limiter kicking in, or twenty other things. The port is still "open". Your monitor still says green.

What a WireGuard handshake actually involves

WireGuard's handshake is a variant of the Noise protocol framework. The simplified flow for a client connecting to a server:

  1. Client sends handshake_initiation (148 bytes) containing its ephemeral public key, its static public key (encrypted under a key derived from the server's static public key), a timestamp, and two MAC fields (mac1, mac2).
  2. Server verifies mac1, decrypts the static key, and looks it up in its list of configured peers. If found, it continues. If not, it silently drops.
  3. Server sends handshake_response (92 bytes) containing its own ephemeral public key plus confirmation material.
  4. Client verifies. Now both sides have derived session keys.
  5. Data packets flow, with either side initiating a fresh handshake after 120 seconds or 2^60 messages on a session, whichever comes first.

The critical observation: the server's response in step 3 only happens if the client presented a valid initiation in step 1. A random UDP probe does not. The server treats it as noise and drops it. No response goes back to the prober.
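For concreteness, the fixed sizes above fall out of the wire framing, which can be written down as struct layouts (field contents come from the Noise_IK state machine; only the framing is shown here):

```python
import struct

# Framing of the two handshake messages, little-endian, per the WireGuard
# whitepaper. The "type" field is one byte followed by three reserved
# zero bytes, conventionally read as a 32-bit integer.
INITIATION = struct.Struct("<I I 32s 48s 28s 16s 16s")
# type=1 | sender_index | ephemeral(32) | encrypted_static(32+16)
# | encrypted_timestamp(12+16) | mac1(16) | mac2(16)   -> 148 bytes

RESPONSE = struct.Struct("<I I I 32s 16s 16s 16s")
# type=2 | sender_index | receiver_index | ephemeral(32)
# | encrypted_empty(0+16) | mac1(16) | mac2(16)        -> 92 bytes
```

A monitor that speaks the protocol can validate a response by checking both the type field and the exact length.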

So what does a port monitor actually observe when it sends a UDP probe to a WireGuard server?

Usually, nothing. No response. Most UDP port monitors interpret "no response" as "port is open" (UDP is connectionless, so there's no positive "accepted" signal like SYN-ACK in TCP). The server could be fully broken; the monitor still sees green.

A small number of monitors do ICMP-unreachable detection: if the host sends ICMP port unreachable for closed UDP ports, a monitor can distinguish open from closed. But this tells you nothing about the WireGuard port itself (the socket is bound, so the kernel considers it open either way), and most deployments drop rather than reject precisely because rejecting leaks information to port scanners. So this signal is usually unavailable too.
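For illustration (hypothetical nftables rules on a chain named inet filter input, for some unrelated closed UDP port), the two firewall behaviors a prober can observe:

```
# Silent drop: the prober sees nothing, the same as an open-but-ignoring port.
nft add rule inet filter input udp dport 12345 drop

# Reject: gives the prober a "closed" signal, but hands the same signal
# to every port scanner on the internet.
nft add rule inet filter input udp dport 12345 reject with icmpx type port-unreachable
```

Most hardened hosts pick the first, which is exactly why the ICMP signal is rarely available.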

Failure modes port monitors miss

Here's a non-exhaustive list of ways your WireGuard server can be totally broken while a port monitor reports green:

1. Key rotation not propagated

You rotated the server's static private key (via wg genkey) and pushed the new server PublicKey to clients via config management. One fleet, say, mobile clients still pinned to an old config, has the old public key. Those clients can no longer complete a handshake; the server sees their initiations as invalid and drops them. Port is still open.

2. Preshared key expired or mismatched

If you use PresharedKey (recommended for post-quantum resistance), and it's out of sync between the server's peer config and the client's, handshakes fail. Silent drop. Port still open.

3. AllowedIPs misconfigured

The server accepts handshakes but routing is broken. AllowedIPs on the server's peer config doesn't include the client's tunnel IP. Handshake succeeds; data packets from the client get dropped by the WireGuard cryptokey routing layer. From the client's perspective, the tunnel is "up" but nothing works. From outside, port monitor is green.
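As a concrete sketch (hypothetical key and addresses): the server-side peer entry has to cover the client's tunnel address, or data packets fail cryptokey routing even though the handshake succeeds:

```ini
[Peer]
# Client's static public key (placeholder)
PublicKey = CLIENT_PUBLIC_KEY_BASE64
# Must include the client's tunnel IP (here 10.0.0.5). If this is
# missing or wrong, the handshake completes but every data packet
# from the client is dropped by cryptokey routing.
AllowedIPs = 10.0.0.5/32
```

This is the one failure mode in the list where even a handshake check passes; only an end-to-end data-plane check (a ping through the tunnel) catches it.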

4. Kernel module loaded but process state broken

systemd says wg-quick@wg0 is active, and ip link show wg0 shows the interface. But a peer was removed at runtime via wg set wg0 peer <key> remove, and the config on disk got out of sync. Initiations from the removed peer are now silently dropped, while everything a port monitor can observe still looks healthy. Still green.

5. DDoS rate-limiter burning legitimate handshakes

WireGuard has built-in cookie-based DoS protection. Under sustained handshake floods, the server starts replying with cookie challenges instead of handshake responses. Clients that don't handle cookies correctly (some older userspace implementations) fail. The server is "up" in every technical sense; specific clients just can't connect. Port monitor green.
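A handshake-based checker can tell this state apart from a dead server: under load, the reply to an initiation is a fixed 64-byte cookie_reply (message type 3) instead of the 92-byte handshake_response, so a checker can report "under load" rather than "down". Its framing, for reference:

```python
import struct

# cookie_reply framing, little-endian, per the WireGuard whitepaper:
# type=3 | receiver_index | 24-byte nonce
# | XChaCha20-Poly1305-sealed cookie (16 bytes + 16-byte tag)
COOKIE_REPLY = struct.Struct("<I I 24s 32s")
```

Distinguishing type 2 from type 3 in the first response byte is cheap and turns a confusing outage into an actionable signal.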

6. The interface is up but the process is pinned on an old CPU core with a bug

Rare but seen in the wild: the wireguard-go userspace implementation with one core hanging on a spin lock. The interface appears active, but no packets get processed, and only a restart fixes it. Port monitor still green (the socket is still bound).

How to actually test a WireGuard server

The honest answer is: you perform the real handshake. There's no shortcut. Here are three ways to do it, in increasing order of convenience.

Option 1: use wg itself

If you have a client config for the server, stand up a peer, try a handshake, check the result:

# on a check host with a monitoring-only peer key
sudo wg-quick up wg-check
# wait a couple of seconds, then check
sudo wg show wg-check latest-handshakes
# expected: unix timestamp of recent handshake
# failure:  "0" (epoch, never handshook)

You can script this. Handshake happens on first data packet, so a quick ping 10.0.0.1 across the tunnel forces it. If latest-handshakes updates, the server is alive. If it sits at 0 for more than 10 seconds, something's broken.
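One way to script it (a sketch; the interface name and freshness threshold are yours to choose) is to shell out to wg and keep the decision logic as a pure function you can test:

```python
import time

def handshake_healthy(wg_show_output, max_age=10, now=None):
    """Decide health from `wg show <iface> latest-handshakes` output.

    Each line is "<peer-public-key><TAB><unix-timestamp>"; a timestamp
    of 0 means that peer has never completed a handshake.
    """
    now = time.time() if now is None else now
    lines = wg_show_output.strip().splitlines()
    if not lines:
        return False  # no peers configured: treat as unhealthy
    for line in lines:
        _peer, ts = line.split()
        if int(ts) == 0 or now - int(ts) > max_age:
            return False
    return True

# Feed it from the real command, e.g.:
# out = subprocess.run(["wg", "show", "wg-check", "latest-handshakes"],
#                      capture_output=True, text=True).stdout
# sys.exit(0 if handshake_healthy(out) else 1)
```

Exit non-zero on failure and any cron-plus-alerting setup can consume it.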

Downsides: you need a real peer provisioned just for monitoring, you need wg-quick on the check host (usually root), and you can't easily do this from lots of regions.

Option 2: roll your own handshake initiation

The WireGuard handshake is well-documented and small. You can build an initiation packet in Python:

# pip install pynacl
import socket
import struct, time            # needed for field packing and the TAI64N timestamp
import nacl.bindings as nb     # Curve25519 / ChaCha20-Poly1305 primitives for the full version

MSG_INITIATION = 1  # first byte of handshake_initiation
MSG_RESPONSE = 2    # first byte of handshake_response

def build_handshake_initiation(server_pubkey: bytes,
                               client_privkey: bytes,
                               client_pubkey: bytes,
                               sender_index: int) -> bytes:
    # Simplified. A real implementation needs the Noise_IK state machine,
    # a TAI64N timestamp, and MAC1 + MAC2 computed over the right byte ranges.
    # See github.com/WireGuard/wireguard-go/device/noise-protocol.go for
    # a reference you can port.
    ...

# then:
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(2.0)
sock.sendto(build_handshake_initiation(...), ("203.0.113.1", 51820))

try:
    data, _ = sock.recvfrom(4096)
    # A handshake_response is exactly 92 bytes and starts with message type 2
    if data[0] == MSG_RESPONSE and len(data) == 92:
        print("healthy")
    else:
        print(f"unexpected response: type={data[0]}, len={len(data)}")
except socket.timeout:
    print("no response, probably broken")

This works but is a non-trivial amount of crypto code to get right. The MAC1/MAC2 computation is where most hand-rolled implementations get it wrong. Worth it if you have a strong reason to avoid adding a tool dependency.
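For a sense of scale, mac1 by itself is only a few lines once you know the construction from the whitepaper: keyed BLAKE2s-128 over every byte of the message before the mac1 field, keyed with the BLAKE2s-256 hash of the label "mac1----" concatenated with the responder's static public key:

```python
import hashlib

def mac1(server_static_pub, msg_before_mac1):
    """mac1 = MAC(HASH("mac1----" || server_pub), msg bytes up to mac1),
    where HASH is BLAKE2s-256 and MAC is keyed BLAKE2s-128."""
    key = hashlib.blake2s(b"mac1----" + server_static_pub).digest()
    return hashlib.blake2s(msg_before_mac1, digest_size=16, key=key).digest()
```

mac2 is the same keyed-BLAKE2s construction keyed with the most recently received cookie, and is all zeros when no cookie is in play. The hard-to-get-right part is not the hashing but feeding it exactly the right byte ranges.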

Option 3: use a tool that already does this

This is the pragmatic option: let something that already implements the handshake do the probing, whether that's a self-hosted checker run from cron or a hosted monitoring service, and alert on its result.

What about HTTPS-wrapping, Tailscale, headscale?

Some WireGuard deployments are wrapped in additional layers: Tailscale coordinates peer config via a control plane, headscale provides a self-hosted version, Innernet routes through a central coordinator. For monitoring these setups, the control plane has its own health signals, usually a REST API or a status endpoint.

That's great, but it doesn't substitute for end-to-end handshake checks. The control plane might report all peers connected while the WireGuard data plane between two peers is broken by a rogue firewall rule. Monitor both: the control plane for coordination health, the data plane for actual tunnel health.

The bigger lesson

This isn't really about WireGuard specifically. The pattern generalizes:

If a protocol silently drops invalid packets (which any well-designed authenticated protocol does, to resist scanning), then a port probe will tell you approximately nothing about whether the protocol is working.

OpenVPN mostly does the same thing. So does IKEv2, VLESS, VMess, Shadowsocks, Trojan. So do most modern VPN and proxy protocols. Port probes only work for things that respond to every TCP connection or UDP packet, regardless of content. That's essentially HTTP, SSH, SMTP, and similar server protocols where the protocol literally announces itself on connect.

For VPN monitoring, for any authenticated protocol monitoring, your choices are:

  1. Perform the actual protocol handshake (correct, more work).
  2. Have the service itself emit a heartbeat to an external monitor like Healthchecks.io (works but only tells you the process is running, not that clients can connect).
  3. Use synthetic user monitoring: a real client connects end-to-end on schedule (closest to what users experience, most work).

Pick any of those. Just don't rely on port probes.


If you want this handled for you

TunnelHQ performs real WireGuard handshakes against your servers every 1 to 10 minutes from check nodes in US, EU, APAC, and SA. When a handshake fails, an alert hits Slack, email, Telegram, Discord, or a webhook within a second. Free for 5 monitors, no credit card.

Start free or read the WireGuard monitoring page