How Device Fingerprinting Works
Most SIEMs identify devices by IP address or MAC address. Huginn goes further: it composes multiple weak signals into a strong device identity using hyperdimensional memory (HDM) vectors.
The Problem with MAC-Based Identification
A MAC address tells you the manufacturer (via the OUI prefix). 88:66:5A:xx:xx:xx is Apple. But is it an iPhone, an iPad, a MacBook, or an Apple TV? MAC addresses can’t distinguish device types within a manufacturer, and they can be spoofed.
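The limitation is easy to see in a minimal sketch of an OUI lookup. The one-entry table and the `oui_vendor` helper are hypothetical, for illustration only; a real lookup would load the full IEEE registry.

```python
# Hypothetical one-entry table; a real lookup would load the IEEE OUI registry.
OUI_TABLE = {
    "88:66:5A": "Apple",
}

def oui_vendor(mac: str) -> str:
    """Return the manufacturer for a MAC address, or 'unknown'."""
    # Normalise separators and case, then keep the first three octets (the OUI).
    octets = mac.replace("-", ":").upper().split(":")
    prefix = ":".join(octets[:3])
    return OUI_TABLE.get(prefix, "unknown")

# Two different devices from the same manufacturer look identical at this level.
print(oui_vendor("88:66:5a:00:00:01"))  # Apple
print(oui_vendor("88-66-5A-00-00-02"))  # Apple
```

The lookup returns the manufacturer and nothing else, which is why additional signals are needed to tell an iPhone from a MacBook.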
Huginn’s Approach: Signal Composition
Every device on your network emits multiple identification signals through normal operation:
| Signal | Source | What It Reveals |
|---|---|---|
| MAC OUI | DHCP/ARP | Manufacturer (Apple, Samsung, Intel) |
| DHCP Option 55 | DHCP requests | OS family – the order of requested parameters is distinctive per OS |
| DHCP Vendor Class | DHCP requests | DHCP client or OS family, sometimes with a version (e.g., “MSFT 5.0” = Windows 2000 and later) |
| DHCP Hostname | DHCP requests | Often reveals device type (“iPhone-John”, “DESKTOP-ABC123”) |
| DNS Query Patterns | DNS logs | OS-specific probes (captive.apple.com, connectivitycheck.gstatic.com) |
| Connection Profile | Firewall logs | Typical destinations, ports, and protocols |
| Traffic Timing | Firewall logs | Diurnal patterns, beacon intervals (IoT devices are very regular) |
Any single signal is ambiguous. Combined, they’re highly discriminative.
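As a rough sketch, the signals observed for one device can be held in a record where any field may still be missing. The `DeviceSignals` class and its field names are assumptions made for this example, not Huginn's data model, and the connection-profile and timing signals are left out for brevity. The Option 55 numbers are illustrative, not a real OS fingerprint.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple

@dataclass(frozen=True)
class DeviceSignals:
    """Signals observed for one device; None means 'not seen yet'."""
    mac_oui: Optional[str] = None
    dhcp_option_55: Optional[Tuple[int, ...]] = None  # a tuple, because order matters
    dhcp_vendor_class: Optional[str] = None
    dhcp_hostname: Optional[str] = None
    dns_probe: Optional[str] = None

    def tokens(self):
        """Return (signal_name, value) pairs for every signal that was observed."""
        return [
            (f.name, getattr(self, f.name))
            for f in fields(self)
            if getattr(self, f.name) is not None
        ]

phone = DeviceSignals(
    mac_oui="Apple",
    dhcp_hostname="iPhone-John",
    dns_probe="captive.apple.com",
)
desktop = DeviceSignals(
    dhcp_option_55=(1, 3, 6),  # illustrative parameter order only
    dhcp_vendor_class="MSFT 5.0",
    dhcp_hostname="DESKTOP-ABC123",
)

print(phone.tokens())
print(desktop.tokens())
```

Each `(signal_name, value)` pair is weak evidence on its own; the list as a whole is what gets fused into an identity.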
How It Works
Huginn uses hyperdimensional computing to fuse these signals into a single, compact device identity. Each signal is encoded as a high-dimensional binary vector, and the vectors are composed using algebraic operations that preserve the contribution of every signal while producing a fixed-size result.
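The following is a minimal sketch of this kind of encoding, using two operations common in hyperdimensional computing: XOR to bind a signal's role to its value, and a bitwise majority vote to bundle the bound pairs. The dimensionality, the hash-based seeding, and all function names are assumptions for illustration; the text above does not specify which operations Huginn uses.

```python
import hashlib
import random

DIM = 2048  # assumed dimensionality; Huginn's actual value is not stated

def hypervector(label: str) -> int:
    """Deterministic pseudo-random DIM-bit vector for a label, stored as an int."""
    seed = int.from_bytes(hashlib.sha256(label.encode()).digest(), "big")
    return random.Random(seed).getrandbits(DIM)

def bind(role: int, value: int) -> int:
    """Bind a signal role to its value with XOR."""
    return role ^ value

def bundle(vectors) -> int:
    """Bitwise majority vote over the inputs; ties resolve to 0."""
    result = 0
    for bit in range(DIM):
        ones = sum((v >> bit) & 1 for v in vectors)
        if 2 * ones > len(vectors):
            result |= 1 << bit
    return result

def similarity(a: int, b: int) -> float:
    """Fraction of matching bits: 1.0 is identical, about 0.5 is unrelated."""
    return 1.0 - bin(a ^ b).count("1") / DIM

signals = [
    ("mac_oui", "Apple"),
    ("dhcp_hostname", "iPhone-John"),
    ("dns_probe", "captive.apple.com"),
]
bound = [bind(hypervector("role:" + r), hypervector("value:" + v)) for r, v in signals]
fingerprint = bundle(bound)  # still DIM bits, however many signals went in

unrelated = hypervector("value:DESKTOP-ABC123")
print(similarity(fingerprint, bound[0]))   # well above chance level (0.5)
print(similarity(fingerprint, unrelated))  # close to 0.5
```

The fingerprint stays the same size regardless of the number of signals, and every bundled signal remains measurably closer to it than an unrelated vector.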
The composed vector acts as a holistic fingerprint. Two devices with similar signal profiles produce similar vectors; devices with different profiles are far apart in the vector space. Matching a new device against the known profile database is fast – sublinear in the number of profiles – so lookups stay quick on modest hardware even as the database grows.
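Matching can be sketched as a nearest-neighbour search by Hamming similarity. For clarity, this example uses a plain linear scan over three made-up profiles, not the sublinear lookup described above; `best_match`, `flip_bits`, and the profile names are hypothetical.

```python
import random

DIM = 2048  # assumed dimensionality
rng = random.Random(42)

def similarity(a: int, b: int) -> float:
    """Fraction of matching bits between two DIM-bit vectors."""
    return 1.0 - bin(a ^ b).count("1") / DIM

def flip_bits(vec: int, count: int) -> int:
    """Return a copy of vec with `count` distinct bits flipped (simulated noise)."""
    for pos in rng.sample(range(DIM), count):
        vec ^= 1 << pos
    return vec

def best_match(vec: int, profiles: dict):
    """Linear scan: return (name, similarity) of the closest profile."""
    name = max(profiles, key=lambda n: similarity(vec, profiles[n]))
    return name, similarity(vec, profiles[name])

# Random stand-ins for real profile vectors.
profiles = {
    name: rng.getrandbits(DIM)
    for name in ("iPhone", "Windows desktop", "Nest Thermostat")
}

# A device whose fingerprint differs from the iPhone profile in 300 bits.
observed = flip_bits(profiles["iPhone"], 300)
name, score = best_match(observed, profiles)
print(name, round(score, 3))
```

Even with roughly 15% of the bits changed, the observed vector stays far closer to its own profile than to the unrelated ones, which sit near 0.5 similarity.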
Confidence reflects both how closely a device matches a known profile and how many independent signals contributed to the match:
- iPhone 15, iOS 17 – 94% confidence (many strong signals agree)
- Samsung Galaxy, Android – 82% confidence (most signals match, one is ambiguous)
- Unknown IoT device, similar to Nest Thermostat – 67% confidence (partial signal overlap)
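One plausible way to combine the two factors is sketched below. This formula is purely an assumption for illustration and is not Huginn's actual scoring; it only shows how match closeness and signal count could both feed a single number.

```python
TOTAL_SIGNALS = 7  # number of signal types listed in the table above

def confidence(similarity: float, signals_seen: int) -> float:
    """Hypothetical confidence score in [0, 1]."""
    # For random binary vectors, 0.5 similarity is chance level, so rescale
    # the range [0.5, 1.0] to [0.0, 1.0].
    closeness = max(0.0, (similarity - 0.5) / 0.5)
    # Fraction of signal types that contributed to the match.
    coverage = min(signals_seen, TOTAL_SIGNALS) / TOTAL_SIGNALS
    # A close match on few signals is capped well below a close match on many.
    return closeness * (0.5 + 0.5 * coverage)

print(confidence(0.95, 7))  # close match, all signals present
print(confidence(0.95, 2))  # same closeness, fewer signals, lower score
```

Under this sketch, the same vector similarity yields a lower score when fewer independent signals back it up.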
Incremental Learning
Unlike ML-based approaches that require training on labeled datasets, HDM vectors update incrementally. As new signals arrive for a device, the fingerprint is refined in place. Most devices accumulate enough signal for reliable classification through normal network activity alone.
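A common way to make such updates incremental is to keep one counter per bit and take the sign of each counter as the current fingerprint. The sketch below assumes that scheme; the `RunningFingerprint` class and the 25% noise level are illustrative choices, not details taken from Huginn.

```python
import random

DIM = 2048  # assumed dimensionality
rng = random.Random(7)

def similarity(a: int, b: int) -> float:
    """Fraction of matching bits between two DIM-bit vectors."""
    return 1.0 - bin(a ^ b).count("1") / DIM

def noisy(vec: int, flips: int) -> int:
    """Simulate an imperfect observation by flipping `flips` distinct bits."""
    for pos in rng.sample(range(DIM), flips):
        vec ^= 1 << pos
    return vec

class RunningFingerprint:
    """Per-bit vote counters that can absorb one observation at a time."""

    def __init__(self):
        self.counts = [0] * DIM

    def update(self, observation: int) -> None:
        # +1 where the observation has a 1 bit, -1 where it has a 0 bit.
        for bit in range(DIM):
            self.counts[bit] += 1 if (observation >> bit) & 1 else -1

    def vector(self) -> int:
        # A bit is set when more observations voted 1 than 0.
        return sum(1 << bit for bit, c in enumerate(self.counts) if c > 0)

true_identity = rng.getrandbits(DIM)
fp = RunningFingerprint()

fp.update(noisy(true_identity, DIM // 4))
after_one = similarity(fp.vector(), true_identity)

for _ in range(8):
    fp.update(noisy(true_identity, DIM // 4))
after_nine = similarity(fp.vector(), true_identity)

print(after_one, after_nine)
```

No retraining step is involved: each observation only adjusts the counters, and the fingerprint moves closer to the underlying identity as observations accumulate.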
The system also learns new device profiles from your network. If you have a device that doesn’t match any known profile, Huginn creates a new profile entry. Over time, your local installation builds a device database tailored to your specific network.
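The "match or create" behaviour can be sketched as follows. The similarity threshold, the naming scheme for new profiles, and the `classify_or_learn` helper are all assumptions made for this example.

```python
import random

DIM = 2048             # assumed dimensionality
MATCH_THRESHOLD = 0.7  # assumed cut-off between "known" and "new"
rng = random.Random(1)

def similarity(a: int, b: int) -> float:
    """Fraction of matching bits between two DIM-bit vectors."""
    return 1.0 - bin(a ^ b).count("1") / DIM

def classify_or_learn(vec: int, profiles: dict) -> str:
    """Return the matching profile name, creating a new local profile if none fits."""
    best = max(profiles, key=lambda n: similarity(vec, profiles[n]), default=None)
    if best is not None and similarity(vec, profiles[best]) >= MATCH_THRESHOLD:
        return best
    # Nothing is close enough: remember this device as a new local profile.
    name = f"local-profile-{len(profiles) + 1}"
    profiles[name] = vec
    return name

profiles = {"iPhone": rng.getrandbits(DIM)}  # random stand-in for a shipped profile
unknown_device = rng.getrandbits(DIM)        # unrelated to anything known

first = classify_or_learn(unknown_device, profiles)   # creates a new entry
second = classify_or_learn(unknown_device, profiles)  # now matches that entry
print(first, second, len(profiles))
```

The first sighting adds an entry to the local database; later sightings of the same device match it instead of creating duplicates.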
Why Not Machine Learning?
Traditional ML approaches to device fingerprinting (random forests, neural networks) achieve slightly higher accuracy on benchmark datasets. But they require:
- Labeled training data (someone has to identify every device to build the training set)
- A training pipeline (feature extraction, model training, validation)
- Retraining when new device types appear
- Significant compute (GPU for neural approaches)
HDM vectors trade a small amount of accuracy for much simpler operation:
- No training data needed – profiles are composed from signal definitions
- No training pipeline – vectors update incrementally from observation
- New device types are learned automatically from network observation
- Runs on any CPU – the math is cheap bitwise operations
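To make "cheap bitwise operations" concrete, here is a minimal sketch of comparing two binary vectors with XOR and a bit count. Storing vectors as Python integers is an assumption for the example; any packed bit representation works the same way.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where a and b differ."""
    # XOR leaves a 1 wherever the inputs disagree; counting the 1s gives the distance.
    return bin(a ^ b).count("1")

a = 0b1011
b = 0b0110
print(hamming_distance(a, b))  # 3, because 1011 XOR 0110 = 1101
```

The same two steps apply unchanged to vectors thousands of bits wide, and both map to basic integer instructions on ordinary CPUs, with no floating-point or GPU work involved.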
For a SIEM that needs to work out of the box on a $200 mini-PC, this tradeoff is the right one.