FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware

TL;DR: IoT firmware often contains known vulnerabilities from outdated libraries, but detecting them is hard because firmware is compiled for different architectures with different compilers. FirmUp uses statistical binary similarity to match firmware functions against known vulnerable functions, even across architectures.

The Problem

Billions of IoT devices run firmware with known CVEs buried in embedded libraries. Routers, cameras, medical devices, industrial controllers -- they all ship with compiled copies of libraries like OpenSSL, BusyBox, and libpng, and these copies are rarely updated. The result is a massive attack surface of known vulnerabilities hiding in deployed firmware.

But you cannot just string-match or diff the binaries. The same vulnerability looks completely different when compiled for ARM versus MIPS versus x86. Different compilers, optimization levels, and calling conventions produce wildly different machine code from the same source. A vulnerability in OpenSSL 1.0.1 compiled for a Broadcom MIPS router chip shares almost nothing at the byte level with the same vulnerability compiled for an ARM-based IP camera.

Existing approaches based on dynamic analysis require actually running the firmware, which is impractical at scale. Signature-based methods break when the compilation changes even slightly. We need a technique that can recognize the same vulnerability across architectures, compilers, and optimization levels -- statically, without executing anything.

The Key Idea

FirmUp decomposes binary functions into short instruction sequences called strands -- chains of data-dependent instructions extracted from the binary. Each strand captures a small slice of the function's computation: how a particular value flows through registers and memory.

The key insight is that strands can be normalized to be architecture-independent. By abstracting away specific register names, immediate values, and architecture-specific idioms, a strand from ARM and a strand from MIPS that compute the same thing become comparable. FirmUp then uses statistical similarity -- comparing the distribution of normalized strands between two functions -- to determine whether a firmware function matches a known vulnerable function.

How Strands Work

Binary Function (raw instructions) → Strand Extraction (data-flow slicing) → Normalization (arch-independent) → Statistical Match (similarity score)

Each function is decomposed into strands, normalized to remove architecture-specific details, then compared statistically against a database of known vulnerable functions.

How It Works

1. Strand Extraction

Given a binary function, FirmUp extracts strands by performing backward data-flow slicing from each instruction. A strand captures the chain of instructions that contribute to a particular computation. For example, a strand might trace how a buffer length is computed: from an initial load, through arithmetic operations, to a comparison or memory write.

Strands are more robust than whole-function comparison because they capture local computational patterns that tend to be preserved across compilation variants, even when the overall function layout changes dramatically.
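The slicing step described above can be sketched on a toy straight-line block. This is a minimal illustration, not FirmUp's extractor: the `Instr` tuple, the `extract_strands` helper, and the example registers are all hypothetical, and the sketch assumes a single basic block with no memory aliasing or control flow.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Toy instruction (hypothetical format, not a real disassembler's)."""
    dest: str               # register written, "" if the instruction writes none
    op: str                 # mnemonic
    srcs: Tuple[str, ...]   # registers, memory names, or immediates read


def extract_strands(block: List[Instr]) -> List[List[Instr]]:
    """Split a straight-line block into strands by backward slicing."""
    uncovered = set(range(len(block)))
    strands = []
    while uncovered:
        # Start from the last instruction not yet placed in any strand.
        start = max(uncovered)
        needed = set(block[start].srcs)
        picked = [start]
        # Walk backward, collecting instructions that define a needed value.
        for i in range(start - 1, -1, -1):
            ins = block[i]
            if ins.dest and ins.dest in needed:
                picked.append(i)
                needed.discard(ins.dest)
                needed.update(ins.srcs)
        uncovered -= set(picked)
        strands.append([block[i] for i in reversed(picked)])
    return strands


# Example: a length computation interleaved with an unrelated copy.
block = [
    Instr("r1", "load", ("buf_len",)),
    Instr("r2", "add", ("r1", "4")),
    Instr("r3", "load", ("other",)),
    Instr("", "cmp", ("r2", "limit")),
    Instr("", "store", ("r3", "out")),
]
strands = extract_strands(block)
for strand in strands:
    print([ins.op for ins in strand])
```

On this block the sketch yields two strands: the copy (`load`, `store`) and the length check (`load`, `add`, `cmp`). The two computations are separated even though their instructions are interleaved, which is why strands tolerate instruction reordering by the compiler.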

2. Normalization

Raw strands are architecture-specific -- they use ARM registers (R0-R15), MIPS registers ($t0-$t9), or x86 registers (EAX, EBX). FirmUp normalizes strands by replacing concrete register names with canonical names, abstracting immediate values into categories (small constant, address, offset), and mapping architecture-specific instructions to a common intermediate representation.

After normalization, an ARM strand and a MIPS strand that compute the same thing should reduce to the same, or nearly the same, normalized form. This is what enables cross-architecture matching.
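As a rough illustration of the three normalization steps, the sketch below renames registers in order of first appearance, buckets immediates, and maps mnemonics through a small lookup table. Everything here is an assumption made for the example: the `COMMON_OP` table, the register pattern, the 256 cutoff for "small" constants, and the tuple format are hypothetical, and only two immediate categories are modeled. It is not FirmUp's normalizer.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical table mapping a few arch-specific mnemonics to common operations.
COMMON_OP: Dict[str, str] = {
    "ldr": "load", "lw": "load",
    "str": "store", "sw": "store",
    "add": "add", "addiu": "add", "addu": "add",
}

# Matches toy ARM (r3), MIPS ($t0) and x86 (eax) register names.
REG = re.compile(r"^(r\d+|\$[a-z]+\d*|e[a-d]x)$", re.IGNORECASE)


def classify_imm(tok: str) -> str:
    """Bucket an immediate; the 256 threshold is an arbitrary assumption."""
    value = int(tok, 0)
    return "CONST_SMALL" if abs(value) < 256 else "ADDR"


def normalize(strand: List[Tuple[str, ...]]) -> Tuple[Tuple[str, ...], ...]:
    """Return an architecture-neutral form of a strand of (op, operand...) tuples."""
    names: Dict[str, str] = {}
    out = []
    for op, *operands in strand:
        norm = []
        for tok in operands:
            if REG.match(tok):
                # Canonical name = order of first appearance in the strand.
                key = tok.lower()
                if key not in names:
                    names[key] = f"v{len(names)}"
                norm.append(names[key])
            else:
                norm.append(classify_imm(tok))
        out.append((COMMON_OP.get(op.lower(), op.lower()), *norm))
    return tuple(out)


# The same "load, add 4, store" computation, written for ARM and for MIPS.
arm = [("ldr", "r3", "r0"), ("add", "r3", "r3", "4"), ("str", "r3", "r1")]
mips = [("lw", "$t0", "$a0"), ("addiu", "$t0", "$t0", "4"), ("sw", "$t0", "$a1")]

print(normalize(arm))
print(normalize(arm) == normalize(mips))
```

The two inputs were hand-picked so that they line up instruction for instruction. Real compilers rarely cooperate this neatly, which is part of why the final matching step is statistical rather than exact.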

3. Statistical Matching

Rather than requiring exact strand matches, FirmUp uses statistical similarity. It computes a similarity score between two functions by comparing their multisets of normalized strands. The score reflects what fraction of strands in the firmware function also appear in the known vulnerable function, weighted by strand specificity (rare strands count more than common ones).

This statistical approach is crucial: compilation differences may add, remove, or slightly alter some strands, but the overall distribution remains recognizably similar for functions derived from the same source.
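One way to read the weighted score described above is as a weighted multiset overlap. The sketch below is an assumption-laden illustration, not the paper's exact formula: the IDF-style weighting in `strand_weights`, the default weight for unseen strands, and the string labels standing in for normalized strands are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def strand_weights(corpus: Sequence[Sequence[Hashable]]) -> Dict[Hashable, float]:
    """IDF-style weights: strands found in fewer functions weigh more."""
    n = len(corpus)
    doc_freq: Counter = Counter()
    for func in corpus:
        doc_freq.update(set(func))  # count each strand once per function
    return {s: math.log((1 + n) / (1 + c)) + 1.0 for s, c in doc_freq.items()}


def similarity(firmware_fn: Sequence[Hashable],
               vulnerable_fn: Sequence[Hashable],
               weights: Dict[Hashable, float],
               default: float = 1.0) -> float:
    """Weighted fraction of the firmware function's strands also in the vulnerable one."""
    fw, vuln = Counter(firmware_fn), Counter(vulnerable_fn)
    shared = fw & vuln  # multiset intersection (minimum of the two counts)
    total = sum(weights.get(s, default) * c for s, c in fw.items())
    if total == 0:
        return 0.0
    return sum(weights.get(s, default) * c for s, c in shared.items()) / total


# Strings stand in for normalized strands; "prologue" appears in every function.
corpus: List[List[str]] = [["prologue", "A", "B"], ["prologue", "C"], ["prologue", "D"]]
weights = strand_weights(corpus)

vulnerable = ["prologue", "A", "B"]
candidate = ["prologue", "A", "B", "X"]   # same source, one extra strand
unrelated = ["prologue", "C", "D"]        # shares only the common prologue

print(round(similarity(candidate, vulnerable, weights), 3))
print(round(similarity(unrelated, vulnerable, weights), 3))
```

The candidate keeps a high score despite its extra strand, while the unrelated function scores low because the only strand it shares is one that appears everywhere and therefore carries little weight.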

Results

FirmUp achieves high precision in detecting known vulnerabilities across different architectures. In experiments on real-world firmware images from major router vendors, FirmUp identified known CVEs in OpenSSL, BusyBox, and other embedded libraries -- including vulnerabilities that were missed by vendor security advisories. The approach scales to large firmware images and works without requiring source code, symbols, or execution of the firmware.

@inproceedings{david2018firmup,
  title={FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware},
  author={David, Yaniv and Partush, Nimrod and Yahav, Eran},
  booktitle={Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
  year={2018},
  doi={10.1145/3173162.3177157}
}