ASPLOS 2018

Statistical Reconstruction of Class Hierarchies in Binaries

Omer Katz, Noam Rinetzky, Eran Yahav

TL;DR: C++ programs lose their explicit class hierarchy information when they are compiled and stripped. This paper uses statistical methods to reconstruct the inheritance tree from stripped binaries: which classes exist, which methods belong to them, and how they are related as parents and children.

The Problem

C++ programs are built around rich class hierarchies. A Shape base class has virtual methods like area() and draw(), and derived classes like Circle and Rectangle override them with specialized behavior. This structure is central to understanding what a program does.

But after compilation, all of this vanishes. Classes become flat memory layouts. Virtual methods turn into entries in vtables — arrays of function pointers — and class names, inheritance relationships, and method ownership are all stripped away. A reverse engineer looking at the binary sees only anonymous functions like sub_401000 and opaque data structures. Figuring out which functions belong to the same class, or which class inherits from which, requires painstaking manual analysis.

The Key Idea

Even though class metadata is lost during compilation, the patterns it leaves behind in the binary are not random. Virtual table structures, constructor and destructor call patterns, and the way methods access object memory all carry statistical signals about the original class organization.

This paper exploits these signals in three stages:

  1. Identify classes by locating vtable structures and constructor/destructor patterns in the binary.
  2. Group methods by analyzing which functions appear together in vtables and share similar object-access patterns.
  3. Infer inheritance by comparing vtable layouts across identified classes: if one vtable's slot layout is a prefix of another's, the shorter one likely belongs to a parent class.

How It Works

  1. VTable Analysis: locate tables.
  2. Statistical Grouping: cluster methods.
  3. Hierarchy Inference: recover tree.

VTable Analysis

In C++ binaries, each class with virtual methods typically has a virtual table (vtable): an array of function pointers, usually stored in a read-only data section. The analysis begins by scanning the binary for these structures, that is, arrays of valid code pointers in read-only memory that are cross-referenced by constructor-like functions writing a vtable address into the first field of an allocated object.
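The scan described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes 64-bit little-endian pointers, an 8-byte-aligned read-only section, and a set of function entry addresses already recovered by a disassembler. The names find_vtable_candidates, ctor_stores, and all addresses are hypothetical. Real vtables may also carry extra fields (for example RTTI pointers) that this toy version ignores.

```python
import struct

PTR_SIZE = 8  # assumption: 64-bit little-endian target


def find_vtable_candidates(rodata, rodata_base, function_starts,
                           ctor_stores=None, min_entries=2):
    """Return (address, entries) for runs of code pointers in read-only data.

    ctor_stores, if given, is the set of addresses that constructor-like
    functions write into the first field of an object; runs that do not
    start at one of those addresses are discarded.
    """
    runs = []
    run_start, run = None, []
    usable = len(rodata) - len(rodata) % PTR_SIZE
    for offset in range(0, usable, PTR_SIZE):
        (value,) = struct.unpack_from("<Q", rodata, offset)
        if value in function_starts:
            if not run:
                run_start = rodata_base + offset
            run.append(value)
        elif run:
            # A non-pointer word ends the current run.
            runs.append((run_start, run))
            run = []
    if run:
        runs.append((run_start, run))
    return [
        (addr, entries) for addr, entries in runs
        if len(entries) >= min_entries
        and (ctor_stores is None or addr in ctor_stores)
    ]


# Hypothetical toy image: two pointer runs separated by non-pointer words.
FUNCTION_STARTS = {0x401000, 0x401040, 0x401080, 0x4010C0}
RODATA_BASE = 0x500000
RODATA = struct.pack("<8Q", 0, 0x401000, 0x401040, 0,
                     0x401080, 0x401040, 0x4010C0, 0xDEADBEEF)

for addr, entries in find_vtable_candidates(RODATA, RODATA_BASE, FUNCTION_STARTS):
    print(hex(addr), [hex(e) for e in entries])
```

Passing ctor_stores narrows the result to runs that some constructor-like function actually installs in an object, which is the cross-reference check mentioned in the text.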

Statistical Grouping

Once vtables are located, the approach uses statistical patterns to group functions into classes. Functions that appear in the same vtable likely belong to the same class. Constructor and destructor pairs provide additional evidence: they share characteristic patterns, such as calling a parent constructor before the child's own initialization code, or destroying members in reverse order of construction.
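To make the grouping idea concrete, here is a toy sketch that assigns functions outside any vtable to the class whose virtual methods touch similar object fields. The Jaccard score is a stand-in chosen for illustration; this summary does not specify the paper's actual statistical model. The function names, offsets, and the 0.5 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets of field offsets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_to_classes(vtables, field_accesses, threshold=0.5):
    """Map each function outside every vtable to its best-matching vtable.

    vtables: vtable label -> list of functions in that table.
    field_accesses: function -> set of object offsets it reads or writes.
    Functions whose best score is below the threshold map to None.
    """
    # Profile of a class: every offset touched by any of its virtual methods.
    profiles = {}
    for vt, funcs in vtables.items():
        profile = set()
        for func in funcs:
            profile |= field_accesses.get(func, set())
        profiles[vt] = profile

    virtual = {func for funcs in vtables.values() for func in funcs}
    assignment = {}
    for func in sorted(field_accesses):
        if func in virtual:
            continue
        score, best = max(
            ((jaccard(field_accesses[func], profiles[vt]), vt)
             for vt in sorted(profiles)),
            default=(0.0, None),
        )
        assignment[func] = best if score >= threshold else None
    return assignment


# Hypothetical input: two vtables and three functions that are in neither.
VTABLES = {
    "vt_A": ["sub_401000", "sub_401040"],
    "vt_B": ["sub_401080", "sub_4010c0"],
}
FIELD_ACCESSES = {
    "sub_401000": {8, 16},
    "sub_401040": {16, 24},
    "sub_401080": {8, 40},
    "sub_4010c0": {40, 48, 56},
    "sub_401100": {16, 24},     # resembles vt_A's methods
    "sub_401140": {8, 40, 48},  # resembles vt_B's methods
    "sub_401180": {128},        # resembles neither
}

ASSIGNMENT = assign_to_classes(VTABLES, FIELD_ACCESSES)
print(ASSIGNMENT)
```

A real analysis would combine several signals, such as constructor and destructor call patterns, rather than rely on field offsets alone.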

Hierarchy Inference

The inheritance tree is recovered by comparing vtable layouts. In single inheritance, a derived class's vtable starts with the same slots as its parent's vtable, with any overridden slots pointing to the derived class's implementations, followed by slots for its own new virtual methods. By checking for these prefix relationships among vtables, the algorithm reconstructs which classes inherit from which, building the full hierarchy from the bottom up.
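A small sketch of the prefix check follows. It is a simplification under stated assumptions: single inheritance only, every derived class adds at least one virtual method (so a parent's table is strictly shorter), and a parent is accepted when at least half of its slots hold the same pointer in the child. That threshold tolerates overridden methods and is an illustrative choice, not a value from the paper. The labels and addresses are hypothetical; a stripped binary would only provide addresses.

```python
def shared_fraction(parent, child):
    """Fraction of the parent's slots holding the same pointer in the child."""
    same = sum(1 for p, c in zip(parent, child) if p == c)
    return same / len(parent)


def infer_parents(vtables, min_shared=0.5):
    """Map each vtable label to its most likely parent label, or None."""
    parents = {}
    for child_name, child in vtables.items():
        best = None
        for parent_name, parent in vtables.items():
            # A parent must be non-empty and strictly shorter than the child.
            if not parent or len(parent) >= len(child):
                continue
            fraction = shared_fraction(parent, child)
            if fraction < min_shared:
                continue
            # Prefer the longest qualifying table: the closest ancestor.
            key = (len(parent), fraction, parent_name)
            if best is None or key > best[0]:
                best = (key, parent_name)
        parents[child_name] = best[1] if best else None
    return parents


# Hypothetical vtables; labels are for the reader only.
VTABLES = {
    "vt_Shape":     [0x401000, 0x401040],
    "vt_Circle":    [0x401080, 0x401040, 0x4010C0],            # overrides slot 0
    "vt_Rectangle": [0x401100, 0x401040, 0x401140, 0x401180],  # overrides slot 0
    "vt_Sphere":    [0x401080, 0x401040, 0x4010C0, 0x4011C0],  # extends Circle
    "vt_Logger":    [0x402000],                                # unrelated
}

PARENTS = infer_parents(VTABLES)
for name, parent in PARENTS.items():
    print(name, "->", parent)
```

Requiring a strictly shorter parent keeps the result acyclic, at the cost of missing derived classes that only override methods and add none.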

Results

The approach was evaluated on real-world C++ binaries, including large open-source projects. It successfully recovers the majority of class hierarchies, including multi-level inheritance trees.

The statistical reconstruction accurately identifies classes and their inheritance relationships from stripped C++ binaries, demonstrating that vtable structures and constructor/destructor patterns carry enough information to reverse-engineer the original class hierarchy with high precision.

The technique works across different compilers and optimization levels, and complements existing binary analysis tools by providing high-level structural information that was previously lost during compilation.

@inproceedings{katz2018statistical,
  title={Statistical Reconstruction of Class Hierarchies in Binaries},
  author={Katz, Omer and Rinetzky, Noam and Yahav, Eran},
  booktitle={Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
  year={2018},
  publisher={ACM}
}