VMCAI 2016

D3: Data-Driven Disjunctive Abstraction

Hila Peleg, Sharon Shoham, Eran Yahav

TL;DR: Abstract interpretation is powerful but choosing the right abstraction is hard. D3 automatically learns good disjunctive abstractions from program execution data — using data to guide which abstract elements to track, making verification both precise and efficient.

The Problem

In abstract interpretation, we approximate the set of reachable program states using abstract elements. A classic choice is the interval domain: instead of tracking every possible value of variable x, we track a range like [2, 10]. Simple and efficient, but often too imprecise — the single bounding box lumps together states that behave very differently.

Disjunctive abstraction fixes this by tracking multiple abstract elements simultaneously: instead of one big interval, we might track [2, 4] OR [8, 10]. The gap in the middle represents unreachable states we can now exclude. More precise? Absolutely. But the number of possible disjuncts grows exponentially, making naive approaches intractable.

The fundamental challenge: how do you pick the right disjuncts? Too few and you lose precision. Too many and verification becomes impractical. The sweet spot depends entirely on the program being analyzed, and finding it manually requires deep expertise.

The Key Idea

D3 takes a data-driven approach: instead of guessing which disjuncts to track, run the program on concrete inputs and observe which states actually appear. The concrete execution traces reveal the structure of the reachable state space — clusters of states that belong together, gaps that separate distinct behaviors.

By clustering these observed states, D3 automatically discovers a small set of disjuncts that tightly covers the real program behaviors while excluding most spurious states. The key insight is that concrete traces, while incomplete, contain enough information to guide the abstract domain selection.

This sidesteps the exponential blowup: instead of exploring all possible disjunctions, D3 learns only the ones that matter from the data. The result is a disjunctive abstraction that is both precise (few spurious states) and efficient (few disjuncts).

Interactive Demo: Abstraction Refinement

Disjunctive Abstraction Visualizer

Each dot represents a concrete program state observed during execution. The colored boxes show how the abstract domain covers these states.

Spurious states:
--
Precision:
--

Click on the plane to add new trace points. Right-click or shift-click on existing dots to remove them. Toggle between a single bounding box and D3 disjunctive abstraction to see the precision difference.

How It Works

1. Trace Collection

Run the program on a set of concrete inputs and record the program states at key points (loop heads, procedure entries).

2. Clustering

Group observed states into clusters using their spatial structure. States that are close together form natural disjuncts.

3. Disjunct Selection

For each cluster, compute a tight abstract element (e.g., a bounding box). The resulting set of abstract elements forms the disjunctive domain.

Trace Collection

D3 begins by executing the program on a sample of concrete inputs. At designated observation points — typically loop heads and function boundaries — the tool records the values of program variables. These recorded values form the concrete trace: a snapshot of where the program actually goes in the state space.

Clustering

The collected states often exhibit natural groupings. For example, a loop that processes two different cases might produce two distinct clusters of states. D3 uses clustering algorithms to automatically identify these groups. The number of clusters is kept small to ensure efficiency — typically 2 to 4 disjuncts suffice for most programs.

Disjunct Selection

Each cluster is then abstracted into a single abstract element (e.g., a bounding box in the interval domain, or a convex hull in the polyhedra domain). The union of these abstract elements forms the disjunctive abstraction. Because each element tightly wraps a cluster of real states, the overall abstraction has far fewer spurious states than a single monolithic approximation.

Results

D3 achieves significantly more precise verification compared to standard (non-disjunctive) abstract interpretation, while keeping the number of disjuncts small enough for efficient analysis. The data-driven selection consistently outperforms heuristic approaches to disjunct selection, reducing spurious states by 40–80% across the benchmarks evaluated.

The key results from the paper:

Hila Peleg, Sharon Shoham, Eran Yahav. "D3: Data-Driven Disjunctive Abstraction." In Proceedings of the 17th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), 2016. @inproceedings{peleg2016d3, title={D3: Data-Driven Disjunctive Abstraction}, author={Peleg, Hila and Shoham, Sharon and Yahav, Eran}, booktitle={VMCAI}, year={2016} }