OOPSLA 2014

Abstract Semantic Differencing via Speculative Correlation

Nimrod Partush, Eran Yahav

TL;DR: Textual diffs show what changed in code, but not whether the behavior changed. Abstract semantic differencing compares two program versions at the semantic level — identifying inputs where the programs produce different outputs, not just where the text differs.

The Problem

When a developer modifies code, the textual diff might show 50 changed lines. But how many of those changes actually affect the program's behavior? A variable rename, whitespace reformatting, or code reorganization can produce a large textual diff without changing what the program computes. Conversely, a single-character change can completely alter the semantics.

Developers need to answer a more precise question: "did the semantics actually change, and for which inputs?" Textual diffs cannot answer this. They conflate cosmetic changes with behavioral ones, leaving reviewers to mentally simulate execution to determine what actually matters.

The Key Idea

Use abstract interpretation to compute a sound over-approximation of the inputs where the two program versions differ. Rather than comparing text, the analysis compares the abstract semantics of the old and new programs, producing a semantic difference: a description of the input conditions under which the two versions produce different outputs.

A central challenge is relating variables across versions — after refactoring, a variable named count might become total. The paper introduces speculative correlation, a heuristic that maps variables between versions to enable precise comparison. By speculatively correlating corresponding variables, the analysis can determine that two superficially different programs compute the same function.

Interactive Demo: Semantic Diff Visualizer

Compare Two Program Versions

Select an example to see how textual and semantic diffs reveal different information about a code change.

Version A (Before)

Version B (After)

Textual change Added/moved Semantic difference

Enter an input value and run both versions to see whether they produce the same or different output.

How It Works

1

Abstract Interpretation

Each program version is analyzed using abstract interpretation, which computes an over-approximation of all possible program states at each point. Rather than tracking exact values, the analysis tracks abstract properties — intervals, signs, relationships between variables — that soundly describe the set of concrete values.

2

Speculative Correlation

To compare two versions, the analysis needs to relate their variables. Speculative correlation heuristically maps variables between versions based on naming, usage patterns, and program structure. For instance, if version A uses count and version B uses total in the same role, the analysis correlates them. This mapping allows the analysis to determine that two differently-written programs compute equivalent results.

3

Difference Summarization

By comparing the abstract states of both versions, the analysis identifies where they diverge. The result is a semantic difference: a condition on the input that describes exactly (up to over-approximation) when the two versions produce different outputs. If the semantic difference is empty, the change is guaranteed to be behavior-preserving.

Results

The approach precisely identifies behavioral changes between program versions, distinguishing true semantic modifications from cosmetic edits. Key findings:

Abstract semantic differencing moves beyond textual comparison to answer the question developers actually care about: not "what text changed?" but "for which inputs does the program behave differently?"

Partush, N. & Yahav, E. (2014). Abstract Semantic Differencing via Speculative Correlation. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA '14). ACM, New York, NY, USA, 811–828.