Synthesis of Memory Fences via Refinement Propagation

TL;DR: Writing correct concurrent programs for weak memory models (like x86-TSO or ARM) requires memory fences, but placing them correctly is error-prone. This paper automatically synthesizes the minimum set of fences needed to guarantee correctness.

The Problem

Modern processors reorder memory operations for performance. A store followed by a load might actually execute in reverse order, because the processor uses a store buffer to delay writes while it reads ahead. Under sequential consistency (SC), every thread sees the same global order of operations. But real hardware -- x86-TSO, ARM, POWER -- allows certain reorderings, breaking this intuition.

Without proper memory fences (also called barriers), concurrent programs can exhibit behaviors that are impossible under sequential consistency. A classic example is Peterson's mutual exclusion algorithm: it works perfectly under SC, but fails on x86-TSO because store-load reorderings let both threads enter the critical section simultaneously.

Developers must manually place fences to prevent harmful reorderings. But this is both error-prone (miss a fence and your program has a subtle concurrency bug) and over-conservative (add too many fences and you pay a significant performance penalty). The question is: where exactly should fences be placed, and how many are truly needed?

The Key Idea

Instead of guessing where to place fences, use abstract interpretation to analyze which memory reorderings can lead to specification violations. The method works by:

Starting with the program under a weak memory model (no fences)
Using abstract interpretation to detect which reorderings actually cause bugs
Iteratively adding fences through refinement propagation -- only where they are provably needed
Producing a minimal fence placement that guarantees correctness

The key insight is that refinement propagation tracks the abstract states where the program violates its specification, then traces back to identify exactly which reorderings caused those violations. A fence is added only if removing a specific reordering eliminates an unreachable (buggy) state. This avoids the brute-force approach of trying all possible fence combinations.

Interactive Demo: Memory Reordering Visualizer

Peterson's Mutual Exclusion Algorithm

Thread 0

Thread 1

Click the memory model tabs to see how different architectures require different fence placements. Step through: sequential execution, reordering (bug), then fence synthesis.

How It Works

Abstract Interpretation of Memory Models

The method models weak memory semantics as additional non-determinism in the program. Under TSO, for instance, each thread has a store buffer: writes are delayed and may be observed by other threads in a different order. The abstract domain captures the possible contents of these store buffers along with the global memory state.

Model the program under the weak memory model. Each possible reordering is an additional transition in the abstract transition system. The store buffer contents become part of the abstract state.

Compute the abstract reachable states. Use abstract interpretation to over-approximate all states reachable under the weak memory model. If a bad state (specification violation) is reachable, the program has a bug under that memory model.

Identify violating reorderings. For each abstract bad state, trace back through the abstract transition system to find which reordering transitions contributed to reaching it.

Propagate refinements. Insert a fence that eliminates the identified reordering. A fence between a store and a load forces the store buffer to flush, preventing the reorder. Re-analyze to check if more fences are needed.

Iterate until fixed point. Repeat steps 2--4 until no bad states are reachable. The result is a minimal set of fences: every fence is justified by a specific reordering that leads to a bug.

Refinement Propagation

The name "refinement propagation" comes from the key mechanism: when a bad state is found, the analysis refines the abstraction by adding a fence, then propagates this refinement through the abstract transition system. Unlike enumerate-and-check approaches, refinement propagation is directed -- it only considers fence placements that are directly motivated by detected violations. This makes it both efficient and guaranteed to produce minimal results.

Results

The approach produces provably minimal fence placements -- no fence can be removed without introducing a potential bug. It is sound (every synthesized placement guarantees correctness under the target memory model) and efficient (refinement propagation avoids exploring the exponential space of all possible fence combinations).

Evaluated on standard concurrent algorithms (Peterson's mutex, Dekker's algorithm, Lamport's bakery, etc.) across x86-TSO and PSO memory models, the method consistently finds the minimum number of fences. Compared to previous approaches that enumerate all possible placements, refinement propagation is significantly faster while producing identical (minimal) results.

Minimal placement: every synthesized fence is necessary; removing any one allows a specification violation
Soundness: the resulting program is correct under the target memory model
Scalability: refinement propagation avoids exponential search over fence combinations
Generality: applicable to different memory models (TSO, PSO) with appropriate abstract domains

Meshman, Y., Dan, A., Vechev, M., & Yahav, E. (2014). Synthesis of Memory Fences via Refinement Propagation. In Static Analysis Symposium (SAS), pp. 237--252. Springer.