TL;DR: Writing correct concurrent programs for weak memory models (like x86-TSO or ARM) requires memory fences, but placing them correctly is error-prone. This paper automatically synthesizes the minimum set of fences needed to guarantee correctness.
The Problem
Modern processors reorder memory operations for performance. A store followed by a load might actually execute in reverse order, because the processor uses a store buffer to delay writes while it reads ahead. Under sequential consistency (SC), every thread sees the same global order of operations. But real hardware -- x86-TSO, ARM, POWER -- allows certain reorderings, breaking this intuition.
Without proper memory fences (also called barriers), concurrent programs can exhibit behaviors that are impossible under sequential consistency. A classic example is Peterson's mutual exclusion algorithm: it works perfectly under SC, but fails on x86-TSO because store-load reorderings let both threads enter the critical section simultaneously.
Developers must manually place fences to prevent harmful reorderings. But this is both error-prone (miss a fence and your program has a subtle concurrency bug) and over-conservative (add too many fences and you pay a significant performance penalty). The question is: where exactly should fences be placed, and how many are truly needed?
The Key Idea
Instead of guessing where to place fences, use abstract interpretation to analyze which memory reorderings can lead to specification violations. The method works by:
- Starting with the program under a weak memory model (no fences)
- Using abstract interpretation to detect which reorderings actually cause bugs
- Iteratively adding fences through refinement propagation -- only where they are provably needed
- Producing a minimal fence placement that guarantees correctness
The key insight is that refinement propagation tracks the abstract states where the program violates its specification, then traces back to identify exactly which reorderings caused those violations. A fence is added only if removing a specific reordering eliminates an unreachable (buggy) state. This avoids the brute-force approach of trying all possible fence combinations.
Interactive Demo: Memory Reordering Visualizer
Peterson's Mutual Exclusion Algorithm
Click the memory model tabs to see how different architectures require different fence placements. Step through: sequential execution, reordering (bug), then fence synthesis.
How It Works
Abstract Interpretation of Memory Models
The method models weak memory semantics as additional non-determinism in the program. Under TSO, for instance, each thread has a store buffer: writes are delayed and may be observed by other threads in a different order. The abstract domain captures the possible contents of these store buffers along with the global memory state.
Refinement Propagation
The name "refinement propagation" comes from the key mechanism: when a bad state is found, the analysis refines the abstraction by adding a fence, then propagates this refinement through the abstract transition system. Unlike enumerate-and-check approaches, refinement propagation is directed -- it only considers fence placements that are directly motivated by detected violations. This makes it both efficient and guaranteed to produce minimal results.
Results
The approach produces provably minimal fence placements -- no fence can be removed without introducing a potential bug. It is sound (every synthesized placement guarantees correctness under the target memory model) and efficient (refinement propagation avoids exploring the exponential space of all possible fence combinations).
Evaluated on standard concurrent algorithms (Peterson's mutex, Dekker's algorithm, Lamport's bakery, etc.) across x86-TSO and PSO memory models, the method consistently finds the minimum number of fences. Compared to previous approaches that enumerate all possible placements, refinement propagation is significantly faster while producing identical (minimal) results.
- Minimal placement: every synthesized fence is necessary; removing any one allows a specification violation
- Soundness: the resulting program is correct under the target memory model
- Scalability: refinement propagation avoids exponential search over fence combinations
- Generality: applicable to different memory models (TSO, PSO) with appropriate abstract domains