The C++ Memory Model and Atomics

Learn about the C++ memory model for porting applications to Arm

Log an issue

Fork and edit

Discuss on Discord

Learn about the C++ memory model for porting applications to Arm

The C++ memory model for single threads

For a long time, writing C++ programs on single-core systems was straightforward. Compilers could reorder instructions freely, as long as the program’s observable behavior remained unchanged. This flexibility is commonly referred to as the “as-if” rule. Essentially, compilers could optimize away or move instructions around as if the code had not changed, provided the changes did not affect inputs, outputs, or volatile memory accesses.

The single-threaded world was simpler: you wrote code, the compiler safely reordered or eliminated instructions to make it faster, and your program performed better. But as multi-core processors and multi-threaded applications became common, instruction reordering was not only about improving performance - it could actually change the meaning of programs, especially when multiple threads accessed shared data simultaneously.

Expanding the memory model for multiple threads

When multi-threaded programming gained traction, compilers and CPUs needed precise rules about what reordering is allowed. This is where the formalized C++ memory model, introduced in C++11, steps in. Prior to C++11, concurrency in C++ was partially specified and relied on platform-specific behavior. Now, the language standard includes well-defined semantics ensuring that concurrent code can rely on a set of guaranteed rules.

Under the new model, if a piece of data is shared between threads without proper synchronization, you can no longer assume it behaves like single-threaded code. Instead, operations on this shared data may be reordered unless you explicitly prevent it using atomic operations or other synchronization primitives such as mutexes. To ensure correctness, C++ provides an array of memory ordering options (such as std::memory_order_relaxed, std::memory_order_acquire, and std::memory_order_release) that govern how loads and stores can be observed in a multi-threaded environment. Details can be found in the C++ reference manual.

C++ atomic memory ordering

In C++, std::memory_order atomic operations allow developers to specify how memory accesses, including regular, non-atomic memory accesses are ordered among atomic operations. Choosing the right memory order is crucial for balancing performance and correctness. Assume we have 2 atomic integers with initial values of 0:

    

        
        
std::atomic<int> x{0};
std::atomic<int> y{0};

Below are a few of C++’s atomic memory orders, along with a short code snippet illustrating what might or might not be reordered.

memory_order_relaxed

Relaxed operations do not impose ordering constraints beyond atomicity. They can be freely reordered with respect to other operations. This provides maximum performance but can lead to visibility issues if used incorrectly.

    

        
        
// Thread A:
r1 = y.load(std::memory_order_relaxed); // A
x.store(r1, std::memory_order_relaxed); // B

// Thread B:
r2 = x.load(std::memory_order_relaxed); // C 
y.store(42, std::memory_order_relaxed); // D
// These two stores could appear in any order relative to each other.

In the pseudo code snippet above, it’s possible for operation B to precede operation C, or the mirror possibility of D executing before A.

memory_order_acquire and memory_order_release

Acquire and release are used to synchronize atomic variables. In the example below, thread A writes to memory (allocating the string and setting data) and then uses a release-store to publish these updates. Thread B repeatedly performs an acquire-load until it sees the updated pointer. The acquire ensures that once Thread B sees a non-null pointer, all writes made by Thread A (including the update to data) become visible, synchronizing the two threads.

    

        
        
// Thread A 
p = new "Hello"; 
data = 42; 
atomic_store(ptr, p, memory_order_release); // Release: publish writes (p, data)

// Thread B 
while (atomic_load(ptr, memory_order_acquire) is null) { } // Acquire: wait until p is available
// Now, *p == "Hello" and data == 42 (synchronized with Thread A)

Sequential consistency, memory_order_seq_cst is the strongest order and the default ordering if nothing is specified.

There are several other memory ordering possibilities. For information on all memory ordering possibilities in the C++11 standard and their nuances, please refer to the C++ reference .

Back

Learn about the C++ memory model for porting applications to Arm

Introduction

Introduction to C++ Memory Models

The C++ Memory Model and Atomics

Walk through a Race condition example

Detecting race conditions

Next Steps

Learn about the C++ memory model for porting applications to Arm

The C++ memory model for single threads

Expanding the memory model for multiple threads

C++ atomic memory ordering