A programming language’s memory model defines how operations on shared data can interleave at runtime. It sets rules for how compilers and hardware might reorder these operations.
In C++, the memory model specifically defines how threads interact with shared variables, ensuring consistent behavior across different compilers and architectures.
You can think of memory ordering as falling into four broad categories:
Source Code Order - the exact sequence in which you write statements. This is the most intuitive view because it directly reflects how code appears to you.
Here is an example:
int x = 5; // A
int z = x * 5; // B
int y = 42; // C
Program Order - the logical sequence that the compiler recognizes, and it might rearrange or optimize instructions under certain constraints to create a program that executes in fewer cycles. Although your source code lists statements in a particular order, the compiler can restructure them if it deems it safe. For example, the pseudo-assembly below reorders the source instructions:
LDR R1 #5 // A
LDR R2 #42 // C
MULT R3, #R1, #5 // B
Execution Order - this is the order in which the hardware actually issues and executes instructions. Modern CPUs often employ techniques to improve instruction-level parallelism such as out-of-order execution and speculation for performance. For instance, on an Arm-based system, you might see instructions issued in different order during runtime. The subtle difference between program order and execution order is that program order refers to the sequence seen in the binary whereas execution is the order in which those instructions are actually issued and retired. Even though the instructions are listed in one order, the CPU might reorder their micro-operations as long as it respects dependencies.
Hardware Perceived Order - this is the perspective observed by other devices in the system, which can differ if the hardware buffers writes or merges memory operations. Crucially, the hardware-perceived order can vary between CPU architectures, for example between x86 and Arm, and this should be considered when porting applications.
The memory models of Arm and x86 architectures differ in terms of ordering guarantees and required synchronizations.
x86 processors implement a relatively strong memory model, commonly referred to as Total Store Order (TSO). Under TSO, loads and stores appear to execute in program order, with only limited reordering permitted. This strong ordering means that software running on x86 generally relies on fewer memory barrier instructions, making it easier to reason about concurrency.
In contrast, Arm’s memory model is more relaxed, allowing greater reordering of memory operations to optimize performance and energy efficiency. This relaxed model provides less intuitive ordering guarantees, meaning that loads and stores can be observed out of order by other processors. This means that source code needs to correctly follow the language standard to ensure reliable behavior.