This section gives you an overview of the syntax of litmus tests and how to run them. More comprehensive information on developing and running litmus tests can be found in the diy7 documentation .
For this Learning Path, you use an abbreviated version of the MP.litmus
test included in the
herd7 online simulator
.
Open a file editor of your choice, and copy the contents below into a file called test.litmus
:
AArch64 MP
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
MOV W0, #1 | LDR W0, [X1] ;
STR W0, [X1] | LDR W2, [X3] ;
MOV W2, #1 | ;
STR W2, [X3] | ;
exists
(1:X0=1 /\ 1:X2=0)
Now inspect this litmus file to gain a better understanding of the assembly code:
AArch64
. This is required because the diy7 tool supports architectures other than Arm. After the architecture label is MP
which is a label that gives this test its name. MP
stands for “Message Passing”.MP
”.P0
and P1
. It is possible to define more threads than just two, but in this Learning Path you work with only two threads to avoid increasing complexity.{..}
) determine the initial state of the CPU registers for this test.0:X1=x; 0:X3=y;
defines the initial state for P0
.x
.y
.1:X1=y; 1:X3=x;
defines the initial state for P1
.y
.x
.x
contains the message to pass, while the memory address y
is the message ready flag.P0
and P1
.P0
:MOV
and STR
writes a 1
into memory address x
(the payload).MOV
and STR
writes a 1
into memory address y
(the flag).P1
:LDR
reads the address y
(the flag). The value of the flag is stored in register W0
:LDR
reads the address x
(the payload). The value of the payload is stored in register W2
.P1
, is it possible to observe register W0
(the flag) set to 1 AND register W2
(the payload) set to 0?”X0
and X2
, not W0
and W2
. See the note below for more./\
is a logical AND, while \/
is a logical OR.X
and W
Registers:X
registers for storing addresses and for doing the condition check, but W
registers for everything else.X
registers for the addresses because they are 64-bit. W
registers are 32-bit. In fact, register Wn
is the lower 32-bits of register Xn
.X
registers. If all X
registers are used, the data type of each register needs to be declared on additional lines. For this reason, most tests are written as shown above. The way this is done may be changed in the future to reduce potential confusion around the mixed use of W
and X
registers, but all of this is functionally correct.Before you run this test with herd7
and litmus7
, you can hypothesize on what might be observed in registers X0
(flag) and X2
(payload) on thread P1
. Even though Arm machines are not Sequentially Consistent (modern machines usually aren’t), start by assuming this to be the case anyway. To reason on the execution of these threads in a Sequentially Consistent way, means to imagine the instructions on P0
and P1
are executed in some interleaved order.
Further, if you interleave these instructions in all possible permutations, you can figure out all of the possible valid outcomes of registers X0
(flag) and X2
(payload) on P1
. For the example test above, the possible valid outcomes of (X0,X2)
(or (flag,data)
) are (0,0)
, (0,1)
, & (1,1)
. Some permutations that result in these valid outcomes are shown below. These are not all the possible instruction permutations for this test. Listing them all would make this section needlessly long.
(0,0)
:
(P1) LDR W0, [X1] # P1 reads flag, gets 0
(P1) LDR W2, [X3] # P1 reads payload, gets 0
(P0) MOV W0, #1
(P0) STR W0, [X1] # P0 writes payload, sets 1
(P0) MOV W2, #1
(P0) STR W2, [X3] # P0 writes flag, sets 1
In this permutation of the test execution, P1
runs to completion before P0
even starts its execution. For this reason, P1
observes the initial values of 0 for both the flag and payload.
(0,1)
:
(P1) LDR W0, [X1] # P1 reads flag, gets 0
(P0) MOV W0, #1
(P0) STR W0, [X1] # P2 writes payload, sets 1
(P1) LDR W2, [X3] # P1 reads payload, gets 1
(P0) MOV W2, #1
(P0) STR W2, [X3] # P2 writes flag, sets 1
In this permutation of the test execution, P1
reads the initial value of the flag (the first line) because this instruction is executed before P0
writes the flag (the last list). However P1
reads the payload value of 1 because it executes after P0
writes the payload to 1 (third and forth lines).
(1,1)
:
(P0) MOV W0, #1
(P0) STR W0, [X1] # P2 writes payload, sets 1
(P0) MOV W2, #1
(P0) STR W2, [X3] # P2 writes flag, sets 1
(P1) LDR W0, [X1] # P1 reads flag, gets 1
(P1) LDR W2, [X3] # P1 reads payload, gets 1
In this permutation of the test execution, P2
runs to completion before P1
starts. Thus P1
observes both the flag and payload set to 1.
Result (1,0)
is not possible in a Sequentially Consistent execution of this test. This is because the instructions in P0
and P1
must execute in program order (a requirement for Sequential Consistency). It is only possible to get the result (1,0)
if the machine is not Sequentially Consistent. Not being Sequentially Consistent means that the memory access instructions (STR
and LDR
) can be executed out of program order if there are no dependencies between them.
Now that you have looked at a hypothesis on what you should see assuming a Sequentially Consistent machine, try running the test with herd7
. herd7
simulates all the possible instruction permutations to see which outcomes are possible.
Run the following command:
herd7 ./test.litmus
The output of the test on herd7
should look like:
Test MP Allowed
States 4
1:X0=0; 1:X2=0;
1:X0=0; 1:X2=1;
1:X0=1; 1:X2=0;
1:X0=1; 1:X2=1;
Ok
Witnesses
Positive: 1 Negative: 3
Outcome (1,0)
is observed when testing against the Arm memory model, as indicated by the positive witness on the test condition. This result demonstrates that the Arm memory model violates Sequential Consistency.
The Arm memory model tends to be considered a Relaxed Consistency model, which means that it is possible for an Arm CPU in the real world to use ordering rules that are stronger than Relaxed Consistency. Going stronger on the memory model will not violate the Arm ordering rules. A stronger memory model means that there might be less opportunity for hardware-based optimizations, it will not affect the correctness of code execution - assuming there are no bugs in the code. You can think of all Arm Neoverse CPUs as following a Relaxed Consistency model, which means that acquire-release ordering is supported.
In a Release Consistency model, ordinary memory accesses like STR
and LDR
do not need to follow program order. This relaxation in the ordering rules expands the list of instruction permutations in the litmus test above. It is these additional instruction permutations allowed by the Relaxed Consistency model that yield at least one permutation that results in (1,0)
. Below is one such example of a permutation. For this permutation, the LDR
instructions in P1
are reordered.
(1,0)
:
(P1) LDR W2, [X3] # P1 reads payload, gets 0
(P0) MOV W0, #1
(P0) STR W0, [X1] # P2 writes payload, sets 1
(P0) MOV W2, #1
(P0) STR W2, [X3] # P2 writes flag, sets 1
(P1) LDR W0, [X1] # P1 reads flag, gets 1
There are more possible permutations of instructions, including some that reorder the STR
instructions in P0
. You will not explore those here, this is what herd7
tests for.
Now that you see the results against the formal Arm memory model, you can test on actual hardware with litmus7
. Because litmus7
runs the test against actual hardware, it can’t guarantee that all instruction permutations are tested. This is because there isn’t a way to manipulate the CPU’s dynamic instruction scheduler - this is “hardwired” in the CPU design.
Instead, you have to run numerous iterations of the test to increase the probability that you see all possible outcomes. Thus, litmus7
can’t formally verify the CPU’s memory model; it can only provide empirical evidence to support a claim on the CPU’s memory model. It can however be used to disprove that the CPU does not follow a specific memory model. For example, you can prove that Arm is not Sequentially Consistent by finding at least one outcome that would not be allowed by Sequential Consistency.
You will now run the same test.litmus
file 1,000,000 times with litmus7
. One million is the default number of iterations, but this can be adjusted with the -s
switch. It is also possible to run the test on more than one CPU in parallel with the -a
switch. The more test iterations you run, the more confidence you can have that all possible outcomes have been captured.
Run the following command on your Arm Neoverse CPU-based instance:
litmus7 ./test.litmus
The output from running the test should look like:
Test MP Allowed
Histogram (4 states)
553396:>1:X0=0; 1:X2=0;
53040 *>1:X0=1; 1:X2=0;
96641 :>1:X0=0; 1:X2=1;
296923:>1:X0=1; 1:X2=1;
litmus7
reports the same four outcomes observed in the herd7
simulation. The outcome marked with the asterisk indicates the positive witness of the condition check, and the tool also reports the number of times each outcome was observed. This result suggests that this CPU is consistent with the formal memory model (as tested with herd7
), at least for this test.
To build further confidence, additional tests and more test iterations with litmus7
are needed. The
herd7 online simulator
contains additional tests that can be executed, and you can create additional tests.
Now you can move on to exploring thread synchronization in the next section.