You can use the events described below to demonstrate ITLB effectiveness:
//ITLB Effectiveness Metrics
PMU_EVENT_L1I_TLB,
PMU_EVENT_L1I_TLB_REFILL,
PMU_EVENT_L2D_TLB,
PMU_EVENT_L2D_TLB_REFILL,
PMU_EVENT_ITLB_WALK,
PMU_EVENT_INST_RETIRED,
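The header that defines the PMU_EVENT_* identifiers is not shown here. As a minimal sketch, assuming these identifiers map directly to the event numbers in the Arm architecture's common PMU event list, the definitions might look like the following. The `itlb_events` table and the `itlb_event_at` helper are hypothetical names added for illustration. Confirm the numeric values against the Technical Reference Manual for your core before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mapping of the identifiers above to Arm common PMU event numbers. */
enum pmu_event {
    PMU_EVENT_L1I_TLB_REFILL = 0x02,
    PMU_EVENT_INST_RETIRED   = 0x08,
    PMU_EVENT_L1I_TLB        = 0x26,
    PMU_EVENT_L2D_TLB_REFILL = 0x2D,
    PMU_EVENT_L2D_TLB        = 0x2F,
    PMU_EVENT_ITLB_WALK      = 0x35
};

/* The six ITLB events, in the order they are listed above. */
static const uint16_t itlb_events[] = {
    PMU_EVENT_L1I_TLB,
    PMU_EVENT_L1I_TLB_REFILL,
    PMU_EVENT_L2D_TLB,
    PMU_EVENT_L2D_TLB_REFILL,
    PMU_EVENT_ITLB_WALK,
    PMU_EVENT_INST_RETIRED
};

#define ITLB_EVENT_COUNT (sizeof itlb_events / sizeof itlb_events[0])

/* Hypothetical helper: event number to program into event counter `slot`. */
uint16_t itlb_event_at(size_t slot)
{
    assert(slot < ITLB_EVENT_COUNT);
    return itlb_events[slot];
}
```

A setup loop could walk slots 0 to 5 and write each returned value into the event type register for that counter.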
To trigger an ITLB walk, there must be a miss in the ITLB. The following self-modifying code places a new instruction into memory that has not been accessed before:
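.global itlb_test
.type itlb_test, "function"
.cfi_startproc
itlb_test:
    LDR x0, =ret_opcode     // x0 = address of the RET instruction below
    LDR w1, [x0]            // w1 = 32-bit RET opcode
    LDR x2, =0xc0000000     // x2 = target address that has not been accessed before
    STR w1, [x2]            // write the 32-bit opcode to the target address
    DC CVAU, x2             // clean the data cache line to the Point of Unification
    DSB SY
    IC IVAU, x2             // invalidate the instruction cache line to the Point of Unification
    DSB SY
    ISB
    BR x2                   // execute the new instruction, which returns to the caller
ret_opcode:
    RET
.cfi_endproc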
The resulting event counts for the code are:
L1I_TLB is 57
L1I_TLB_REFILL is 1
L2D_TLB is 2
L2D_TLB_REFILL is 1
ITLB_WALK is 0
INST_RETIRED is 15
In this case, there are 57 accesses to the L1 I-TLB, but only one refill each into the L1 I-TLB and the L2 D-TLB, so only one translation was not already cached in the L1 I-TLB. ITLB_WALK does not count in this scenario because, for it to count, the same access must miss in both the L1 I-TLB and the L2 TLB. Through branch prediction or speculation, the translation might already exist in the L2 TLB.
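The raw counts are easier to compare between runs when they are turned into ratios. The following is a minimal sketch, not part of any Arm library: the structure and function names are illustrative, and the two ratios (refills per L1 I-TLB access, and walks per thousand retired instructions) are common conventions rather than something this section defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative container for one run's ITLB event counts. */
struct itlb_counts {
    uint64_t l1i_tlb;
    uint64_t l1i_tlb_refill;
    uint64_t l2d_tlb;
    uint64_t l2d_tlb_refill;
    uint64_t itlb_walk;
    uint64_t inst_retired;
};

/* Divide two counts, returning 0.0 when the denominator is zero. */
static double safe_ratio(uint64_t num, uint64_t den)
{
    return den != 0 ? (double)num / (double)den : 0.0;
}

/* Fraction of L1 I-TLB accesses that caused an L1 I-TLB refill. */
double l1i_tlb_miss_ratio(const struct itlb_counts *c)
{
    assert(c != NULL);
    return safe_ratio(c->l1i_tlb_refill, c->l1i_tlb);
}

/* ITLB walks per 1000 retired instructions. */
double itlb_walks_per_kilo_inst(const struct itlb_counts *c)
{
    assert(c != NULL);
    return 1000.0 * safe_ratio(c->itlb_walk, c->inst_retired);
}
```

For the counts above (57 accesses, 1 refill, 0 walks), the miss ratio is 1/57, roughly 1.75%, and the walk rate is zero.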
To trigger an L2 TLB miss, the following assembly function clears the TLB to remove any translations that were cached through branch prediction or speculation:
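.global clear_tlb
.type clear_tlb, "function"
.cfi_startproc
clear_tlb:
    TLBI ALLE3      // invalidate all TLB entries for the EL3 translation regime
    DSB SY          // wait for the invalidation to complete
    ISB             // resynchronize the instruction stream
    RET             // return to the caller
.cfi_endproc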
TLBI ALLE3 invalidates all TLB entries for the EL3 translation regime only, which removes any predicted translations and forces a miss when new memory is accessed afterwards. With the TLB cleared first, the resulting event counts are:
L1I_TLB is 73
L1I_TLB_REFILL is 3
L2D_TLB is 6
L2D_TLB_REFILL is 4
ITLB_WALK is 3
INST_RETIRED is 17
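One way to collect counts like the ones above is to snapshot the counters immediately before and after the code under test and report the differences. The sketch below is an assumption about how such a harness could be structured, not the harness used to produce these numbers. `measure`, `counter_reader`, and the `fake_*` stand-ins are hypothetical names. On a real target, the reader callback would read the PMU event counters and `prepare` would be `clear_tlb`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* L1I_TLB, L1I_TLB_REFILL, L2D_TLB, L2D_TLB_REFILL, ITLB_WALK, INST_RETIRED */
#define NUM_COUNTERS 6

typedef uint64_t (*counter_reader)(unsigned idx); /* hypothetical counter access */
typedef void (*code_under_test)(void);

/* Run `prepare` (optional), then record how far each counter advances across `run`. */
void measure(counter_reader read_counter, code_under_test prepare,
             code_under_test run, uint64_t delta[NUM_COUNTERS])
{
    uint64_t before[NUM_COUNTERS];

    assert(read_counter != NULL && run != NULL && delta != NULL);

    if (prepare != NULL)
        prepare(); /* runs before the first snapshot, so its events are excluded */

    for (unsigned i = 0; i < NUM_COUNTERS; i++)
        before[i] = read_counter(i);

    run();

    for (unsigned i = 0; i < NUM_COUNTERS; i++)
        delta[i] = read_counter(i) - before[i]; /* unsigned math tolerates wraparound */
}

/* Host-side stand-ins so the harness can be exercised without PMU access. */
static uint64_t fake_counters[NUM_COUNTERS];

uint64_t fake_read_counter(unsigned idx)
{
    assert(idx < NUM_COUNTERS);
    return fake_counters[idx];
}

void fake_run(void)
{
    /* Advance the fake counters by the counts reported above. */
    static const uint64_t step[NUM_COUNTERS] = { 73, 3, 6, 4, 3, 17 };
    for (unsigned i = 0; i < NUM_COUNTERS; i++)
        fake_counters[i] += step[i];
}
```

Running `prepare` before the first snapshot is a design choice: it keeps the events generated by the TLB maintenance code itself out of the reported deltas.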
This section describes what happens in the ITLB during a TLB access, which can be triggered using store instructions.
void stores()
{
    for (volatile unsigned int i = 0; i < 30; i++)
    {
        *(volatile unsigned int*) (0x3C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x180000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x200000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x2C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x1C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x100000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x40000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x380000000 + (i*64)) = 0xDEADBEEF;
    }
}
Events that always occur: L1I_TLB.
L1I_TLB is 227
L1I_TLB_REFILL is 0
L2D_TLB is 9
L2D_TLB_REFILL is 9
ITLB_WALK is 0
No ITLB walks are triggered since there might be speculation or branch prediction happening in the ITLB. Similarly, there are no L1 I-TLB refills, since there are no misses that result in allocations.
Additional events that occur with an L1 I-TLB miss: L2D_TLB and L1I_TLB_REFILL.
Additional events that occur with an L2 TLB miss: ITLB_WALK and L2D_TLB_REFILL.
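The layering described above can be written down as a small lookup. This is a simplified model for illustration only: the enum and function names are hypothetical, and the model ignores speculation and prefetching, which can shift when these events are counted on real hardware.

```c
#include <assert.h>

/* Where an instruction-side translation lookup is resolved. */
enum itlb_outcome {
    ITLB_L1_HIT,
    ITLB_L1_MISS_L2_HIT,
    ITLB_L2_MISS
};

/* One bit per PMU event discussed in this section. */
enum itlb_event_bit {
    EV_L1I_TLB        = 1 << 0,
    EV_L1I_TLB_REFILL = 1 << 1,
    EV_L2D_TLB        = 1 << 2,
    EV_L2D_TLB_REFILL = 1 << 3,
    EV_ITLB_WALK      = 1 << 4
};

/* Return the set of events expected to count for a single lookup. */
unsigned events_for(enum itlb_outcome outcome)
{
    unsigned events = EV_L1I_TLB; /* every lookup accesses the L1 I-TLB */

    assert(outcome >= ITLB_L1_HIT && outcome <= ITLB_L2_MISS);

    if (outcome >= ITLB_L1_MISS_L2_HIT)
        events |= EV_L2D_TLB | EV_L1I_TLB_REFILL;    /* added by an L1 I-TLB miss */
    if (outcome == ITLB_L2_MISS)
        events |= EV_ITLB_WALK | EV_L2D_TLB_REFILL;  /* added by an L2 TLB miss */

    return events;
}
```

Reading the function top to bottom mirrors the text: each deeper miss adds events to the set and never removes any.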
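To trigger TLB refills and an ITLB walk, clear the TLB with the following code:
.global clear_tlb
.type clear_tlb, "function"
.cfi_startproc
clear_tlb:
    TLBI ALLE3      // invalidate all TLB entries for the EL3 translation regime
    DSB SY          // wait for the invalidation to complete
    ISB             // resynchronize the instruction stream
    RET             // return to the caller
.cfi_endproc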
The resulting event counts for the code are:
L1I_TLB is 73
L1I_TLB_REFILL is 3
L2D_TLB is 6
L2D_TLB_REFILL is 4
ITLB_WALK is 3
After clearing the TLB, accesses to new memory need new translations, which results in ITLB walks and L1 I-TLB refills. Note that ITLB_WALK does not count walks triggered by TLB maintenance operations. Instead, these walks might be the result of the barrier instructions.
The following PMU events provide metrics on D-TLB effectiveness:
//DTLB Effectiveness Metrics
PMU_EVENT_L1D_TLB,
PMU_EVENT_L1D_TLB_REFILL,
PMU_EVENT_L2D_TLB,
PMU_EVENT_L2D_TLB_REFILL,
PMU_EVENT_INST_RETIRED,
PMU_EVENT_DTLB_WALK,
A large number of stores to different addresses causes D-TLB accesses, refills, and walks.
void stores()
{
    for (volatile unsigned int i = 0; i < 30; i++)
    {
        *(volatile unsigned int*) (0x3C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x180000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x200000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x2C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x1C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x100000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x40000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x380000000 + (i*64)) = 0xDEADBEEF;
    }
}
Since these 8 stores access new memory translations, there are 8 counts each of L1D_TLB_REFILL, L2D_TLB_REFILL, and DTLB_WALK:
L1D_TLB is 731
L1D_TLB_REFILL is 8
L2D_TLB is 8
L2D_TLB_REFILL is 8
DTLB_WALK is 8
INST_RETIRED is 976
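The figure of 8 can be checked with a little address arithmetic. The loop advances each base address by 64 bytes for 30 iterations, which is 1920 bytes in total, so every base stays inside one page. The sketch below counts the distinct pages touched for a given page size. The translation granule is not stated in this section, so the page shift is a parameter, and `store_bases` and `count_distinct_pages` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BASES  8
#define ITERATIONS 30
#define STRIDE     64

/* Base addresses written by stores(), in source order. */
static const uint64_t store_bases[NUM_BASES] = {
    0x3C0000000ULL, 0x180000000ULL, 0x200000000ULL, 0x2C0000000ULL,
    0x1C0000000ULL, 0x100000000ULL, 0x040000000ULL, 0x380000000ULL
};

/* Count the distinct pages touched by the loop for a page size of 2^page_shift bytes. */
size_t count_distinct_pages(unsigned page_shift)
{
    uint64_t seen[NUM_BASES * ITERATIONS];
    size_t n = 0;

    assert(page_shift < 64);

    for (size_t b = 0; b < NUM_BASES; b++) {
        for (uint64_t i = 0; i < ITERATIONS; i++) {
            uint64_t page = (store_bases[b] + i * STRIDE) >> page_shift;
            size_t k = 0;

            /* Linear search is fine for at most 240 entries. */
            while (k < n && seen[k] != page)
                k++;
            if (k == n)
                seen[n++] = page;
        }
    }
    return n;
}
```

Because the bases are all 1GB apart or more, the result is 8 distinct translations whether the mapping uses 4KB pages (`page_shift` of 12), 2MB blocks (21), or 1GB blocks (30).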
This section describes what happens in the D-TLB during a load instruction. The following code reads from many different addresses, which forces a D-TLB access.
void loads()
{
    for (volatile unsigned int i = 0; i < 30; i++)
    {
        char *value = (char *)0x3C0000000 + (i*64);
        char *value1 = (char *)0x200000000 + (i*64);
        char *value2 = (char *)0x1C0000000 + (i*64);
        char *value3 = (char *)0x180000000 + (i*64);
        char *value4 = (char *)0x380000000 + (i*64);
        char *value5 = (char *)0x40000000 + (i*64);
        char *value6 = (char *)0x2C0000000 + (i*64);
    }
}
Events that always occur during a D-TLB access are: LD_SPEC, MEM_ACCESS, MEM_ACCESS_RD, L1D_TLB, and L1D_TLB_RD.
LD_SPEC is 363
MEM_ACCESS is 317
MEM_ACCESS_RD is 278
L1D_TLB is 317
L1D_TLB_RD is 278
MEM_ACCESS counts memory accesses issued by the Load Store Unit, both load and store operations. MEM_ACCESS_RD is a subset of MEM_ACCESS, only counting load operations that result in memory accesses. In this case, MEM_ACCESS is equal to L1D_TLB because each memory access has an associated D-TLB lookup. Similarly, each MEM_ACCESS_RD has an associated D-TLB lookup.
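The relationships in this paragraph can be expressed as a quick sanity check on a set of counts. The following is a sketch with illustrative names, taking the read-side D-TLB lookup count for this run as 278 so that it matches MEM_ACCESS_RD. It also shows that 317 - 278 = 39 of the memory accesses in this run were not reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative container for the always-counted D-side events of one run. */
struct dtlb_access_counts {
    uint64_t mem_access;
    uint64_t mem_access_rd;
    uint64_t l1d_tlb;
    uint64_t l1d_tlb_rd;
};

/* True when every memory access, and every read access, has a matching D-TLB lookup. */
bool lookups_match_accesses(const struct dtlb_access_counts *c)
{
    assert(c != NULL);
    return c->mem_access == c->l1d_tlb &&
           c->mem_access_rd == c->l1d_tlb_rd &&
           c->mem_access_rd <= c->mem_access; /* the _RD count is a subset */
}

/* Memory accesses that were not reads, for example stores. */
uint64_t non_read_accesses(const struct dtlb_access_counts *c)
{
    assert(c != NULL && c->mem_access_rd <= c->mem_access);
    return c->mem_access - c->mem_access_rd;
}
```

A failed check does not necessarily indicate a bug, since the counters are sampled around speculative execution, but it is a useful prompt to look more closely at a run.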
Additional events that occur with an L1 D-TLB miss are: L2D_TLB, L2D_TLB_RD, L1D_TLB_REFILL, and L1D_TLB_REFILL_RD.
L2D_TLB is 0
L2D_TLB_RD is 0
L1D_TLB_REFILL is 0
L1D_TLB_REFILL_RD is 0
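Branch prediction or speculation might already have brought the translations into the TLB, resulting in no misses. The code below clears the TLB to force a miss.
.global clear_tlb
.type clear_tlb, "function"
.cfi_startproc
clear_tlb:
    TLBI ALLE3      // invalidate all TLB entries for the EL3 translation regime
    DSB SY          // wait for the invalidation to complete
    ISB             // resynchronize the instruction stream
    RET             // return to the caller
.cfi_endproc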
Calling this function before the loads produces these results:
L2D_TLB is 7
L2D_TLB_RD is 5
L1D_TLB_REFILL is 3
L1D_TLB_REFILL_RD is 1
Now there are refills in the L1 D-TLB caused by the TLB miss. There are also D-side TLB accesses caused by a memory read operation, counted by L2D_TLB_RD. L2D_TLB_REFILL_RD counts any allocation into the L2 TLB caused by an I-side or D-side memory read operation.
Additional events that occur with an L2 TLB miss are: DTLB_WALK, L2D_TLB_REFILL, and L2D_TLB_REFILL_RD.
DTLB_WALK is 2
L2D_TLB_REFILL is 5
L2D_TLB_REFILL_RD is 3
Since the TLB has been cleared, DTLB_WALK and L2D_TLB_REFILL are triggered to fetch the translations from main memory and refill them into the L2 D-TLB.
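Each _RD event in this run counts only the read-caused portion of its parent event, so subtracting the two gives the remainder that was not caused by reads. As a small worked example, with an illustrative structure and function name, the pairs reported after clearing the TLB are L2D_TLB 7 and L2D_TLB_RD 5, L1D_TLB_REFILL 3 and L1D_TLB_REFILL_RD 1, and L2D_TLB_REFILL 5 and L2D_TLB_REFILL_RD 3.

```c
#include <assert.h>
#include <stdint.h>

/* A parent event count paired with its read-only (_RD) variant. */
struct rd_split {
    uint64_t total;
    uint64_t read;
};

/* Portion of the parent event that was not caused by a read operation. */
uint64_t non_read_share(struct rd_split s)
{
    assert(s.read <= s.total); /* the _RD variant is a subset of the parent event */
    return s.total - s.read;
}
```

For all three pairs the remainder is 2, which suggests that the same two non-read accesses missed at each level. This section does not identify what those accesses were.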
This section describes what happens in the D-TLB during a store instruction. The following code writes to many different Normal Cacheable memory addresses, which forces a D-TLB access.
void stores()
{
    for (volatile unsigned int i = 0; i < 30; i++)
    {
        *(volatile unsigned int*) (0x3C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x180000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x200000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x2C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x1C0000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x100000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x40000000 + (i*64)) = 0xDEADBEEF;
        *(volatile unsigned int*) (0x380000000 + (i*64)) = 0xDEADBEEF;
    }
}
Events that always occur: ST_SPEC, MEM_ACCESS, MEM_ACCESS_WR, L1D_TLB, and L1D_TLB_WR.
ST_SPEC is 468
MEM_ACCESS is 702
MEM_ACCESS_WR is 301
L1D_TLB is 702
L1D_TLB_WR is 301
MEM_ACCESS counts memory accesses that the Load Store Unit issues, both load and store operations. MEM_ACCESS_WR is a subset of MEM_ACCESS, only counting store operations that result in memory accesses. In this case, MEM_ACCESS is equal to L1D_TLB because each memory access has an associated D-TLB lookup. Similarly, each MEM_ACCESS_WR has an associated D-TLB lookup.
Additional events that occur with an L1 D-TLB miss: L2D_TLB, L2D_TLB_WR, L1D_TLB_REFILL, and L1D_TLB_REFILL_WR.
L2D_TLB is 9
L2D_TLB_WR is
L1D_TLB_REFILL is 9
L1D_TLB_REFILL_WR is 9
Additional events that occur with an L2 TLB miss: DTLB_WALK, L2D_TLB_REFILL, and L2D_TLB_REFILL_WR.
DTLB_WALK is 9
L2D_TLB_REFILL is 9
L2D_TLB_REFILL_WR is 9
These results show that each of these TLB accesses was caused by a write operation and missed at every TLB level, causing a D-TLB walk. Each TLB miss also causes a refill, first into the L1 D-TLB and then into the L2 D-TLB.