Memory latency is one of the key factors impacting application performance. Developers benefit from understanding what memory latency is, how to measure it, and when it can be improved.

Latency is the time between a request and the response. In the context of computer architecture, memory latency is the time taken by communication between the CPU and the memory devices. A request is either a load (read) or a store (write), and the response is the loaded data or an acknowledgement that the store took place. The faster this communication happens, the better the performance.
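
The definition above can be illustrated with a minimal sketch that times the gap between issuing a request and receiving its response. The helper name `measure_latency_ns` and the `repeats` parameter are hypothetical, chosen only for this illustration. In Python the interpreter overhead is far larger than the hardware memory latency, so the result shows the definition in action rather than a true cache or RAM latency.

```python
import time


def measure_latency_ns(request, repeats: int = 1000) -> float:
    """Average time in nanoseconds between issuing `request` and its return.

    `request` is any callable; `repeats` is a hypothetical parameter used to
    average out timer noise.
    """
    start = time.perf_counter_ns()
    for _ in range(repeats):
        request()
    elapsed = time.perf_counter_ns() - start
    return elapsed / repeats


if __name__ == "__main__":
    data = list(range(1_000_000))

    def load():
        # A "load" in the sense used above: read one element and return it.
        return data[500_000]

    average = measure_latency_ns(load)
    print(f"average load latency: {average:.1f} ns (mostly interpreter overhead)")
```

Measuring the latency of the hardware itself requires native code or dedicated tools, because each memory access has to be isolated from this kind of software overhead.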

Computer hardware is constructed using multiple types of storage. In the area of memory latency, “memory devices” can be caches, external RAM, or storage devices such as solid state drives (SSD) or hard disk drives (HDD).

In computer architecture, a CPU executes instructions, and some instructions use registers to transfer data to and from memory. The memory system includes caches, which are close to the CPU and faster to access than main memory. Instructions are also placed in caches, which improves performance by eliminating the need to fetch the same instructions from memory multiple times.

In theory, a CPU can operate without any caches at all, but it would be much slower.

How much slower would it be?

You can look at the list from Latency Numbers Every Programmer Should Know to learn about the time it takes to access various types of memory in a computer system.

| Operation | ns | µs | ms | Note |
|---|---|---|---|---|
| L1 cache reference | 0.5 ns | | | |
| Branch mispredict | 5 ns | | | |
| L2 cache reference | 7 ns | | | 14x L1 cache |
| Mutex lock/unlock | 25 ns | | | |
| Main memory reference | 100 ns | | | 20x L2 cache, 200x L1 cache |
| Send 1K bytes over 1 Gbps network | 10,000 ns | 10 µs | | |
| Read 4K randomly from SSD | 150,000 ns | 150 µs | | ~1GB/sec SSD |
| Read 1 MB sequentially from memory | 250,000 ns | 250 µs | | |
| Round trip within same datacenter | 500,000 ns | 500 µs | | |
| Read 1 MB sequentially from SSD | 1,000,000 ns | 1,000 µs | 1 ms | ~1GB/sec SSD, 4x memory |
| Disk seek (HDD) | 10,000,000 ns | 10,000 µs | 10 ms | 20x datacenter roundtrip |
| Read 1 MB sequentially from disk | 20,000,000 ns | 20,000 µs | 20 ms | 80x memory, 20x SSD |
| Send packet CA -> Netherlands -> CA | 150,000,000 ns | 150,000 µs | 150 ms | |
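
To put these numbers in perspective, the short sketch below computes how much slower some of the listed operations are than an L1 cache reference, and estimates the average access time for a given cache hit rate. It uses a deliberately simplified two-level model in which every access either hits in L1 or goes to main memory. The function names and the `l1_hit_rate` parameter are hypothetical, and the hit rates are illustrative values, not measurements.

```python
# Latencies in nanoseconds, copied from the list above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Read 4K randomly from SSD": 150_000,
    "Disk seek (HDD)": 10_000_000,
}


def slowdown_vs_l1(operation: str) -> float:
    """How many times slower an operation is than an L1 cache reference."""
    return LATENCY_NS[operation] / LATENCY_NS["L1 cache reference"]


def average_access_ns(l1_hit_rate: float) -> float:
    """Average access time in a simplified L1 + main memory model.

    `l1_hit_rate` is a hypothetical fraction between 0.0 and 1.0. A value of
    0.0 models a system in which no access is served by a cache.
    """
    l1 = LATENCY_NS["L1 cache reference"]
    memory = LATENCY_NS["Main memory reference"]
    return l1_hit_rate * l1 + (1.0 - l1_hit_rate) * memory


if __name__ == "__main__":
    for name in LATENCY_NS:
        print(f"{name:28s} {slowdown_vs_l1(name):>14,.0f}x L1")
    for rate in (0.0, 0.90, 0.99):
        print(f"L1 hit rate {rate:.0%}: {average_access_ns(rate):.2f} ns average")
```

With a hit rate of zero, which approximates a CPU without caches in this model, every access costs the full main memory latency of 100 ns, or 200 times an L1 cache reference.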

Modern CPUs and RAM have very different latencies compared to those from 30, 20, or even 10 years ago. Even so, the improvements have generally scaled uniformly as CPUs evolved. A Cortex-A15 CPU from 15 years ago might be based on a very different implementation of the Arm architecture than a current Neoverse-V2 CPU, but the orders of magnitude of the latencies between the CPU and the memory devices are similar.

A great visualization of how latencies have been reduced over the years of CPU evolution is given in Colin Scott’s Interactive latencies page. Use the slider at the top to change the year and see how the latency numbers change.
