Memory latency for application software developers: About memory latency

Memory latency for application software developers

Log an issue

Fork and edit

Discuss on Discord

Memory latency for application software developers

Memory latency is one of the key factors impacting application performance. Developers will benefit from having a good understanding of memory latency, how to measure it, and knowing when it can be improved.

Latency is the time between a request and the response. In the context of computer architecture, memory latency refers to the communication from the CPU to the memory devices. A request is either a load (read) or a store (write) and the response is the loaded data or an acknowledgement that the store took place. The faster this communication happens, the better the performance.

Computer hardware is constructed using multiple types of storage. In the area of memory latency, “memory devices” can be caches, external RAM, storage devices such as solid state drives (SDD) or hard disk drives (HDD).

In computer architecture, a CPU executes instructions, and some instructions use registers to transfer data to and from the memory. The memory system is designed with caches which are close to the CPU and faster to access. Instructions are also placed in caches which improve performance by eliminating the need to fetch the same instructions from the memory multiple times.

In theory, a CPU can operate without any caches at all, but it would be much slower.

How much slower would it be?

You can look at the list from Latency Numbers Every Programmer Should Know to learn about the time it takes to access various types of memory in a computer system.

Operation	ns	µs	ms	note
L1 cache reference	0.5 ns
Branch mispredict	5 ns
L2 cache reference	7 ns			14x L1 cache
Mutex lock/unlock	25 ns
Main memory reference	100 ns			20x L2 cache, 200x L1 cache
Send 1K bytes over 1 Gbps network	10,000 ns	10 µs
Read 4K randomly from SSD	150,000 ns	150 µs		~1GB/sec SSD
Read 1 MB sequentially from memory	250,000 ns	250 µs
Round trip within same datacenter	500,000 ns	500 µs
Read 1 MB sequentially from SSD	1,000,000 ns	1,000 µs	1 ms	~1GB/sec SSD, 4X memory
Disk seek (HDD)	10,000,000 ns	10,000 µs	10 ms	20x datacenter roundtrip
Read 1 MB sequentially from disk	20,000,000 ns	20,000 µs	20 ms	80x memory, 20X SSD
Send packet CA -> Netherlands -> CA	150,000,000 ns	150,000 µs	150 ms

Modern CPUs and RAM have very different latencies compared to those from 30, 20, or even 10 years ago. Even so, the improvements generally scale uniformly over the evolution of the CPUs. A Cortex-A15 CPU from 15 years ago might be based on a very different implementation of the Arm architecture compared to a current Neoverse-V2 CPU, but the orders of magnitude of latencies between the CPU and memory devices is similar.

A great visualization of how latencies have been reduced over the years of CPU evolution is given in Colin Scott’s Interactive latencies page . Use the slider the the top to change the year and see how the latency numbers change.

Back

Memory latency for application software developers

Introduction

About memory latency

How latency impacts performance - part 1

How latency impacts performance - part 2

Cache alignment

Cache prefetching

Next Steps

Memory latency for application software developers