Memory latency is one of the key factors affecting application performance. Developers benefit from understanding what memory latency is, how to measure it, and when it can be improved.
Latency is the time between a request and its response. In computer architecture, memory latency is the time it takes for the CPU to communicate with a memory device. A request is either a load (read) or a store (write), and the response is either the loaded data or an acknowledgement that the store completed. The lower the latency, the better the performance.
Computer hardware is built from multiple types of storage. When discussing memory latency, "memory devices" can be caches, external RAM, or storage devices such as solid state drives (SSD) and hard disk drives (HDD).
A CPU executes instructions, and load and store instructions transfer data between registers and memory. The memory system includes caches, which are closer to the CPU and faster to access than main memory. Instructions are also cached, which improves performance by avoiding repeated fetches of the same instructions from memory.
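A common way to observe the effect of caches on load latency is pointer chasing: build a chain of indices in random order and follow it, so that each load's address depends on the result of the previous load and the CPU cannot overlap them. The C sketch below assumes a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC`. The function names (`build_chain`, `is_single_cycle`, `chase_ns`, `measure_latency_ns`) are hypothetical and do not come from any particular benchmark tool. It is an illustration, not a calibrated benchmark.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime and CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helpers: a pointer-chasing sketch, not a calibrated benchmark. */

/* Build a random cyclic permutation with Sattolo's algorithm: following
   next[i] from any start visits all n slots before returning. The random
   order makes the next address hard for a hardware prefetcher to predict. */
static size_t *build_chain(size_t n, unsigned seed) {
    if (n == 0) return NULL;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL) return NULL;
    for (size_t i = 0; i < n; i++) next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i; /* j < i keeps a single cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Returns 1 if the chain starting at slot 0 first returns to 0 after n steps. */
static int is_single_cycle(const size_t *next, size_t n) {
    size_t idx = 0;
    for (size_t s = 1; s <= n; s++) {
        idx = next[idx];
        if (idx == 0) return s == n;
    }
    return 0;
}

/* Time `steps` dependent loads. Each address comes from the previous load,
   so the average approximates load latency plus a little loop overhead. */
static double chase_ns(const size_t *next, size_t steps, size_t *sink) {
    struct timespec t0, t1;
    size_t idx = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++) idx = next[idx];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = idx; /* keep the result live so the loop is not optimized away */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}

/* Average nanoseconds per load for a working set of `bytes` bytes.
   Returns -1.0 on invalid input or allocation failure. */
static double measure_latency_ns(size_t bytes, size_t steps) {
    size_t n = bytes / sizeof(size_t);
    if (n < 2 || steps == 0) return -1.0;
    size_t *next = build_chain(n, 12345u);
    if (next == NULL) return -1.0;
    size_t sink = 0;
    chase_ns(next, n, &sink);                 /* warm-up: touch every slot once */
    double ns = chase_ns(next, steps, &sink); /* timed run */
    assert(sink < n);
    free(next);
    return ns;
}
```

For example, comparing `measure_latency_ns(16 * 1024, 10000000)` with `measure_latency_ns((size_t)256 * 1024 * 1024, 10000000)` contrasts a working set that fits in the L1 data cache of many cores with one that exceeds the caches (and allocates 256 MiB). The larger working set typically shows a much higher time per load, reflecting main memory and TLB miss costs; the exact numbers depend on the CPU, the memory system, and compiler settings.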
In theory, a CPU can operate without any caches at all, but it would be much slower.
How much slower would it be?
The list Latency Numbers Every Programmer Should Know gives approximate times to access various types of memory and storage in a computer system, along with a few network operations for comparison:
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | |
| L2 cache reference | 7 ns | 14x L1 cache |
| Main memory reference | 100 ns | 20x L2 cache, 200x L1 cache |
| Send 1K bytes over 1 Gbps network | 10 µs | |
| Read 4K randomly from SSD | 150 µs | |
| Read 1 MB sequentially from memory | 250 µs | |
| Round trip within same datacenter | 500 µs | |
| Read 1 MB sequentially from SSD | 1 ms | ~1GB/sec SSD, 4x memory |
| Disk seek (HDD) | 10 ms | 20x datacenter roundtrip |
| Read 1 MB sequentially from disk | 20 ms | 80x memory, 20x SSD |
| Send packet CA -> Netherlands -> CA | 150 ms | |
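The relative costs in the Notes column are simple ratios of latencies, and they give a rough answer to "how much slower?": if every load that hits in L1 instead had to go to main memory, each such load would take roughly 200x longer. The C sketch below assumes the commonly published values from Latency Numbers Every Programmer Should Know (0.5 ns for an L1 reference, 7 ns for L2, 100 ns for main memory, 250 µs to read 1 MB from memory, 500 µs for a datacenter round trip, 1 ms to read 1 MB from SSD, 10 ms for a disk seek, 20 ms to read 1 MB from disk). The macro and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Approximate latencies in nanoseconds (assumed values, as commonly
   published in "Latency Numbers Every Programmer Should Know"). */
#define L1_REF_NS         0.5
#define L2_REF_NS         7.0
#define DRAM_REF_NS       100.0
#define MEM_READ_1MB_NS   250e3
#define DC_ROUNDTRIP_NS   500e3
#define SSD_READ_1MB_NS   1e6
#define DISK_SEEK_NS      10e6
#define DISK_READ_1MB_NS  20e6

/* How many times slower the slow operation is than the fast one. */
static double slowdown(double slow_ns, double fast_ns) {
    assert(fast_ns > 0.0);
    return slow_ns / fast_ns;
}

/* Print the relative costs quoted in the Notes column. */
static void print_ratios(void) {
    printf("L2 vs L1:             %gx\n", slowdown(L2_REF_NS, L1_REF_NS));
    printf("DRAM vs L1:           %gx\n", slowdown(DRAM_REF_NS, L1_REF_NS));
    printf("SSD 1 MB vs memory:   %gx\n", slowdown(SSD_READ_1MB_NS, MEM_READ_1MB_NS));
    printf("Disk seek vs DC trip: %gx\n", slowdown(DISK_SEEK_NS, DC_ROUNDTRIP_NS));
    printf("Disk 1 MB vs memory:  %gx\n", slowdown(DISK_READ_1MB_NS, MEM_READ_1MB_NS));
    printf("Disk 1 MB vs SSD:     %gx\n", slowdown(DISK_READ_1MB_NS, SSD_READ_1MB_NS));
}
```

Calling `print_ratios()` prints 14x, 200x, 4x, 20x, 80x, and 20x for the six comparisons, matching the notes in the list.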
Modern CPUs and RAM have very different latencies from those of 30, 20, or even 10 years ago. Even so, the relative gaps between the levels of the memory hierarchy have stayed broadly similar as CPUs have evolved. A Cortex-A15 CPU from about 15 years ago is a very different implementation of the Arm architecture than a current Neoverse-V2 CPU, but the orders of magnitude separating the CPU from each type of memory device are similar.
Colin Scott's Interactive Latencies page gives a great visualization of how latencies have decreased over the years of CPU evolution. Use the slider at the top to change the year and see how the latency numbers change.