Memory latency problems are most apparent in memory-bound algorithms, i.e. when the CPU spends most of its time waiting for data to be transferred to or from RAM.
Although faster RAM can sometimes remedy the problem, we will still hit the upper limit of the maximum RAM frequency supported by our CPU/motherboard. The proper solution is to minimize and group memory accesses in the algorithm so that the CPU is not stalled waiting for data to arrive from memory. Proper alignment and cache prefetching also help greatly.
All that matters is that the data is in the L1 cache when the CPU needs it. Taking into account that memory loads can take up to 100 ns to complete, you should do a rough count of the cycles each iteration takes and derive the prefetch distance from that. If you can't do that, testing multiples of the cache line size is a decent alternative.