Resources:
CPUs are extremely fast, so fast that main memory (RAM) cannot keep up with them. To bridge the gap, there are several caches between the actual processing core and the RAM: L1, L2, and L3.
Desktop CPUs don't pull single bytes into caches. Instead, memory is fetched in the form of cache lines. Each byte in memory falls into exactly one cache line. For most modern desktop CPUs, a cache line consists of 64 bytes.
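The mapping from a byte address to its cache line can be sketched with simple integer arithmetic. This is only an illustration: the 64-byte line size is the typical value mentioned above, and the function names are made up for this example.

```python
CACHE_LINE_SIZE = 64  # bytes; assumed, typical for modern desktop CPUs


def cache_line_index(address: int) -> int:
    """Index of the cache line that contains the byte at `address`."""
    return address // CACHE_LINE_SIZE


def offset_in_line(address: int) -> int:
    """Position of the byte inside its cache line (0..63)."""
    return address % CACHE_LINE_SIZE


def line_bounds(address: int) -> tuple[int, int]:
    """First and last byte address of the cache line containing `address`."""
    start = cache_line_index(address) * CACHE_LINE_SIZE
    return start, start + CACHE_LINE_SIZE - 1
```

Under this model, addresses 0 through 63 share one cache line, and address 64 starts the next one. Reading any single byte of a line brings the whole line into the cache.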
When the target bytes are not in a cache line currently held in a given cache level, the CPU incurs a cache miss at that level (L1, L2, or L3) and pays a performance penalty while the line is fetched from the next level or from RAM.
If a byte is present in the cache, we can skip accessing the RAM and use the faster L1/L2/L3 caches instead.
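To make hits and misses concrete, here is a toy model of a single cache level. It is a rough sketch, not a description of real hardware: the `ToyCache` class, its least-recently-used eviction, and the tiny capacity are all assumptions chosen for illustration. Real CPUs use several levels, set associativity, and prefetching.

```python
from collections import OrderedDict

CACHE_LINE_SIZE = 64  # bytes; assumed line size


class ToyCache:
    """Hypothetical single-level cache holding a fixed number of lines."""

    def __init__(self, capacity_lines: int):
        self.capacity_lines = capacity_lines
        self.lines = OrderedDict()  # cache line index -> True, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Access one byte; return True on a hit, False on a miss."""
        line = address // CACHE_LINE_SIZE
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        # Miss: the whole line is loaded, evicting the oldest if full.
        self.misses += 1
        self.lines[line] = True
        if len(self.lines) > self.capacity_lines:
            self.lines.popitem(last=False)
        return False
```

With a capacity of two lines, accessing address 0 misses, and a following access to address 8 hits because both bytes live in the same cache line.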
If you pack your data tightly into cache lines in a way that matches your access patterns, you can minimize the number of cache misses:
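As a rough sketch of why tight packing helps, the following counts how many cache lines a contiguous array spans. The function name, the 4-byte element size, and the base addresses are assumptions for illustration only.

```python
CACHE_LINE_SIZE = 64  # bytes; assumed line size


def lines_for_packed_array(base_address: int, element_size: int, count: int) -> int:
    """Number of distinct cache lines spanned by a contiguous array."""
    if count == 0:
        return 0
    first_line = base_address // CACHE_LINE_SIZE
    last_byte = base_address + element_size * count - 1
    last_line = last_byte // CACHE_LINE_SIZE
    return last_line - first_line + 1
```

In this model, sixteen 4-byte elements starting at a 64-byte-aligned address fit into a single cache line, so iterating over all of them costs at most one miss. The same array starting at address 32 straddles two lines, which is why alignment matters as well as packing.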
In contrast, placing each element in its own section of memory can result in more cache misses, because parts of each fetched cache line go unused:
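The cost of a fragmented layout can be estimated by comparing the bytes actually used with the bytes fetched. The sketch below assumes 4-byte elements and a hypothetical scattered layout in which every element sits 256 bytes away from the previous one; both the helper names and the addresses are made up for this example.

```python
CACHE_LINE_SIZE = 64  # bytes; assumed line size


def lines_touched(addresses: list[int], element_size: int) -> int:
    """Count the distinct cache lines covered by the given elements."""
    lines = set()
    for address in addresses:
        first = address // CACHE_LINE_SIZE
        last = (address + element_size - 1) // CACHE_LINE_SIZE
        lines.update(range(first, last + 1))
    return len(lines)


def utilization(addresses: list[int], element_size: int) -> float:
    """Fraction of the fetched cache-line bytes that hold useful data."""
    fetched = lines_touched(addresses, element_size) * CACHE_LINE_SIZE
    if fetched == 0:
        return 0.0
    return len(addresses) * element_size / fetched


# Sixteen 4-byte elements, packed back to back vs. spread 256 bytes apart.
packed = [i * 4 for i in range(16)]
scattered = [i * 256 for i in range(16)]  # hypothetical fragmented layout
```

In this model the packed layout touches one cache line and uses every byte of it, while the scattered layout touches sixteen lines and uses only 4 of the 64 bytes in each.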