What is the most important factor to consider when choosing an optimal data layout for SIMD?

Avoiding wasted space in the SIMD vectors matters, but the more important factor is reducing the number of loop iterations by processing multiple elements per iteration.

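Struct of Arrays (SoA) is therefore a better fit. Each field is stored contiguously in its own array, so a single vector load fills every lane with the same field of consecutive elements. Each loop iteration then processes a full vector's worth of elements, which means the loop runs fewer iterations and executes fewer branch instructions.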
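To make the iteration argument concrete, here is a minimal Python model. It is not real SIMD code: it assumes a 128-bit vector holding four 32-bit floats (`LANES = 4`) and treats each slice of four consecutive `x` values as one vector operation. `PointsSoA` and `translate_x_soa` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

LANES = 4  # assumption: a 128-bit vector holding four 32-bit floats


@dataclass
class PointsSoA:
    """Struct of Arrays: one contiguous list per coordinate."""
    x: list[float] = field(default_factory=list)
    y: list[float] = field(default_factory=list)
    z: list[float] = field(default_factory=list)


def translate_x_soa(points: PointsSoA, dx: float) -> int:
    """Add dx to every x coordinate and return the number of loop iterations.

    Each main-loop iteration touches LANES consecutive x values, which is what
    one vector load/add/store would cover. Leftover elements that do not fill
    a whole vector are handled one at a time in a scalar tail loop.
    """
    xs = points.x
    n = len(xs)
    iterations = 0
    i = 0
    while i + LANES <= n:
        # Models a single vector operation over lanes i .. i + LANES - 1.
        xs[i:i + LANES] = [v + dx for v in xs[i:i + LANES]]
        i += LANES
        iterations += 1
    while i < n:
        # Scalar tail for the last n % LANES elements.
        xs[i] += dx
        i += 1
        iterations += 1
    return iterations


pts = PointsSoA(x=[float(v) for v in range(10)], y=[0.0] * 10, z=[0.0] * 10)
print(translate_x_soa(pts, 0.5))  # 4 iterations instead of 10 scalar ones
print(pts.x[9])                   # 9.5
```

With an Array of Structures layout, four consecutive floats in memory would be `x0, y0, z0, x1`, so one vector load would mix components instead of giving four `x` values to operate on together.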

How many elements are unused in a 128-bit SIMD vector when storing a 3D position (x, y, z coordinates) as 32-bit floats?

A 128-bit vector holds four 32-bit floats. If we store one (x, y, z) point per vector, the layout is `| x | y | z | (unused) |`: one of the four elements is unused, so 25% of the vector's storage is wasted. If we instead pack the points back to back to avoid that padding, most points no longer start on a 16-byte boundary. AArch64 does not require SIMD loads and stores from normal memory to be aligned, so this misalignment does not necessarily cost performance on Arm. That is not necessarily true of other ISAs, where unaligned vector accesses may be slower or require different instructions.
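The padded and packed layouts can be compared with a minimal sketch that uses Python's `ctypes` to mirror C struct layout. It assumes `ctypes.c_float` is a 32-bit float, as on mainstream platforms; `Vec3Packed` and `Vec3Padded` are hypothetical names.

```python
import ctypes

VECTOR_BYTES = 128 // 8  # one 128-bit SIMD vector


class Vec3Packed(ctypes.Structure):
    """x, y, z back to back: 12 bytes, so consecutive points straddle vectors."""
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float), ("z", ctypes.c_float)]


class Vec3Padded(ctypes.Structure):
    """x, y, z plus an explicit unused lane: exactly one 128-bit vector."""
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float),
                ("z", ctypes.c_float), ("unused", ctypes.c_float)]


print(ctypes.sizeof(Vec3Packed))  # 12
print(ctypes.sizeof(Vec3Padded))  # 16

# Byte offset of point i within a 16-byte vector boundary (0 means aligned).
print([i * ctypes.sizeof(Vec3Packed) % VECTOR_BYTES for i in range(4)])  # [0, 12, 8, 4]
print([i * ctypes.sizeof(Vec3Padded) % VECTOR_BYTES for i in range(4)])  # [0, 0, 0, 0]

# Share of each padded vector that holds no coordinate data.
print(ctypes.sizeof(ctypes.c_float) / ctypes.sizeof(Vec3Padded))  # 0.25
```

The padded layout keeps every point aligned at the cost of one lane in four; the packed layout uses all of its storage but leaves three out of every four points unaligned.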

Following on from the previous question, what percentage of the vector would be unused if we stored the coordinates as 64-bit floats in a 256-bit vector?

Both the vector size and the element size have doubled, so the vector still holds four elements (256 / 64 = 4). Three of them store x, y and z and one is unused, so 25% of the vector is still wasted.
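The arithmetic behind both answers can be captured in a small helper. This is a sketch; `unused_fraction` is a hypothetical name, and it assumes one (x, y, z) point is stored per vector.

```python
def unused_fraction(vector_bits: int, element_bits: int, used_elements: int = 3) -> float:
    """Fraction of a SIMD vector left unused when one point occupies it."""
    if vector_bits % element_bits:
        raise ValueError("vector width must be a multiple of the element width")
    lanes = vector_bits // element_bits
    if used_elements > lanes:
        raise ValueError("the point does not fit in a single vector")
    return (lanes - used_elements) / lanes


for vector_bits, element_bits in [(128, 32), (256, 64)]:
    lanes = vector_bits // element_bits
    print(f"{vector_bits}-bit vector, {element_bits}-bit floats: "
          f"{lanes} lanes, {unused_fraction(vector_bits, element_bits):.0%} unused")
```

Both cases print four lanes with 25% unused, because the ratio of vector width to element width is the same.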