After building pqm4, a set of binaries is generated for each scheme and implementation. You can use these binaries to verify correctness, measure performance, and analyze resource usage on your Cortex-M4 board or in QEMU.
For example, building ML-KEM-768 produces binaries with names in this pattern:
bin/crypto_kem_ml-kem-768_<impl>_<type>.bin
The <impl> field identifies the implementation variant. The exact suffix depends on the scheme:
m4fspeed: Cortex-M4F implementation optimized for speed (used by ML-KEM)
m4fstack: Cortex-M4F implementation optimized for stack size (used by ML-KEM)
m4f: Cortex-M4F implementation used by some schemes such as BIKE
clean: clean reference implementation from PQClean
The clean variant uses a different file prefix: mupq_pqclean_crypto_kem_<scheme>_clean_<type>.elf
The <type> field identifies what the binary measures. The following sections explain each type and how to run it.
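The naming pattern can be captured in a small helper. This is a minimal sketch and not part of pqm4: the function name `binary_name` and its parameters are hypothetical, and it assumes the clean variant differs from the others only by the `mupq_pqclean_` prefix described above.

```python
def binary_name(scheme: str, impl: str, kind: str, ext: str = "bin") -> str:
    """Build a KEM binary file name that follows the pattern described above.

    Hypothetical helper for illustration; pqm4 does not ship this function.
    """
    prefix = "crypto_kem"
    if impl == "clean":
        # The PQClean reference variant uses a different file prefix.
        prefix = "mupq_pqclean_crypto_kem"
    return f"{prefix}_{scheme}_{impl}_{kind}.{ext}"


print(binary_name("ml-kem-768", "m4fspeed", "test"))
print(binary_name("ml-kem-768", "clean", "speed", ext="elf"))
```

The helper returns only the file name. Prepend bin/ or elf/ depending on whether you plan to flash the binary or run it in QEMU.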
How you run a binary depends on whether you are using a physical board or QEMU.
On a physical board, flash the binary and read the serial output.
Flash the binary:
st-flash write bin/<binary_name>.bin 0x8000000
Read the output from the board:
python3 hostside/host_unidirectional.py
Press the RESET button on the board to trigger execution and see the output.
On QEMU, run the corresponding ELF file directly.
qemu-system-arm -M mps2-an386 -nographic -semihosting -kernel elf/<binary_name>.elf
To exit QEMU, press Ctrl+A then X.
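If you prefer to launch QEMU from a script, you can assemble the same command as an argument list. This is a minimal sketch: the function name `qemu_command` is hypothetical, and the flags are copied from the command shown above.

```python
import shlex


def qemu_command(elf_path: str) -> list:
    """Return the QEMU invocation shown above as an argument list."""
    return [
        "qemu-system-arm",
        "-M", "mps2-an386",   # machine model used throughout this guide
        "-nographic",
        "-semihosting",
        "-kernel", elf_path,
    ]


cmd = qemu_command("elf/crypto_kem_ml-kem-768_m4fspeed_test.elf")
# Print the command in a form you can paste into a shell.
print(shlex.join(cmd))
```

To run the command from Python, pass the list to `subprocess.run(cmd)`. This requires `qemu-system-arm` to be on your PATH.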
The test binary verifies that a scheme works correctly end to end.
Flash the ML-KEM-768 test binary on a physical board:
st-flash write bin/crypto_kem_ml-kem-768_m4fspeed_test.bin 0x8000000
Run the ML-KEM-768 test binary on QEMU:
qemu-system-arm -M mps2-an386 -nographic -semihosting -kernel elf/crypto_kem_ml-kem-768_m4fspeed_test.elf
The test binary generates a key pair, performs encapsulation and decapsulation, and checks that both sides derive the same shared secret. It also tests failure cases such as an invalid secret key or an invalid ciphertext.
The output after running the binary is similar to:
==========================
DONE key pair generation!
DONE encapsulation!
DONE decapsulation!
OK KEYS
+
...
OK invalid sk_a
+
OK invalid ciphertext
+
#
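If you capture this output in a script, you can check it for the success markers. The sketch below is illustrative only: `check_test_output` is a hypothetical name, and treating any line that starts with `ERROR` as a failure is an assumption rather than documented behavior.

```python
# Markers taken from the sample output above.
EXPECTED_MARKERS = ["OK KEYS", "OK invalid sk_a", "OK invalid ciphertext"]


def check_test_output(output: str) -> bool:
    """Return True if every expected marker appears and no error line does."""
    lines = [line.strip() for line in output.splitlines()]
    # Assumption: failures are reported on lines that start with "ERROR".
    if any(line.startswith("ERROR") for line in lines):
        return False
    return all(marker in lines for marker in EXPECTED_MARKERS)


sample = """\
==========================
DONE key pair generation!
DONE encapsulation!
DONE decapsulation!
OK KEYS
+
OK invalid sk_a
+
OK invalid ciphertext
+
#
"""
print(check_test_output(sample))
```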
The speed binary measures execution time in CPU cycles for each operation.
Flash the ML-KEM-768 speed binary on a physical board:
st-flash write bin/crypto_kem_ml-kem-768_m4fspeed_speed.bin 0x8000000
Run the ML-KEM-768 speed binary on QEMU:
qemu-system-arm -M mps2-an386 -nographic -semihosting -kernel elf/crypto_kem_ml-kem-768_m4fspeed_speed.elf
The output after running the binary is similar to:
==========================
keypair cycles:
123456
encaps cycles:
234567
decaps cycles:
210000
#
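The label and value pairs in this output are straightforward to parse. The following sketch assumes the layout shown above, with a label ending in a colon followed by a line that holds a single integer. The function name `parse_measurements` is hypothetical, and the cycle counts are the illustrative values from the sample output.

```python
def parse_measurements(output: str) -> dict:
    """Map each 'label:' line to the integer on the line that follows it."""
    results = {}
    label = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith(":"):
            label = line[:-1]
        elif label is not None and line.isdigit():
            results[label] = int(line)
            label = None
        # Any other line (separators, terminators) is ignored.
    return results


sample = """\
==========================
keypair cycles:
123456
encaps cycles:
234567
decaps cycles:
210000
"""
print(parse_measurements(sample))
```

If a label appears more than once, this sketch keeps only the last value.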
The hashing binary measures how many cycles are spent in symmetric primitives such as SHA-2, SHA-3, and AES. Comparing these numbers with the total cycle counts from the speed binary shows how much of the overall algorithm cost comes from hashing.
Flash the ML-KEM-768 hashing binary on a physical board:
st-flash write bin/crypto_kem_ml-kem-768_m4fspeed_hashing.bin 0x8000000
Run the ML-KEM-768 hashing binary on QEMU:
qemu-system-arm -M mps2-an386 -nographic -semihosting -kernel elf/crypto_kem_ml-kem-768_m4fspeed_hashing.elf
The output after running the binary is similar to:
==========================
keypair hash cycles:
50000
encaps hash cycles:
80000
decaps hash cycles:
75000
#
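To see what share of each operation is spent hashing, divide the hashing cycles by the total cycles reported by the speed binary. The sketch below uses the illustrative numbers from the two sample outputs, not real measurements, and `hashing_share` is a hypothetical helper name.

```python
def hashing_share(total_cycles: dict, hash_cycles: dict) -> dict:
    """Return the percentage of cycles spent hashing for each operation."""
    return {
        op: 100.0 * hash_cycles[op] / total_cycles[op]
        for op in total_cycles
    }


# Illustrative values copied from the sample outputs, not real measurements.
total = {"keypair": 123456, "encaps": 234567, "decaps": 210000}
hashing = {"keypair": 50000, "encaps": 80000, "decaps": 75000}

for op, pct in hashing_share(total, hashing).items():
    print(f"{op}: {pct:.1f}% of cycles spent hashing")
```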
The stack binary measures peak stack memory usage for each operation.
Flash the ML-KEM-768 stack binary on a physical board:
st-flash write bin/crypto_kem_ml-kem-768_m4fspeed_stack.bin 0x8000000
Run the ML-KEM-768 stack binary on QEMU:
qemu-system-arm -M mps2-an386 -nographic -semihosting -kernel elf/crypto_kem_ml-kem-768_m4fspeed_stack.elf
The output after running the binary is similar to:
==========================
keypair stack usage:
2048
encaps stack usage:
3072
decaps stack usage:
2800
#
Stack measurement might not work correctly on some boards due to platform-specific memory layout. Memory allocated outside the measured functions, such as buffers for public keys and ciphertexts, is not included in these measurements.
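Because caller-side buffers are excluded, a rough memory estimate adds them back in. The following sketch uses the ML-KEM-768 object sizes in bytes (public key 1184, secret key 2400, ciphertext 1088, shared secret 32) together with the illustrative stack numbers from the sample output. Which buffers a caller holds during each operation is an assumption, and `total_memory` is a hypothetical name.

```python
# ML-KEM-768 object sizes in bytes.
SIZES = {"public_key": 1184, "secret_key": 2400, "ciphertext": 1088, "shared_secret": 32}

# Assumption: buffers the caller holds while each operation runs.
BUFFERS = {
    "keypair": ("public_key", "secret_key"),
    "encaps": ("public_key", "ciphertext", "shared_secret"),
    "decaps": ("secret_key", "ciphertext", "shared_secret"),
}


def total_memory(stack_bytes: dict) -> dict:
    """Add caller-side buffer sizes to the measured peak stack usage."""
    return {
        op: stack + sum(SIZES[name] for name in BUFFERS[op])
        for op, stack in stack_bytes.items()
    }


# Illustrative stack values from the sample output, not real measurements.
stack = {"keypair": 2048, "encaps": 3072, "decaps": 2800}
print(total_memory(stack))
```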
The test vectors binary generates deterministic test vectors using a fixed random seed. These are used to validate correctness and compare different implementations against each other.
Flash the ML-KEM-768 test vectors binary on a physical board:
st-flash write bin/crypto_kem_ml-kem-768_m4fspeed_testvectors.bin 0x8000000
Run the ML-KEM-768 test vectors binary on QEMU:
qemu-system-arm -M mps2-an386 -nographic -semihosting -kernel elf/crypto_kem_ml-kem-768_m4fspeed_testvectors.elf
To compare the on-device vectors against host-generated reference vectors, use the testvectors.py script described in the automated testing section.
pqm4 includes Python scripts that automate flashing, running, and checking results across multiple implementations.
Before running any Python scripts in this section, make sure your virtual environment is active:
source venv/bin/activate
The test.py script runs the test binary on your chosen platform and checks correctness automatically.
Run functional tests on NUCLEO-L476RG:
python3 test.py -p nucleo-l476rg --uart /dev/tty.usbmodemXXXX ml-kem-768
Run functional tests on QEMU:
python3 test.py -p mps2-an386 ml-kem-768
The output after running functional tests is similar to:
ml-kem-768 - m4fspeed SUCCESSFUL
ml-kem-768 - m4fstack SUCCESSFUL
ml-kem-768 - clean SUCCESSFUL
test: 100%|#############################################| 3/3 [00:12<00:00, 4.29s/it, ml-kem-768 - clean]
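If you save this output to a log, you can collect the per-implementation results with a short parser. This is a sketch that assumes the `<scheme> - <impl> <STATUS>` layout shown above. The name `parse_summary` is hypothetical, and the sketch treats any status other than `SUCCESSFUL` as a failure because the failure wording is not shown here.

```python
import re

# Matches lines such as "ml-kem-768 - m4fspeed SUCCESSFUL".
SUMMARY_LINE = re.compile(r"^(?P<scheme>\S+) - (?P<impl>\S+) (?P<status>[A-Z]+)$")


def parse_summary(output: str) -> dict:
    """Map (scheme, implementation) to the reported status string."""
    results = {}
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:  # progress-bar lines do not match and are skipped
            results[(match["scheme"], match["impl"])] = match["status"]
    return results


sample = """\
ml-kem-768 - m4fspeed SUCCESSFUL
ml-kem-768 - m4fstack SUCCESSFUL
ml-kem-768 - clean SUCCESSFUL
test: 100%|####| 3/3 [00:12<00:00, 4.29s/it, ml-kem-768 - clean]
"""
results = parse_summary(sample)
all_passed = all(status == "SUCCESSFUL" for status in results.values())
print(results, all_passed)
```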
The testvectors.py script generates test vectors on your chosen platform and compares them with host-side results.
Run test vector validation on NUCLEO-L476RG:
python3 testvectors.py -p nucleo-l476rg --uart /dev/tty.usbmodemXXXX ml-kem-768
Run test vector validation on QEMU:
python3 testvectors.py -p mps2-an386 ml-kem-768
The output after running test vector validation is similar to:
ml-kem-768 - m4fspeed SUCCESSFUL
ml-kem-768 - m4fstack SUCCESSFUL
ml-kem-768 - clean SUCCESSFUL
test: 100%|#############################################| 3/3 [00:12<00:00, 4.29s/it, ml-kem-768 - clean]
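Conceptually, test vector validation is a line-by-line comparison of two outputs. The sketch below illustrates that idea only and does not reproduce how testvectors.py is implemented. The function name `first_mismatch` is hypothetical, and the hex strings are placeholders rather than real ML-KEM vectors.

```python
def first_mismatch(device_output: str, host_output: str):
    """Return the 1-based number of the first differing line, or None if equal."""
    device_lines = device_output.strip().splitlines()
    host_lines = host_output.strip().splitlines()
    for index, (dev, host) in enumerate(zip(device_lines, host_lines), start=1):
        if dev.strip() != host.strip():
            return index
    if len(device_lines) != len(host_lines):
        # One output ended early; the mismatch is the first missing line.
        return min(len(device_lines), len(host_lines)) + 1
    return None


# Placeholder data for illustration only.
host = "0a1b2c\n3d4e5f\n"
device_ok = "0a1b2c\n3d4e5f\n"
device_bad = "0a1b2c\n3d4e50\n"

print(first_mismatch(device_ok, host))
print(first_mismatch(device_bad, host))
```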
The benchmarks.py script runs speed and stack benchmarks and stores the results in a benchmarks/ directory.
Run benchmarks on NUCLEO-L476RG:
python3 benchmarks.py -p nucleo-l476rg --uart /dev/tty.usbmodemXXXX ml-kem-768
Run benchmarks on QEMU:
python3 benchmarks.py -p mps2-an386 ml-kem-768
The output after running benchmarks is similar to:
speed: 33%|################ | 1/3 [00:20<00:40, 20.00s/it, ml-kem-768 - m4fspeed]
speed: 66%|########################### | 2/3 [00:40<00:20, 20.00s/it, ml-kem-768 - m4fstack]
speed: 100%|##############################| 3/3 [01:00<00:00, 20.00s/it, ml-kem-768 - clean]
Results are saved in the benchmarks/ directory. The screenshot shows an example of the benchmark output:
Example benchmark results for ML-KEM-768
You’ve now run functional tests, measured cycle counts and stack usage, and validated test vectors for a post-quantum KEM on Arm Cortex-M4. You can apply the same steps to any scheme included in pqm4 by substituting the scheme name in the binary path or script arguments.
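Substituting the scheme name in the script arguments can itself be scripted. The sketch below builds the same command lines used in this section. `script_command` is a hypothetical helper, and the scheme list contains only the scheme used in this guide, so add other scheme names from your pqm4 checkout as needed.

```python
def script_command(script: str, platform: str, scheme: str, uart: str = None) -> list:
    """Build the argument list for one of the pqm4 Python scripts."""
    cmd = ["python3", script, "-p", platform]
    if uart is not None:
        # Only physical boards need a serial device.
        cmd += ["--uart", uart]
    cmd.append(scheme)
    return cmd


SCRIPTS = ["test.py", "testvectors.py", "benchmarks.py"]
SCHEMES = ["ml-kem-768"]  # extend with other scheme names from your checkout

for scheme in SCHEMES:
    for script in SCRIPTS:
        print(" ".join(script_command(script, "mps2-an386", scheme)))
```

Pass each list to `subprocess.run` from the pqm4 directory, with your virtual environment active, to execute the scripts in sequence.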
Next, you’ll learn how to add a new cryptographic scheme or implementation to the pqm4 framework.