What you've learned

You should now know how to:

  • Download and build llama.cpp on your Arm server.
  • Download a pre-quantized Llama 3.1 model from Hugging Face.
  • Re-quantize the model weights to take advantage of the Arm KleidiAI kernels.
  • Compare the performance of the pre-quantized Llama 3.1 model weights with that of the re-quantized weights on your Arm CPU (see the command recap after this list).
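
As a quick recap, the end-to-end flow looks like the sketch below. This is a minimal outline rather than the exact commands from the Learning Path: the Hugging Face repository and model file names are placeholders, so substitute the model you actually used.

    # Build llama.cpp from source; Arm-optimized kernels are selected
    # automatically when building on a supported Arm CPU
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build
    cmake --build build --config Release -j$(nproc)

    # Download a pre-quantized model from Hugging Face
    # (<repo>/<model> and <model-file> are placeholders)
    huggingface-cli download <repo>/<model> <model-file>.gguf --local-dir models

    # Re-quantize the weights to Q4_0 so the Arm KleidiAI-optimized
    # kernels can be used at inference time
    ./build/bin/llama-quantize --allow-requantize models/<model-file>.gguf models/<model-file>-Q4_0.gguf Q4_0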

Knowledge Check

Can you run LLMs on Arm CPUs?

Can llama.cpp be built and run on CPU only?

Can you profile the time the model takes to generate output, up to the end-of-text token?
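
For the last question: llama.cpp prints a timing summary when a run completes, covering model load time, prompt evaluation, and token generation, so a single CLI run is enough to profile generation time. A minimal invocation might look like the sketch below; the model path and prompt are placeholders, and in recent llama.cpp builds the chat binary is named llama-cli.

    # Run the re-quantized model; -n caps the number of generated tokens,
    # and the timing summary is printed automatically at the end of the run
    ./build/bin/llama-cli -m models/<model-file>-Q4_0.gguf -p "Tell me about Arm servers." -n 128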

