What you've learned

You should now know how to:

  • Download and build llama.cpp on your Arm server.
  • Download a pre-quantized Llama 3.1 model from Hugging Face.
  • Run the pre-quantized model on your Arm CPU and measure its performance (the sketch after this list recaps the key commands).
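
As a quick recap, here is a minimal command sequence covering those three steps, assuming a CPU-only build of llama.cpp. The Hugging Face repository and GGUF file names below are placeholders, not the exact model used in this Learning Path; substitute the pre-quantized Llama 3.1 GGUF you downloaded.

    # Clone and build llama.cpp (CPU-only build)
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build
    cmake --build build --config Release

    # Download a pre-quantized Llama 3.1 GGUF model from Hugging Face
    # (placeholder repository and file names; use the model from this Learning Path)
    huggingface-cli download <repo-id> <model-file>.gguf --local-dir models

    # Run the model on the CPU; llama.cpp prints a timing summary
    # (prompt evaluation and token generation rates) when generation ends
    ./build/bin/llama-cli -m models/<model-file>.gguf -p "Tell me about Arm servers." -n 128

The timing summary printed at the end of a run is the performance measurement referred to above: it reports the time spent on prompt evaluation and on token generation, along with tokens per second.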

Knowledge Check

Can you run LLMs on Arm CPUs?

Can llama.cpp be built and run using only the CPU?

Can you profile the time the model takes to generate output, up to the end-of-text token?
