Converting FP32 tensors to INT4 cuts the number of bits per value from 32 to 4, an 8x reduction, which shrinks the memory footprint accordingly.
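A minimal sketch of this idea, assuming a simple symmetric per-tensor scheme (the helper `quantize_int4` is illustrative, not a library API; real kernels pack two 4-bit values per byte, which is only mimicked here in the size arithmetic):

```python
import numpy as np

def quantize_int4(x: np.ndarray):
    # Symmetric per-tensor quantization to the signed INT4 range [-8, 7].
    # Values are held in int8 for simplicity; storage math below assumes packing.
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

x = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int4(x)

fp32_bytes = x.size * 4    # 32 bits per value
int4_bytes = x.size // 2   # 4 bits per value, two values per byte
print(f"FP32: {fp32_bytes / 2**20:.1f} MiB, INT4: {int4_bytes / 2**20:.1f} MiB")
# Dequantization recovers an approximation of the original: x ≈ q * scale
```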
What quantization scheme does Llama require to run on an embedded device such as the Raspberry Pi 5?
A 4-bit quantization scheme: it yields the smallest memory footprint for Llama 3 and is what lets the model fit in the limited RAM of a device like the Raspberry Pi 5.
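A back-of-the-envelope check, assuming the 8B-parameter Llama 3 variant and an 8 GB Raspberry Pi 5 (weights only; the KV cache and activations add further overhead):

```python
# Approximate weight storage for an 8-billion-parameter model
params = 8e9
for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: {gib:.1f} GiB")
# FP32: ~29.8 GiB, FP16: ~14.9 GiB, INT8: ~7.5 GiB, INT4: ~3.7 GiB
# Only the 4-bit variant leaves comfortable headroom within 8 GB of RAM.
```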
Dynamic quantization refers to quantizing activations at runtime: the weights are quantized ahead of time, while the scale factors for the activations are computed on the fly during inference.
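A minimal sketch using PyTorch's built-in dynamic quantization, which targets layers such as `nn.Linear` and uses int8 rather than int4 (the toy model here is only for illustration):

```python
import torch
import torch.nn as nn

# Small example model; only the nn.Linear layers will be quantized.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Weights are converted to int8 ahead of time; activation scales are
# determined dynamically at inference time from the observed ranges.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(qmodel(x).shape)  # torch.Size([1, 10])
```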