This is an advanced topic for developers and engineers who want to deploy Mixture of Experts (MoE) models, such as ERNIE 4.5, on edge devices. MoE architectures allow large LLMs with 21 billion or more parameters to run with only a fraction of their weights active per inference, making them well suited to resource-constrained environments.
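To make the idea concrete, the sketch below shows top-k expert routing in plain NumPy. The function name, expert count, and dimensions are illustrative assumptions rather than details of ERNIE 4.5 or any specific framework; the point is that only the selected experts' weights participate in each forward pass.

```python
import numpy as np

def top_k_routing(token, expert_weights, gate, k=2):
    """Route one token through only the top-k experts chosen by the gate (illustrative sketch)."""
    scores = token @ gate                         # one gating score per expert
    top_k = np.argsort(scores)[-k:]               # indices of the k highest-scoring experts
    probs = np.exp(scores[top_k])
    probs /= probs.sum()                          # softmax over the selected experts only
    # Only the k selected experts run; the remaining expert weights stay idle for this token.
    return sum(p * (token @ expert_weights[i]) for p, i in zip(probs, top_k))

# Toy example: 8 experts with hidden size 16, but only 2 experts run per token.
rng = np.random.default_rng(0)
hidden = 16
experts = [rng.standard_normal((hidden, hidden)) for _ in range(8)]
gate = rng.standard_normal((hidden, 8))
token = rng.standard_normal(hidden)
print(top_k_routing(token, experts, gate, k=2).shape)  # (16,)
```

In this toy setup, 2 of 8 experts are active per token, so roughly a quarter of the expert weights are touched on any single inference step, which is the property that makes MoE models attractive on edge hardware.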
Upon completion of this Learning Path, you will be able to:
Before starting, you will need the following: