About this Learning Path

Who is this for?

This is an advanced topic for developers and engineers who want to deploy Mixture of Experts (MoE) models, such as ERNIE 4.5, on edge devices. MoE architectures allow large LLMs with 21 billion or more parameters to activate only a fraction of their weights for each inference, making them well suited to resource-constrained environments.
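
To make the idea concrete, the sketch below illustrates top-k expert routing, the mechanism that keeps most weights idle per token. It is a minimal, illustrative example only, not ERNIE 4.5's or llama.cpp's actual implementation; all names, shapes, and the choice of 8 experts with 2 active are assumptions for demonstration.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Illustrative top-k MoE routing: only top_k experts run for this token."""
    logits = x @ gate_w                      # router score per expert, shape (num_experts,)
    top = np.argsort(logits)[-top_k:]        # indices of the selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only the selected experts' weights are touched; the others stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy setup: 8 small experts, but only 2 are active per token.
rng = np.random.default_rng(0)
d = 16
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d))) for _ in range(8)]
gate_w = rng.normal(size=(d, 8))
token = rng.normal(size=d)
print(moe_forward(token, gate_w, experts).shape)   # (16,)
```

Because the gate selects experts per token, memory bandwidth and compute per inference scale with the number of active experts rather than the full parameter count, which is what makes these models practical on edge hardware.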

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Deploy MoE models like ERNIE-4.5 on edge devices using llama.cpp
  • Compare inference behavior between ERNIE-4.5 PT and Thinking versions
  • Measure the performance impact of Armv9-specific hardware optimizations

Prerequisites

Before starting, you will need the following:

  • An Armv9 device with at least 32 GB of available disk space, for example, the Radxa Orion O6