Congratulations! You have completed the process of deploying the Arcee AFM-4.5B foundation model on AWS Graviton4.
Here’s a summary of what you built and how you can take your knowledge forward.
Using this Learning Path, you have:
- Launched a Graviton4-powered EC2 instance – you set up a c8g.4xlarge instance running Ubuntu 24.04 LTS, leveraging Arm-based compute for optimal price–performance.
- Configured the development environment – you installed tools and dependencies, including Git, build tools, and Python packages for machine learning workloads.
- Built Llama.cpp from source – you compiled the inference engine specifically for the Arm64 architecture to maximize performance on Graviton4.
- Downloaded and optimized AFM-4.5B – you retrieved the 4.5-billion-parameter Arcee Foundation Model, converted it to the GGUF format, and created quantized versions (8-bit and 4-bit) to reduce memory usage and improve speed.
- Ran inference and evaluation – you tested the model using interactive sessions and API endpoints, and benchmarked speed, memory usage, and model quality.
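As a rough sanity check on the quantized files you created above, you can estimate a GGUF model's size from the parameter count and the effective bits per weight. The sketch below is a back-of-the-envelope estimate that ignores metadata and tokenizer data; the bits-per-weight figures include the per-block scales stored by the Q8_0 and Q4_0 formats:

```python
def approx_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameters x bits per weight, in gigabytes.

    Ignores metadata and per-tensor overhead, so real files will be
    somewhat larger.
    """
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 4.5e9  # AFM-4.5B

# F16 baseline; Q8_0 stores ~8.5 bits/weight and Q4_0 ~4.5 bits/weight
# once per-block scale factors are counted.
for label, bpw in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    print(f"{label}: ~{approx_model_size_gb(N_PARAMS, bpw):.1f} GB")
# → F16: ~9.0 GB, Q8_0: ~4.8 GB, Q4_0: ~2.5 GB
```

This matches the intuition behind the benchmarks: the 4-bit model needs roughly a quarter of the memory of the F16 original, which also means less memory bandwidth per token.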
The benchmarking results demonstrate the power of quantization and Arm-based computing: the quantized models run faster and use far less memory than the full-precision original, while AWS Graviton4 processors, built on the Arm Neoverse V2 architecture, deliver strong price–performance for CPU-based LLM inference.
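If you want to reproduce simple throughput numbers like the ones you gathered during benchmarking, a small timing helper is enough. This is an illustrative sketch, not the measurement method used by llama.cpp's own tools; the stand-in generator below simulates a model emitting tokens and should be replaced by your real inference loop:

```python
import time
from typing import Callable, Iterable


def measure_tokens_per_second(generate: Callable[[], Iterable[str]]) -> float:
    """Time a token generator end to end and return tokens per second."""
    start = time.perf_counter()
    n_tokens = sum(1 for _ in generate())
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else 0.0


def fake_generator():
    # Stand-in for a model: emit 50 tokens at ~1 ms each.
    for _ in range(50):
        time.sleep(0.001)
        yield "tok"


print(f"~{measure_tokens_per_second(fake_generator):.0f} tokens/s")
```

Wrapping an interactive llama.cpp session this way lets you compare the F16, Q8_0, and Q4_0 variants on identical prompts.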
Now that you have a fully functional AFM-4.5B deployment, here are some ways to extend your learning:
- Production deployment
- Application development: integrate the llama-server API into your own applications

Together, Arcee AI’s foundation models, Llama.cpp’s efficient runtime, and Graviton4’s compute capabilities give you everything you need to build scalable, production-grade AI applications.
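To get started with application development against llama-server, a short client is all you need. The sketch below assumes a server running locally on llama-server's default port 8080 and uses its OpenAI-compatible chat completion endpoint; adjust the URL and parameters for your setup:

```python
import json
import urllib.request

# llama-server listens on port 8080 by default and exposes
# OpenAI-compatible endpoints under /v1.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask(prompt: str) -> str:
    """Send a prompt to a running llama-server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running llama-server instance):
# print(ask("Summarize what AWS Graviton4 is in one sentence."))
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the same base URL.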
From chatbots and content generation to research tools, this stack strikes a balance between performance, cost, and developer control.
For more information on Arcee AI and how you can build high-quality, secure, and cost-efficient AI solutions, please visit www.arcee.ai.