About this Learning Path

Who is this for?

This Learning Path is for software developers, ML engineers, and anyone looking to deploy production-ready LLM chatbots with Retrieval Augmented Generation (RAG) capabilities, knowledge base integration, and performance optimization for the Arm architecture.

What will you learn?

Upon completion of this Learning Path, you will be able to:

  • Set up llama-cpp-python optimized for Arm servers.
  • Implement RAG architecture using the Facebook AI Similarity Search (FAISS) vector database.
  • Optimize model performance through 4-bit quantization.
  • Build a web interface for document upload and chat.
  • Monitor and analyze inference performance metrics.
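The RAG objective above hinges on vector similarity search over document embeddings. The sketch below illustrates that core idea using NumPy in place of FAISS, which exposes the same pattern (inner-product search over normalized vectors) via `faiss.IndexFlatIP`. The character-hash `embed` function is a placeholder assumption for illustration only; a real pipeline would use a sentence-embedding model.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Placeholder embedding: hash characters into a fixed-size vector.
    # A real RAG pipeline would use a sentence-embedding model instead.
    vec = np.zeros(dim)
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# "Index" the knowledge base: one normalized vector per document chunk.
docs = [
    "Arm servers run llama.cpp efficiently with 4-bit quantization.",
    "FAISS performs fast nearest-neighbor search over embeddings.",
    "A web interface lets users upload documents and chat.",
]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    # Inner product on normalized vectors equals cosine similarity,
    # the same metric faiss.IndexFlatIP computes.
    scores = index @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

print(retrieve("nearest neighbor search", k=1))
```

In the Learning Path itself, the retrieved chunks are passed to the LLM as context for answer generation; swapping this NumPy stand-in for a FAISS index changes the storage and search backend without changing the overall retrieve-then-generate flow.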

Prerequisites

Before starting, you will need the following:

  • A Google Cloud Axion (or other Arm) compute instance with at least 16 cores, 8 GB of RAM, and 32 GB of disk space.
  • Basic understanding of Python and ML concepts.
  • Familiarity with REST APIs and web services.
  • Basic knowledge of vector databases.
  • Understanding of LLM fundamentals.