Understand the distributed ML architecture

Build ML Workflow Pipelines with Flyte and gRPC on Google Cloud C4A Axion processors

Log an issue

Fork and edit

Discuss on Discord

Build ML Workflow Pipelines with Flyte and gRPC on Google Cloud C4A Axion processors

Analyze the ML pipeline architecture

In this section, you explore the architecture behind the distributed machine learning pipeline built using Flyte and gRPC on Google Axion Arm-based infrastructure.

The architecture demonstrates how modern ML workflows are orchestrated using workflow engines while delegating specific tasks to distributed services.

Flyte manages the pipeline orchestration, while gRPC enables efficient communication between workflow tasks and external services.

System architecture

The ML pipeline consists of several tasks executed sequentially within the Flyte workflow.

    

        
        
Flyte Workflow Engine
        │
        ▼
Dataset Loader Task
        │
        ▼
Data Preprocessing Task
        │
        ▼
Feature Engineering Service (gRPC)
        │
        ▼
Model Training Task
        │
        ▼
Model Evaluation Task
        │
        ▼
Pipeline Result

Each component in the workflow performs a specific function within the machine learning pipeline.

Components

Flyte workflow engine

Flyte orchestrates the pipeline execution. It manages task dependencies, workflow execution, and data flow between tasks.

Key capabilities include:

defining ML pipelines as Python workflows
managing task dependencies
enabling reproducible ML experiments
scaling pipeline execution

Dataset loader

The dataset loader task simulates loading a training dataset that will be used for model training.

In real ML systems, this step might include:

loading datasets from object storage
retrieving data from data lakes
accessing distributed datasets

Data preprocessing

Data preprocessing transforms raw data into a format suitable for model training.

Typical preprocessing steps include:

cleaning data
normalizing values
handling missing data
encoding categorical variables

Feature engineering service (gRPC)

Feature engineering is implemented as a gRPC microservice.

The design allows feature-generation logic to run independently of the workflow engine.

Benefits include:

scalable feature generation
reusable feature services
independent scaling of compute resources
low-latency communication using gRPC

Model training

The training task uses generated features to train a machine learning model.

In production systems, this stage might include:

training regression models
training classification models
training deep learning models

Model evaluation

The evaluation step measures model performance.

Typical evaluation metrics include:

accuracy
precision
recall
F1 score

Based on the results, the workflow can determine whether to retrain the model.

Pipeline execution flow

The ML pipeline follows this execution sequence.

    

        
        
Load Dataset
      │
      ▼
Preprocess Data
      │
      ▼
Feature Engineering (gRPC Service)
      │
      ▼
Model Training
      │
      ▼
Model Evaluation
      │
      ▼
Pipeline Result

Each task executes sequentially while Flyte manages the workflow orchestration.

Benefits of this architecture

This architecture provides several advantages:

scalable ML pipeline orchestration
distributed feature engineering services
modular pipeline components
efficient task communication using gRPC
reproducible machine learning workflows

Running on Axion

This example demonstrates how machine learning workflows can run efficiently on Google Axion Arm-based processors.

Benefits include:

high performance per watt
efficient execution of data pipelines
scalable infrastructure for ML workloads
optimized performance for modern cloud applications

What you’ve learned

In this section, you explored the architecture behind the ML training pipeline.

You learned how:

Flyte orchestrates ML workflows
gRPC services enable distributed feature engineering
pipeline tasks interact through workflow dependencies
ML pipelines can scale across distributed infrastructure

This architecture underpins modern distributed machine learning systems running on Arm-based cloud infrastructure.

Back

Build ML Workflow Pipelines with Flyte and gRPC on Google Cloud C4A Axion processors

Introduction

Understand Flyte and gRPC ML workflows on Google Axion

Create a Google Axion C4A Arm virtual machine

Install Flyte and gRPC tools on Axion

Build a gRPC feature engineering service

Create ML Training Workflow

Execute and validate the ML pipeline

Understand the distributed ML architecture

Next Steps

Build ML Workflow Pipelines with Flyte and gRPC on Google Cloud C4A Axion processors

Analyze the ML pipeline architecture

System architecture

Components

Flyte workflow engine

Dataset loader

Data preprocessing

Feature engineering service (gRPC)

Model training

Model evaluation

Pipeline execution flow

Benefits of this architecture

Running on Axion

What you’ve learned