Now that you understand how fine-tuning works, it’s time to look at the actual code. In this section, you’ll walk through the key parts of the Llama3_3B_full_finetuning.py script and run it with the Raspberry Pi dataset to produce your own fine-tuned Llama model.
The script starts by importing PyTorch, dataset loading utilities, and the Hugging Face libraries used for supervised fine-tuning.
import torch
import argparse
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer
The DATASET_PROMPT_TEMPLATE defines the instruction-following format for training data with three fields: instruction, input, and response. Each training example is formatted using this template so the model learns to recognize the pattern and produce structured answers.
The get_dataset() function loads the dataset from Hugging Face by default and formats each example using the template, appending the EOS (End of Sequence) token. You’ll pass in the Raspberry Pi dataset from a local JSONL file instead.
# Define prompt templates
DATASET_PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction: {}
### Input: {}
### Response: {}"""
def get_dataset(dataset_name, dataset_dir, dataset_files, eos_token, dataset_size=512):
    # Preprocess the dataset
    def preprocess(x):
        texts = [
            DATASET_PROMPT_TEMPLATE.format(instruction, input, output) + eos_token
            for instruction, input, output in zip(x["instruction"], x["input"], x["output"])
        ]
        return {"text": texts}

    dataset = load_dataset(dataset_name, data_dir=dataset_dir, data_files=dataset_files, split="train")
    if len(dataset) > dataset_size:
        dataset = dataset.select(range(dataset_size)).shuffle(seed=42)
    return dataset.map(preprocess, remove_columns=dataset.column_names, batched=True)
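To make the formatting concrete, here is a minimal sketch of what one rendered training example looks like. The template mirrors DATASET_PROMPT_TEMPLATE from the script; the example record and the hard-coded EOS string are illustrative stand-ins (the script reads the EOS token from the tokenizer):

```python
# Sketch: render one training example the way get_dataset() does.
DATASET_PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction: {}
### Input: {}
### Response: {}"""

# Hypothetical record in the instruction/input/output shape
example = {
    "instruction": "What power supply does the Raspberry Pi 5 require?",
    "input": "",
    "output": "The Raspberry Pi 5 is powered over USB-C; a 5V/5A supply is recommended.",
}

eos_token = "<|end_of_text|>"  # illustrative; the script uses tokenizer.eos_token
text = DATASET_PROMPT_TEMPLATE.format(
    example["instruction"], example["input"], example["output"]
) + eos_token
print(text)
```

Every example the model sees during training follows this exact structure, which is why the model learns to emit its answer after the `### Response:` marker.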
The from_pretrained() method downloads and initializes a pre-trained language model (from Hugging Face by default). The tokenizer is loaded alongside it, with the padding token set to match the EOS token (required for batched training).
# Load the model and tokenizer
print(f"Loading model: {args.model_name}")
model = AutoModelForCausalLM.from_pretrained(
    args.model_name,
    dtype=args.dtype,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(args.model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
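The pad token matters because batched training stacks several tokenized examples into one tensor, so shorter sequences must be padded to the batch's longest length. A minimal sketch of the idea, using made-up token IDs (Llama 3 simply reuses the EOS ID as the pad ID):

```python
# Sketch: why batching needs a pad token. Token IDs are illustrative.
eos_id = 128001  # hypothetical; in practice this comes from the tokenizer

batch = [[5, 9, 2], [7, 4]]  # two tokenized examples of different lengths
max_len = max(len(seq) for seq in batch)

# Right-pad shorter sequences with the EOS/pad id so the batch is rectangular
padded = [seq + [eos_id] * (max_len - len(seq)) for seq in batch]
# The attention mask marks real tokens (1) vs. padding (0)
attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]

print(padded)          # [[5, 9, 2], [7, 4, 128001]]
print(attention_mask)  # [[1, 1, 1], [1, 1, 0]]
```

The tokenizer builds these padded tensors and masks automatically; the script only has to tell it which token to pad with.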
With the model and tokenizer loaded, the script prepares the training data by calling get_dataset() with the tokenizer’s EOS token and the specified dataset size. By default the script downloads the Alpaca dataset from Hugging Face, but you’ll pass in the Raspberry Pi JSONL file from the command-line instead.
# Load and preprocess the dataset
print(f"Loading dataset with {args.dataset_size} samples...")
dataset = get_dataset(args.dataset, args.dataset_dir, args.dataset_files, tokenizer.eos_token, args.dataset_size)
The training configuration controls how the SFT process runs. Notable parameters include num_train_epochs (initially set to 0.05 for a warmup pass, then updated to the full epoch count), gradient_accumulation_steps (the number of batches to accumulate before each weight update), learning_rate (the optimizer step size), and max_length (the maximum sequence length). The logging parameters determine where and how often training metrics are recorded.
# Configure the SFT config
config = {
    "per_device_train_batch_size": args.batch_size,
    "num_train_epochs": 0.05,  # Warmup epochs
    "gradient_accumulation_steps": args.gradient_accumulation_steps,
    "learning_rate": args.learning_rate,
    "optim": "adamw_torch",
    "save_strategy": "no",
    "remove_unused_columns": False,
    "seed": 42,
    "dataset_text_field": "text",
    "packing": False,
    "max_length": args.seq_length,
    "report_to": "none",
    "logging_dir": args.log_dir,
    "logging_steps": args.logging_steps,
    "gradient_checkpointing": args.gradient_checkpointing,  # Save memory
}
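Gradient accumulation means the effective batch size per optimizer step is larger than what fits on the device at once. A quick sketch of the arithmetic, using illustrative values (the real numbers come from the script's command-line defaults):

```python
import math

# Illustrative values; the script takes these from command-line arguments
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_devices = 1  # DGX Spark is a single-GPU system

# Effective examples consumed per weight update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 32

# With the 300-example Raspberry Pi dataset, one epoch is only a handful of steps
steps_per_epoch = math.ceil(300 / effective_batch_size)
print(steps_per_epoch)  # 10
```

Raising gradient_accumulation_steps is the usual way to get a larger effective batch without increasing memory use.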
The script first optimizes the model graph for faster execution on the hardware using the torch.compile() function. A short warmup pass (0.05 epochs) triggers compilation so the overhead doesn’t affect the actual training run. After warmup, the script creates an SFTTrainer with the full epoch count and calls trainer.train(). The returned trainer_stats object contains metrics like loss, throughput, and training time.
# Compile model for faster training
print("Compiling model with torch.compile()...")
model = torch.compile(model)
# Warmup for torch compile
print("Running warmup for torch.compile()...")
SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(**config),
).train()
# Train the model
print(f"\nStarting full fine-tuning for {args.num_epochs} epoch(s)...")
config["num_train_epochs"] = args.num_epochs
config["report_to"] = "tensorboard"
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(**config),
)
trainer_stats = trainer.train()
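trainer.train() returns a TrainOutput object whose .metrics dictionary holds the summary numbers; the keys below are the standard Hugging Face names. A sketch of how you might print them, using a stub dictionary in place of the real trainer_stats.metrics (the values are made up):

```python
# Stub standing in for trainer_stats.metrics after a real run (values made up)
metrics = {
    "train_runtime": 312.4,            # seconds
    "train_samples_per_second": 0.96,
    "train_loss": 1.2345,
}

# Print a compact training summary
print(f"Runtime (s):      {metrics['train_runtime']:.1f}")
print(f"Samples/second:   {metrics['train_samples_per_second']:.2f}")
print(f"Final train loss: {metrics['train_loss']:.4f}")
```

In the actual script you would read `trainer_stats.metrics` directly instead of the stub dictionary.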
The script loads the Alpaca dataset from Hugging Face by default. You need to point it to the local Raspberry Pi JSONL dataset file instead.
First, open a new terminal on the DGX Spark (not inside the container) and navigate to the directory where you launched the Docker container. This is the directory that gets mounted as /workspace inside the container. Download the dataset file:
wget https://learn.arm.com/learning-paths/laptops-and-desktops/pytorch-finetuning-on-spark/raspberry_pi_qa.jsonl
Because this directory is mounted into the container with -v ${PWD}:/workspace, the file is immediately available inside the container at /workspace/raspberry_pi_qa.jsonl.
Back inside the container, copy the dataset into the script’s working directory:
cp /workspace/raspberry_pi_qa.jsonl .
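Before training, it can be worth confirming the JSONL file has the shape get_dataset() expects: one JSON object per line with instruction, input, and output keys. A minimal sketch of such a check, run here against two hypothetical in-memory records rather than the real file:

```python
import json

# Hypothetical records in the same shape as lines of raspberry_pi_qa.jsonl;
# in practice you would iterate over open("raspberry_pi_qa.jsonl") instead.
lines = [
    '{"instruction": "What SoC powers the Raspberry Pi 5?", "input": "", "output": "The Broadcom BCM2712."}',
    '{"instruction": "How is the Raspberry Pi 5 powered?", "input": "", "output": "Over USB-C."}',
]

records = [json.loads(line) for line in lines]
# Every record must carry the three fields the prompt template fills in
assert all({"instruction", "input", "output"} <= rec.keys() for rec in records)
print(f"{len(records)} valid records")
```

A malformed line would raise a JSON error here instead of failing midway through training.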
With the dataset in place, you’re ready to run the fine-tuning. The command below trains the Llama 3.2 3B model using full fine-tuning on the Raspberry Pi dataset:
python Llama3_3B_full_finetuning.py \
    --model_name "meta-llama/Llama-3.2-3B-Instruct" \
    --dataset "json" \
    --dataset_files "raspberry_pi_qa.jsonl" \
    --dataset_size 300 \
    --output_dir "/workspace/models/Llama-3.2-3B-FineTuned"
The --dataset_size 300 flag tells the script to use all entries in the Raspberry Pi dataset (the default is 512, but a smaller, focused dataset can be more effective than a larger generic one). The --output_dir flag sets the directory where the fine-tuned model and tokenizer are saved. Because you mounted your current directory into the container with -v ${PWD}:/workspace, the saved model is also accessible from the host system.
Training takes a few minutes on DGX Spark. When it completes, you’ll see a summary with metrics like runtime, samples per second, and loss, followed by a confirmation that the model was saved.
In this section, you walked through the key parts of the fine-tuning script, downloaded the Raspberry Pi dataset, and ran full fine-tuning to produce your own fine-tuned Llama 3.2 3B model.

In the next section, you'll serve both the original and fine-tuned models and compare their responses to Raspberry Pi hardware questions.