The instructions in this learning path are for any Arm server running Ubuntu 22.04 LTS.
To start, you will need to install PyTorch on your Arm machine. PyTorch is a widely used machine learning framework for Python. You will use PyTorch to deploy a Natural Language Processing (NLP) model on your Arm machine.
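If PyTorch is not already installed, a common way to install the CPU build on Linux (assuming Python 3 and pip are available) is:
pip install torch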
Hugging Face is an open source AI community where you can host your own AI models, train them, and collaborate with others in the community. You can browse thousands of models available for a variety of use cases such as NLP, audio, and computer vision. Hugging Face also hosts a large collection of NLP models for tasks like translation, sentiment analysis, summarization, and text generation.
In this learning path, you will download a popular RoBERTa sentiment analysis NLP model from Hugging Face and deploy it using PyTorch on your Arm machine. Sentiment analysis is an NLP technique used to identify and classify the emotional tone of a piece of text. The model used here was trained on over 124 million tweets.
The Hugging Face Transformers library provides APIs and tools that let you easily download and fine-tune pre-trained models. Transformers supports multiple machine learning frameworks, including PyTorch, TensorFlow, and JAX. You will use Transformers with PyTorch to download the model from Hugging Face.
To install the Transformers library for PyTorch, run the following command:
pip install 'transformers[torch]'
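To confirm that the Transformers and PyTorch installation succeeded, you can print the installed versions:
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"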
The full classification example script used in this learning path uses SciPy, an open source Python library for scientific computing, to process the inference output from the NLP model. To install SciPy, run the following command:
pip install scipy
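The model outputs raw scores (logits) for each sentiment class, and the example script uses scipy.special.softmax to convert them into probabilities that sum to 1. The short, illustrative snippet below uses made-up logit values to show what that conversion looks like:
import numpy as np
from scipy.special import softmax

# Hypothetical raw model outputs (logits) for negative, neutral, positive
logits = np.array([1.2, 0.1, -0.8])

# softmax converts the logits into probabilities that sum to 1
probs = softmax(logits)
print(probs)        # roughly [0.68 0.23 0.09]
print(probs.sum())  # 1.0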
You are now ready to download this model and run a full classification example from Hugging Face on your machine. Using a file editor of your choice, create a file named sentiment-analysis.py with the following contents:
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import numpy as np
from scipy.special import softmax
import transformers
transformers.logging.set_verbosity_error()
# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
MODEL = f"cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)
# Load the pre-trained PyTorch model
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
text = "Covid cases are increasing fast!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
# Print labels and scores
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = config.id2label[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
This example downloads the cardiffnlp/twitter-roberta-base-sentiment-latest model and its tokenizer, which prepares the input text as tensors for the model, runs inference, and converts the raw output scores into probabilities that are printed together with their sentiment labels.
Run this script:
python sentiment-analysis.py
The output from this script should look like:
1) negative 0.7236
2) neutral 0.2287
3) positive 0.0477
You have successfully performed sentiment analysis on the input text, all running on your Arm AArch64 CPU. You can change the input text in your example and re-run the classification example.
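The model can also classify several inputs in one call. The short sketch below is illustrative and assumes the model, tokenizer, config, and preprocess function defined in sentiment-analysis.py; it pads a small batch of texts and prints the most likely label for each:
import torch

texts = ["I love this!", "This is terrible.", "It is an average day."]
texts = [preprocess(t) for t in texts]

# Pad the batch so all sequences have the same length
encoded = tokenizer(texts, return_tensors='pt', padding=True)

with torch.no_grad():
    logits = model(**encoded).logits

# Convert logits to probabilities and pick the highest-scoring label for each text
probs = torch.softmax(logits, dim=-1)
for text, p in zip(texts, probs):
    label = config.id2label[int(p.argmax())]
    print(f"{text} -> {label} ({p.max().item():.4f})")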
Now that you have run the model, let’s add the ability to profile the model execution. You can use the PyTorch Profiler to analyze where the execution time is spent on the CPU. Copy the contents shown below into a file named sentiment-analysis-profile.py:
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import numpy as np
from scipy.special import softmax
import transformers
transformers.logging.set_verbosity_error()
import torch
from torch.profiler import profile, record_function, ProfilerActivity
# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
MODEL = f"cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)
# Load the pre-trained PyTorch model
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
text = "Covid cases are increasing fast!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        output = model(**encoded_input)

# Print a summary of where CPU time was spent
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
scores = output[0][0].detach().numpy()
scores = softmax(scores)
# Print labels and scores
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = config.id2label[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
Run this Python script:
python sentiment-analysis-profile.py
The output should look similar to:
STAGE:2024-02-27 17:26:22 18170:18170 ActivityProfilerController.cpp:314] Completed Stage: Warm Up
STAGE:2024-02-27 17:26:22 18170:18170 ActivityProfilerController.cpp:320] Completed Stage: Collection
STAGE:2024-02-27 17:26:22 18170:18170 ActivityProfilerController.cpp:324] Completed Stage: Post Processing
--------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
--------------------------- ------------ ------------ ------------ ------------ ------------ ------------
aten::addmm 56.56% 29.355ms 57.96% 30.085ms 406.554us 74
model_inference 15.24% 7.910ms 100.00% 51.903ms 51.903ms 1
aten::bmm 4.86% 2.521ms 7.37% 3.823ms 159.292us 24
aten::select 2.55% 1.323ms 2.58% 1.337ms 1.535us 871
aten::view 1.98% 1.030ms 1.98% 1.030ms 3.962us 260
aten::linear 1.97% 1.022ms 62.89% 32.640ms 441.081us 74
aten::native_layer_norm 1.87% 968.000us 2.07% 1.072ms 42.880us 25
aten::gelu 1.76% 912.000us 1.76% 912.000us 76.000us 12
aten::copy_ 1.36% 706.000us 1.36% 706.000us 6.660us 106
aten::expand 0.95% 492.000us 0.98% 509.000us 4.138us 123
--------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 51.903ms
1) negative 0.7236
2) neutral 0.2287
3) positive 0.0477
In addition to the classification output from the model, you can now see the execution time for the different operators.
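If you want more detail than the summary table, the profiler can also export a trace file that you can open in a trace viewer such as Perfetto or chrome://tracing. As a minimal sketch, assuming the prof object from the script above, add:
prof.export_chrome_trace("trace.json")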
You can experiment with the BFloat16 floating-point format and Transparent Huge Pages (THP) settings in PyTorch to see how they impact the performance of your model.
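For example, one way to try BFloat16 on the CPU is PyTorch's automatic mixed precision context. The sketch below is illustrative and assumes the model and encoded_input from the profiling script; whether it improves performance depends on your hardware and PyTorch build:
import torch

# Run inference with eligible operations computed in BFloat16
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    output = model(**encoded_input)
Transparent Huge Pages are a Linux kernel feature; you can check the current policy on your system with cat /sys/kernel/mm/transparent_hugepage/enabled.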
You have successfully run and profiled a sentiment analysis NLP model from Hugging Face on your Arm machine. You can explore running other models and use cases just as easily.