ExecuTorch is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.
LLaMA is a foundational large language model designed to help researchers advance their work on large language models.
Dynamic quantization quantizes activations at runtime: the quantization parameters for each activation tensor (typically a scale and a zero point) are computed from the tensor's observed min/max range during inference, rather than being fixed ahead of time.
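As a rough illustration of the idea, the following pure-Python sketch computes affine int8 quantization parameters from the min/max of a list of activation values at call time. The function names (`compute_qparams`, `quantize`, `dequantize`, `dynamic_quantize`) and the choice of a signed 8-bit range are assumptions made for this example; this is not ExecuTorch's API, and a real backend may use a different scheme (for example per-token or symmetric quantization).

```python
from typing import List, Tuple


def compute_qparams(values: List[float], qmin: int = -128, qmax: int = 127) -> Tuple[float, int]:
    """Derive (scale, zero_point) from the observed min/max of `values`."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        # All values are zero; any positive scale works.
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    # Clamp the zero point into the integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> List[int]:
    """Map floats to integers, clamping to [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


def dynamic_quantize(values: List[float]) -> Tuple[List[int], float, int]:
    """Compute the parameters from this input, then quantize with them."""
    scale, zero_point = compute_qparams(values)
    return quantize(values, scale, zero_point), scale, zero_point
```

Because `dynamic_quantize` recomputes the scale and zero point on every call, two inputs with different ranges get different parameters. That is the distinction from static quantization, where the parameters are fixed ahead of time from calibration data.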