Low-Rank Adaptation (LoRA) is a mathematical technique proposed by Microsoft researchers as an efficient and cheaper alternative to full fine-tuning. LoRA freezes the pre-trained model weights and injects trainable low-rank matrices into the layers of a transformer, reducing the number of trainable parameters and hence the computational burden. Researchers found that LoRA could fine-tune GPT-3 175B with 10,000 times fewer trainable parameters and a threefold reduction in GPU memory requirements compared to full fine-tuning.
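The core idea can be sketched in a few lines. Below is a minimal, illustrative LoRA-style layer in pure Python (class and variable names are hypothetical, not from any library): the frozen weight `W` is left untouched, while two small trainable matrices `A` and `B` add a low-rank update to the output.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Illustrative sketch of a LoRA-adapted linear layer (hypothetical names)."""

    def __init__(self, W, r, alpha=1.0):
        d_out, d_in = len(W), len(W[0])
        self.W = W                                   # frozen pre-trained weight (d_out x d_in)
        self.A = [[0.0] * d_in for _ in range(r)]    # trainable low-rank factor (r x d_in)
        self.B = [[0.0] * r for _ in range(d_out)]   # trainable, zero-initialized (d_out x r)
        self.scale = alpha / r                       # scaling used in the LoRA paper

    def forward(self, x):
        # y = W x + (alpha / r) * B (A x); only A and B would receive gradients
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one; training then moves only the `r * (d_in + d_out)` low-rank parameters.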
Why Low-Rank Adaptation Of Large Language Models?
Adapting large language models to specific domains, tasks, or datasets can be resource-intensive, requiring powerful GPUs and thousands, if not millions, of dollars. However, most cases do not require every parameter of the model to be modified. LoRA is built on this assumption. It allows developers to control the number of parameters to train and to merge the low-rank weights with the original ones so that inference latency is unaffected.
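A back-of-the-envelope calculation shows why training only low-rank factors is so much cheaper. For a single square weight matrix of hidden size `d`, full fine-tuning trains `d * d` parameters, while LoRA with rank `r` trains only `r * d` for each of the two factors (the values below are illustrative, not from the source):

```python
# Parameter count for one d x d weight matrix: full fine-tuning vs. LoRA (example values)
d, r = 4096, 8               # hidden size and LoRA rank, chosen for illustration
full_params = d * d          # trainable parameters under full fine-tuning
lora_params = r * d + d * r  # A (r x d) plus B (d x r)
reduction = full_params // lora_params

print(full_params, lora_params, reduction)  # 16777216 65536 256
```

At rank 8, the single matrix goes from roughly 16.8M trainable parameters to about 65K, a 256x reduction; across a whole transformer the savings compound in the same way.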
Advantages of LoRA:
LoRA finds a small subset of parameters to train while achieving results similar to a full fine-tune. If that subset is 1% of the model, LoRA can fine-tune it with 100x fewer resources. This results in:
- Reduced training time: LoRA cuts fine-tuning from weeks to days, if not hours, as it requires less computing power to complete the task.
- Accessibility: Less computing power also means less powerful GPUs, including consumer GPUs. They are easier and cheaper to acquire, making fine-tuning accessible to more developers.
- No additional inference time: LoRA merges the trainable matrices with the frozen weights when deployed, resulting in no additional inference latency.
- Easier task switching: LoRA simplifies task switching, as developers can easily create multiple models with different specializations and change LoRA weights instead of all parameters.
- Cost savings: LoRA minimizes the hardware required for fine-tuning and for hosting independent instances for different tasks, resulting in cost savings.
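The "no additional inference time" point comes from merging the low-rank update into the base weight before deployment: after merging, the layer is an ordinary dense layer with no extra matrix multiplication. A minimal sketch, with hypothetical function names, might look like this:

```python
def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def merge_lora(W, A, B, alpha, r):
    """Fold the LoRA update into the frozen weight: W_merged = W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> full-size update
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Task switching follows from the same structure: keeping the frozen `W` and swapping in a different pair of `A`/`B` factors (or re-merging with them) retargets the model without touching the bulk of its parameters.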
LoRA has drawbacks, too. Since LoRA fine-tunes the model for a specific task, dataset, or domain, the model has to be adapted again when something different is introduced.
Effectiveness of LoRA
Researchers noted that their evaluations showed LoRA offering comparable or better performance than other efficient fine-tuning methods and even full fine-tuning. It's important to note that LoRA was published in 2021. In May 2023, University of Washington researchers introduced QLoRA, a more efficient variant of LoRA that first quantizes the model, hence the Q in the name.
Every enterprise focuses on its competitive edge. Researching how to train, fine-tune, and deploy AI models requires different expertise. Hence, leading or even following the latest advances in AI is challenging for those who use language models to stay competitive. If you’re unsure how to train, fine-tune, and deploy large language models, engage with Picovoice Consulting and choose what works best for your project!
Consult an Expert