LoRA: Low-Rank Adaptation of Large Language Models
Edward Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Abstract
We propose Low-Rank Adaptation (LoRA), which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA reduces the number of trainable parameters by 10,000x and the GPU memory requirement by 3x.
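Concretely, for a pre-trained weight matrix W0 of shape d × k, LoRA constrains the update to a low-rank decomposition W0 + BA, where B is d × r, A is r × k, and the rank r is much smaller than min(d, k); W0 stays frozen and only A and B receive gradients. Below is a minimal sketch of such a layer in PyTorch; the class and parameter names are illustrative, not taken from the paper's released code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a linear layer with a LoRA update: h = W0 x + (alpha/r) * B A x."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Frozen pre-trained weight W0; it receives no gradient updates.
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                   requires_grad=False)
        # Trainable rank decomposition: A is Gaussian-initialized, B starts at zero,
        # so the update BA is zero at the start of training (as in the paper).
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r  # the paper scales the update by alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

For a 1024 × 1024 weight with r = 8, the trainable factors hold 2 × 8 × 1024 ≈ 16K parameters versus roughly 1M for the full matrix, which is where the parameter savings come from.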
Key Findings
1. Reduced trainable parameters by 10,000x while maintaining quality
2. Decreased GPU memory requirements by 3x compared to full fine-tuning
3. Showed that low-rank updates capture task-specific adaptations effectively
4. Enabled fine-tuning of large models on consumer hardware
5. Introduced a technique that is composable and switchable at inference time (see the sketch after this list)
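Because the update is just an additive low-rank term, an adapter can be folded into the frozen weight for deployment, so inference incurs no extra latency, and tasks can be switched by reversing the addition. A rough sketch of both operations, with hypothetical helper names:

```python
import torch

@torch.no_grad()
def merge_lora(weight: torch.Tensor, lora_A: torch.Tensor,
               lora_B: torch.Tensor, scaling: float) -> None:
    # Fold the adapter into the base weight in place: W <- W0 + (alpha/r) * B A.
    weight.add_(scaling * (lora_B @ lora_A))

@torch.no_grad()
def swap_adapter(weight: torch.Tensor, old_A: torch.Tensor, old_B: torch.Tensor,
                 new_A: torch.Tensor, new_B: torch.Tensor, scaling: float) -> None:
    # Switch tasks by subtracting the old update and adding the new one:
    # W <- W - (alpha/r) * B_old A_old + (alpha/r) * B_new A_new.
    weight.sub_(scaling * (old_B @ old_A))
    weight.add_(scaling * (new_B @ new_A))
```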
Impact & Significance
LoRA democratized LLM fine-tuning by making it feasible on consumer GPUs. It became the standard method for creating custom models and spawned an entire ecosystem of LoRA-adapted models on platforms like Hugging Face and Civitai.