Efficiency · June 17, 2021 · Microsoft

LoRA: Low-Rank Adaptation of Large Language Models

Edward Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

Abstract

We propose Low-Rank Adaptation (LoRA), which freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to full fine-tuning of GPT-3 175B with Adam, LoRA can reduce the number of trainable parameters by 10,000x and the GPU memory requirement by 3x.
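The idea can be sketched in a few lines. Below is a minimal, illustrative NumPy version of a LoRA-style linear layer (not the authors' code; the layer sizes, rank, and scaling factor are hypothetical): the pretrained weight W0 is frozen, and only the low-rank factors A and B are trained, so the update ΔW = BA has rank at most r.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4                 # hypothetical layer sizes and rank
W0 = rng.standard_normal((d_out, d_in))    # pretrained weight, frozen
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # trainable; zero init, so ΔW = BA starts at 0
alpha = 8                                  # scaling hyperparameter

def lora_forward(x):
    # h = x W0^T + (alpha / r) * x (BA)^T; only A and B receive gradients
    return x @ W0.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
h = lora_forward(x)

# Trainable parameters: r*(d_in + d_out) for LoRA vs d_in*d_out for full fine-tuning
full_params = d_in * d_out          # 4096
lora_params = r * (d_in + d_out)    # 512
```

Because B is initialized to zero, the adapted model starts out identical to the pretrained one, and the parameter savings grow with layer width: here the trainable count drops from 4096 to 512, and the ratio improves further at the dimensions of real Transformer layers.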

Key Findings

  • Reduced trainable parameters by 10,000x while maintaining quality
  • Decreased GPU memory requirements by 3x compared to full fine-tuning
  • Showed that low-rank updates capture task-specific adaptations effectively
  • Enabled fine-tuning of large models on consumer hardware
  • Introduced a technique that is composable and switchable at inference time
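The last point is worth spelling out: because the update is just an additive matrix, a LoRA adapter can be merged into the frozen weight for deployment, so inference incurs no extra latency, and subtracted back out to swap in a different task's adapter. A small sketch under assumed dimensions (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 32, 4, 8                 # hypothetical width, rank, and scaling
W0 = rng.standard_normal((d, d))       # frozen pretrained weight
A = rng.standard_normal((r, d))        # trained low-rank factors for some task
B = rng.standard_normal((d, r))

# Merge for inference: W = W0 + (alpha/r) * B @ A, a single dense matrix,
# so the deployed model runs with no additional latency.
W_merged = W0 + (alpha / r) * (B @ A)

# Switch tasks: subtract this adapter to recover W0, then add another one.
W_restored = W_merged - (alpha / r) * (B @ A)
```

Since each adapter is only r*(2d) numbers, many task-specific adapters can be stored and hot-swapped against one shared copy of the pretrained weights.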

Impact & Significance

LoRA democratized LLM fine-tuning by making it feasible on consumer GPUs. It became the standard method for creating custom models and spawned an entire ecosystem of LoRA-adapted models on platforms like Hugging Face and Civitai.
