Efficiency · June 17, 2021 · Microsoft

LoRA: Low-Rank Adaptation of Large Language Models

Edward Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

Abstract

We propose Low-Rank Adaptation (LoRA), which freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to full fine-tuning of GPT-3 175B with Adam, LoRA can reduce the number of trainable parameters by 10,000x and the GPU memory requirement by 3x.
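The idea can be sketched in a few lines. Below is a minimal, illustrative NumPy version of a LoRA-style linear layer (not the authors' code; the layer sizes, rank, and scaling factor are hypothetical): the pretrained weight W0 is frozen, and only the low-rank factors A and B are trained, so the update ΔW = BA has rank at most r.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4                 # hypothetical layer sizes and rank
W0 = rng.standard_normal((d_out, d_in))    # pretrained weight, frozen
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # trainable; zero init, so ΔW = BA starts at 0
alpha = 8                                  # scaling hyperparameter

def lora_forward(x):
    # h = x W0^T + (alpha / r) * x (BA)^T; only A and B receive gradients
    return x @ W0.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
h = lora_forward(x)

# Trainable parameters: r*(d_in + d_out) for LoRA vs d_in*d_out for full fine-tuning
full_params = d_in * d_out          # 4096
lora_params = r * (d_in + d_out)    # 512
```

Because B is initialized to zero, the adapted model starts out identical to the pretrained one, and the parameter savings grow with layer width: here the trainable count drops from 4096 to 512, and the ratio improves further at the dimensions of real Transformer layers.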

Key Findings

  • Reduced trainable parameters by 10,000x while maintaining quality
  • Decreased GPU memory requirements by 3x compared to full fine-tuning
  • Showed that low-rank updates capture task-specific adaptations effectively
  • Enabled fine-tuning of large models on consumer hardware
  • Introduced a technique that is composable and switchable at inference time
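The last point is worth spelling out: because the update is just an additive matrix, a LoRA adapter can be merged into the frozen weight for deployment, so inference incurs no extra latency, and subtracted back out to swap in a different task's adapter. A small sketch under assumed dimensions (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 32, 4, 8                 # hypothetical width, rank, and scaling
W0 = rng.standard_normal((d, d))       # frozen pretrained weight
A = rng.standard_normal((r, d))        # trained low-rank factors for some task
B = rng.standard_normal((d, r))

# Merge for inference: W = W0 + (alpha/r) * B @ A, a single dense matrix,
# so the deployed model runs with no additional latency.
W_merged = W0 + (alpha / r) * (B @ A)

# Switch tasks: subtract this adapter to recover W0, then add another one.
W_restored = W_merged - (alpha / r) * (B @ A)
```

Since each adapter is only r*(2d) numbers, many task-specific adapters can be stored and hot-swapped against one shared copy of the pretrained weights.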

Impact & Significance

LoRA democratized LLM fine-tuning by making it feasible on consumer GPUs. It became the standard method for creating custom models and spawned an entire ecosystem of LoRA-adapted models on platforms like Hugging Face and Civitai.
