Low-Rank Adaptation. An efficient fine-tuning method that freezes the original model weights and only trains small adapter matrices. Drastically reduces the compute and memory needed for fine-tuning — you can customize a 70B model on a single GPU. QLoRA adds quantization for even more savings.
LoRA (Low-Rank Adaptation) is a technique that makes fine-tuning large models dramatically cheaper and faster. Instead of updating all billions of parameters during fine-tuning, LoRA freezes the original model and injects small trainable matrices into each layer. These adapters capture task-specific knowledge with a tiny fraction of the parameters — often less than 1% of the original model size.
The math behind LoRA is based on the observation that weight updates during fine-tuning tend to have low rank — meaning they can be decomposed into two small matrices multiplied together. Instead of a huge weight update matrix, you learn two thin matrices whose product approximates the full update. A 7-billion parameter model might need only 10-50 million trainable parameters with LoRA.
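The decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical dimensions (a 1024×1024 layer, rank 8, scaling factor alpha=16), not the implementation from the LoRA paper; the initialization convention shown (A small and random, B zero, so training starts from the unmodified base model) follows common practice.

```python
import numpy as np

# Hypothetical layer: frozen weight W is d_out x d_in; the LoRA update is
# the product of two thin matrices B (d_out x r) and A (r x d_in),
# with rank r much smaller than d_in and d_out.
rng = np.random.default_rng(0)
d_in, d_out, r = 1024, 1024, 8

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, initialized small
B = np.zeros((d_out, r))                    # trainable, initialized to zero

def lora_forward(x, alpha=16):
    # Base path uses the frozen W; the adapter path adds the low-rank
    # update B @ A, scaled by alpha / r.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d_in))
y = lora_forward(x)

# Parameter comparison: a full update to this layer would train
# d_out * d_in values; LoRA trains only r * (d_in + d_out).
full = d_out * d_in           # 1,048,576 trainable values
adapter = r * (d_in + d_out)  # 16,384 trainable values (~1.6%)
```

Because B starts at zero, the adapter initially contributes nothing and the model behaves exactly like the frozen base; gradient updates to A and B then carry all of the task-specific change.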
The practical impact has been massive. Before LoRA, fine-tuning a large model required multiple expensive GPUs and significant memory. With LoRA, you can fine-tune a 7B model on a single consumer GPU with 24GB of VRAM. You can also swap LoRA adapters without reloading the base model — so one server can serve multiple specialized versions. QLoRA (quantized LoRA) pushes this further by combining LoRA with 4-bit quantization, making fine-tuning possible on even more modest hardware.
"We trained a LoRA adapter for our medical chatbot — only 50MB on top of the base model, but it dramatically improved accuracy on clinical terminology."
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
Quantization: Reducing the precision of a model's numerical values, for example from 32-bit floating-point to 4-bit numbers.
Activation function: A mathematical function applied to a neuron's output that introduces non-linearity into the network.
Adam: An optimization algorithm that combines the strengths of two earlier methods, AdaGrad and RMSProp.
AGI: Artificial General Intelligence.
AI agent: An autonomous AI system that can perceive its environment, make decisions, and take actions to achieve goals.