AI Dictionary
Advanced · ~2 min read · #lora #peft #fine-tuning

LoRA

Low-Rank Adaptation

A parameter-efficient fine-tuning method that freezes the big model and trains only small adapter matrices alongside it.

[Diagram: a frozen base weight matrix W (7B params, untouched) plus a LoRA adapter A×B (~0.5% trainable params). Caption: "train just two tiny matrices instead of all 7 billion weights."]
Definition

Fine-tuning a 7B-parameter LLM normally means updating every weight: multiple high-end GPUs, tens of gigabytes of VRAM just for gradients and optimizer state, and days of compute. LoRA (Microsoft, 2021) cuts that cost roughly 100×.

The idea: freeze the base model (its weights don't change). Alongside each frozen weight matrix W (n × m), add two small matrices: A (n × r) and B (r × m), with the rank r kept small (usually 8–64). Their product A×B is the low-rank delta added to W in the forward pass. Only A and B are trained.

Result: on a 7B model, only ~5M parameters get trained (~0.07%). One consumer GPU (24GB VRAM) is enough. The output adapter is ~50MB and snaps onto the base model at runtime.

Analogy

Instead of rewriting a book, you stick post-it notes in the margins. The book (base model) stays untouched. The post-its (LoRA adapter) add commentary. Make different post-it sets for different scenarios and swap them in and out of the same book.

Real-world example

You want to specialize Llama-3-8B for Turkish legal documents. Classic fine-tuning: 8× A100, $5000+, 3 days. With LoRA: one RTX 4090, $20 of electricity, 4 hours. Resulting adapter: 80MB.

The next day, a different customer needs medical terminology. Same base model, new LoRA. In production you swap between base + lora_legal and base + lora_medical on the fly. Storing many variants is cheap.

When to use
  • Fine-tuning open-source models on a limited GPU budget
  • Multiple specialized variants on the same base model
  • Fast iteration: try 5 different adapters in a day
  • Edge deployment — small adapter files are easy to distribute
When not to use
  • When you need to change fundamental model capabilities (e.g. add a new language) — full fine-tune required
  • Closed-source models (GPT, Claude) — LoRA needs weight access
  • On very small models, where the savings are too small to justify the extra setup
Common pitfalls

Misjudging rank

r = 4 is fast but often underfits; r = 256 is slow and prone to overfitting. The typical sweet spot is r = 8–32. Tune it to task complexity.

Naively merging multiple LoRAs

Naively summing two adapters (legal + medical) makes them interfere with each other. Merge techniques like DARE-TIES exist, but the area is still active research.

Base model coupling

A LoRA adapter only works on the exact base it was trained on. Going Llama-3.1 → 3.2 means re-training.