Run AI on your own machine.
No cloud, no subscription, no data leaving your box. A practical guide to which tool fits which scenario, how to install each one, and how GPU/Metal acceleration works.
Comparison
Five popular local AI tools compared, with a platform-support table below.
Ollama
Local LLM in one command — the easiest start
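With the Ollama app or `ollama serve` running, it exposes a local REST API on port 11434. A minimal sketch using the `requests` package; the model name assumes you have already run `ollama pull llama3`:

```python
import requests

# Ollama listens on localhost:11434 by default; no API key, nothing leaves the machine.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",          # assumes `ollama pull llama3` was run beforehand
        "prompt": "Explain quantization in one sentence.",
        "stream": False,            # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])      # the generated text
```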
vLLM
High-throughput production server
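vLLM is usually driven either as an OpenAI-compatible server or directly from Python. A minimal offline-inference sketch; the checkpoint name is only an example, and any Hugging Face model that fits your GPU works:

```python
from vllm import LLM, SamplingParams

# Model name is an example; vLLM downloads any compatible Hugging Face checkpoint.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain continuous batching in one paragraph."], params)

for out in outputs:
    print(out.outputs[0].text)
```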
llama.cpp
C++ inference engine that runs anywhere
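From Python, the common route is the `llama-cpp-python` binding, which wraps the same engine and exposes the GPU/Metal offload knob. A sketch assuming you already have a GGUF file on disk; `n_gpu_layers=-1` offloads every layer to Metal or CUDA when the wheel was built with that backend:

```python
from llama_cpp import Llama

# Path is a placeholder; point it at any GGUF model you have downloaded.
llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload all layers to Metal/CUDA if available; 0 = pure CPU
    n_ctx=4096,        # context window size
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a GGUF file?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```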
LM Studio
Download, run, and chat with models in a GUI
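Beyond the GUI, LM Studio can serve the loaded model through a local OpenAI-compatible endpoint (port 1234 by default), so existing OpenAI-client code keeps working. A sketch using the `openai` Python client; the model identifier is a placeholder for whatever LM Studio shows for the model you loaded:

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is ignored but the client requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",   # placeholder; use the identifier LM Studio shows for the loaded model
    messages=[{"role": "user", "content": "Summarize what quantization does."}],
)
print(resp.choices[0].message.content)
```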
MLX
Apple Silicon-native framework
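For text generation, the usual entry point is the `mlx-lm` package on top of MLX. A minimal sketch; the checkpoint name is an example from the mlx-community hub, and any MLX-converted model works:

```python
from mlx_lm import load, generate

# Example checkpoint from the mlx-community hub; any MLX-converted model works.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Why does unified memory help on Apple Silicon?",
    max_tokens=128,
)
print(text)
```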
Platform support
| | Ollama | vLLM | llama.cpp | LM Studio | MLX |
|---|---|---|---|---|---|
| Apple Silicon (Metal) | ✓ | ✗ | ✓ | ✓ | ✓ |
| CPU | ✓ | limited | ✓ | ✓ | ✗ |
| NVIDIA (CUDA) | ✓ | ✓ | ✓ | ✓ | ✗ |
| AMD (ROCm) | ✓ | ✓ | ✓ | ✓ | ✗ |
Which one should I pick?
Quick routing by scenario.
Personal use on Mac, prototyping → Ollama (easiest) or LM Studio (if you want a GUI)
Max performance on an M-series Mac → MLX: Apple Silicon native, fine-tune capable
Production with concurrent users, multi-GPU → vLLM: PagedAttention + continuous batching
Embedded / limited hardware / own binary → llama.cpp: single C++ binary, runs anywhere
I don't want to use a terminal → LM Studio: click download, click run
Sensitive data (contracts, health) must stay local → Ollama or llama.cpp: fully offline capable