Run AI on your own machine.
No cloud, no subscription, no data leaving your box. Which tool fits which scenario, how to install, how GPU/Metal acceleration works — a practical guide.
Comparison
5 popular local AI tools, one table. Click a card for details.
Easiest to start
Open
Production leader
Open
Runs everywhere
Open
GUI experience
Open
Apple Silicon native
Open
Ollama
Local LLM in one command — the easiest start
Setup ease
Performance
vLLM
High-throughput production server
Setup ease
Performance
llama.cpp
C++ inference engine that runs anywhere
Setup ease
Performance
LM Studio
Download, run, and chat with models in a GUI
Setup ease
Performance
MLX
Apple Silicon-native framework
Setup ease
Performance
Platform support
| Ollama | vLLM | llama.cpp | LM Studio | MLX | |
|---|---|---|---|---|---|
| Apple Silicon | |||||
| CPU | |||||
| NVIDIA (CUDA) | |||||
| AMD (ROCm) |
Which one should I pick?
Quick routing by scenario.
Personal use on Mac, prototyping
→ Ollama (easiest) or LM Studio (if you want a GUI)
Max performance on M-series Mac
→ MLX — Apple Silicon native, 20–40% faster than llama.cpp
Production — concurrent users, multi-GPU
→ vLLM — PagedAttention + continuous batching
Embedded / limited hardware / own binary
→ llama.cpp — C++ single binary, RPi to server
I don't want to use a terminal
→ LM Studio — click download, click run, tweak in the GUI
Sensitive data (contracts, health) must stay local
→ Ollama, llama.cpp, or MLX — fully offline capable
Run a vision/multimodal model locally (LLaVA, Qwen-VL)
→ Ollama (easiest) or vLLM (production); LM Studio GUI also works
Function calling / tool use locally
→ Ollama 0.5+ supports it natively; vLLM via the OpenAI-compatible API
Fine-tune locally (LoRA / QLoRA)
→ On Mac: MLX (mlx_lm.lora). On NVIDIA: axolotl/unsloth; llama.cpp for inference
Embed in an iOS / iPadOS app
→ MLX Swift (Apple) or llama.cpp (cross-platform mobile bindings)