AI Atlas
LOCAL AI

Run AI on your own machine.

No cloud, no subscription, no data leaving your box. Which tool fits which scenario, how to install, how GPU/Metal acceleration works — a practical guide.

Comparison

5 popular local AI tools, one table. Click a card for details.

Platform support

  Ollama vLLM llama.cpp LM Studio MLX
Apple Silicon
CPU
NVIDIA (CUDA)
AMD (ROCm)

Which one should I pick?

Quick routing by scenario.

Personal use on Mac, prototyping → Ollama (easiest) or LM Studio (if you want a GUI)
Max performance on M-series Mac → MLX — Apple Silicon native, 20–40% faster than llama.cpp
Production — concurrent users, multi-GPU → vLLM — PagedAttention + continuous batching
Embedded / limited hardware / own binary → llama.cpp — C++ single binary, RPi to server
I don't want to use a terminal → LM Studio — click download, click run, tweak in the GUI
Sensitive data (contracts, health) must stay local → Ollama, llama.cpp, or MLX — fully offline capable
Run a vision/multimodal model locally (LLaVA, Qwen-VL) → Ollama (easiest) or vLLM (production); LM Studio GUI also works
Function calling / tool use locally → Ollama 0.5+ supports it natively; vLLM via the OpenAI-compatible API
Fine-tune locally (LoRA / QLoRA) → On Mac: MLX (mlx_lm.lora). On NVIDIA: axolotl/unsloth; llama.cpp for inference
Embed in an iOS / iPadOS app → MLX Swift (Apple) or llama.cpp (cross-platform mobile bindings)