
LM Studio

Download, run, and chat with models in a GUI

Overview

LM Studio is a desktop app that brings local AI to you entirely through a GUI. Browse Hugging Face inside the app, download a model, and run it with one click. No terminal, no Python, no config files.

Under the hood it runs the llama.cpp and MLX engines. It provides both a chat UI and an OpenAI-compatible local server, so it is callable from code too. It is free to use, but not open source.

Installation

# macOS
# 1. Download the .dmg from https://lmstudio.ai/download
# 2. Drag it into the Applications folder
# 3. Open it

# Optionally install the CLI (macOS)
~/.lmstudio/bin/lms bootstrap

# Linux
# Download the AppImage
# https://lmstudio.ai/download

chmod +x LM-Studio-*.AppImage
./LM-Studio-*.AppImage

# CLI
~/.lmstudio/bin/lms bootstrap

# Windows
# .exe installer
# https://lmstudio.ai/download

# CLI (PowerShell)
& "$env:USERPROFILE\.lmstudio\bin\lms.exe" bootstrap
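
To confirm the CLI step worked, a script can look for the `lms` binary. The following is a minimal sketch: it assumes the default locations shown in the installation steps above, and the helper names `expected_lms_path` and `find_lms` are hypothetical, not part of LM Studio.

```python
import platform
import shutil
from pathlib import Path
from typing import Optional


def expected_lms_path(home: Path, system: str) -> Path:
    """Return the location used in the installation steps above."""
    # Assumption: the binary is named lms.exe on Windows and lms elsewhere.
    name = "lms.exe" if system == "Windows" else "lms"
    return home / ".lmstudio" / "bin" / name


def find_lms() -> Optional[Path]:
    """Prefer an lms binary on PATH, else fall back to the default location."""
    on_path = shutil.which("lms")
    if on_path:
        return Path(on_path)
    candidate = expected_lms_path(Path.home(), platform.system())
    return candidate if candidate.exists() else None
```

Calling `find_lms()` returns `None` when neither lookup succeeds, which usually means the bootstrap step has not been run yet or the app is installed somewhere else.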

Configuration

Most settings live in the GUI, where the basic flow is search → download → 'Load' → chat. The real power is in the per-model config panel: Context Length, GPU Offload, Temperature, Top-P, and Repeat Penalty, all exposed as sliders.

Load a model from CLI (lms) bash
# List downloaded models
lms ls

# Download a model
lms get llama-3.1-8b-instruct

# Start the server (default port 1234)
lms server start

# Switch the model in use: unload everything, then load the new one
lms unload --all
lms load llama-3.1-8b-instruct

# Status
lms status
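
After `lms server start`, one way to check from code that the server is up is to ask its OpenAI-compatible `/v1/models` endpoint which models it exposes. This is a minimal sketch using only the standard library; it assumes the default port 1234 from above and an OpenAI-style list response with a `data` array. The function names are illustrative.

```python
import json
import urllib.request


def model_ids(models_response: dict) -> list:
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:1234/v1",
                    timeout: float = 5.0) -> list:
    """Query the local server; raises urllib.error.URLError if nothing is listening."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

`fetch_model_ids()` is only useful while the server is running; a connection error is the expected outcome otherwise.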
Key GUI parameters bash
Context Length        # 2048 → 8192 → 32K (if the model supports it)
GPU Offload (layers)  # Auto → 100% (automatic on Apple Silicon)
CPU Threads           # Half the CPU core count is usually optimal
Eval Batch Size       # Prompt processing batch (512 default)
Temperature, Top-P    # Sampling
Repeat Penalty        # 1.1 default, reduces repetition
mlock                 # Pin the model in RAM (for large models)
Flash Attention       # Speed-up on CUDA
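
The sampling sliders have counterparts in the OpenAI-compatible API: `temperature` and `top_p` are standard chat completion fields. The sketch below builds the keyword arguments for such a request. The default values and the `chat_request` helper are assumptions for illustration, and how per-request values interact with the GUI settings is worth checking in your own setup.

```python
def chat_request(model: str, prompt: str, temperature: float = 0.7,
                 top_p: float = 0.95, max_tokens: int = 256) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    # Range checks follow the usual OpenAI API bounds for these fields.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is expected to be between 0 and 2")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p is expected to be in (0, 1]")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
```

With the `client` from the API section, a low-temperature call would look like `client.chat.completions.create(**chat_request("llama-3.1-8b-instruct", "Hello!", temperature=0.2))`.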

Hardware acceleration

Two engines: llama.cpp (GGUF) everywhere; MLX (Apple Silicon) only on M-series Macs. LM Studio detects the hardware and picks the engine for you.

Apple Silicon   Metal (llama.cpp) + MLX engine option
NVIDIA GPU      CUDA: Flash Attention, full offload
AMD GPU         Vulkan (Win/Linux), no ROCm yet
Intel Mac       CPU only
CPU only
Multi-GPU       Limited, llama.cpp split support
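
The table above can be read as a simple lookup. The sketch below encodes it as a function so a script can predict which backend to expect on a given machine. It is not LM Studio's actual detection logic; the `expected_backend` helper and the `gpu_vendor` argument are hypothetical, and `system` and `machine` are assumed to come from `platform.system()` and `platform.machine()`.

```python
from typing import Optional


def expected_backend(system: str, machine: str,
                     gpu_vendor: Optional[str] = None) -> str:
    """Map a machine description to the backend listed in the table above."""
    if system == "Darwin":
        # Apple Silicon reports arm64; Intel Macs report x86_64.
        return "Metal (llama.cpp) or MLX" if machine == "arm64" else "CPU only"
    vendor = (gpu_vendor or "").lower()
    if vendor == "nvidia":
        return "CUDA"
    if vendor == "amd":
        return "Vulkan"
    return "CPU only"
```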

Model formats & quantization

LM Studio loads models in the GGUF (llama.cpp) and MLX formats. Browse Hugging Face from inside the app and click download. Quantization variants come with friendly labels (for example, a 'Best for your hardware' badge).

GGUF              All platforms (llama.cpp engine)
MLX               Apple Silicon only, MLX engine
Quant picker      GUI shows 'Recommended' badge for the right choice
Vision models     LLaVA, Qwen-VL (multimodal chat)
Embedding models  Yes (Nomic, BGE), /v1/embeddings endpoint
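
When choosing between quantization variants, a back-of-the-envelope size estimate helps: file size is roughly parameter count times bits per weight, divided by 8. The sketch below applies that arithmetic. The bits-per-weight figures you pass in are your own assumptions (a 4-bit variant is often somewhat above 4 bits in practice), and real files also carry metadata, so treat the result as approximate.

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough file size: parameters x bits per weight, in gigabytes (1e9 bytes)."""
    if n_params <= 0 or bits_per_weight <= 0:
        raise ValueError("n_params and bits_per_weight must be positive")
    return n_params * bits_per_weight / 8 / 1e9
```

For an 8B model, `estimated_size_gb(8e9, 4.5)` gives about 4.5 GB, while the same model at 16 bits per weight comes to about 16 GB.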

API

OpenAI-compatible (Python) python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio local server (default port 1234)
    api_key="lm-studio",  # placeholder; the client requires a non-empty value
)

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
Embedding (Python) python
# Reuses the `client` created in the chat example
emb = client.embeddings.create(
    model="nomic-embed-text-v1.5",
    input="Embed a sentence",
)
print(len(emb.data[0].embedding))  # 768
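
Embedding vectors are typically compared rather than inspected. A common choice is cosine similarity, sketched here in plain Python so it works on the lists returned in `emb.data[i].embedding`. The function name is illustrative; for large batches a NumPy version would be faster.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.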

Performance

Single-stream (M2 Max, MLX 8B Q4)   ~60 tok/s
Single-stream (M2 Max, GGUF 8B Q4)  ~50 tok/s
Concurrent requests                 Very low; single-user focused
Cold load                           10-60 s
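
To compare these figures with your own hardware, a rough throughput number can be derived from one timed request. The sketch below assumes an OpenAI SDK `client` like the one in the API section and assumes the server fills in `usage.completion_tokens`. The measured time includes prompt processing, so it will understate pure generation speed. The function names are illustrative.

```python
import time


def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


def measure(client, model: str, prompt: str, max_tokens: int = 256) -> float:
    """Time one non-streaming completion and return tokens per second."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    return tokens_per_second(resp.usage.completion_tokens, elapsed)
```

Run `measure` a second time before trusting the number, since the first request may include the cold load listed above.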

Common pitfalls

  • Not open source: The app is closed source. If you need to embed it in a product, ship your own binary, or distribute it in-house, prefer Ollama or llama.cpp. Free for personal use.
  • Single-user product: Server mode accepts concurrent requests but is not built for high throughput. For production serving, use vLLM.
  • Disk fills up fast: It is easy to download 10 models from the GUI, and each is 4-40 GB. Point the models folder (Settings → Models folder) at an external drive and delete unused models.
  • Engine confusion on Apple: The same model can come as both GGUF and MLX. MLX is usually 20-30% faster, but not every model is available in MLX. Watch the badges.
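
For the disk-space pitfall, a short script can show which downloads take the most room. This is a minimal sketch: it takes the models folder as an argument rather than guessing its location (use whatever path Settings → Models folder shows), and the `folder_sizes_gb` helper is hypothetical.

```python
from pathlib import Path


def folder_sizes_gb(models_root: Path) -> dict:
    """Sum file sizes under each top-level entry of models_root, largest first."""
    sizes = {}
    for entry in models_root.iterdir():
        if entry.is_dir():
            # Walk the whole subtree so nested publisher/model folders are counted.
            total = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        else:
            total = entry.stat().st_size
        sizes[entry.name] = total / 1e9
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))
```

Printing the result with something like `for name, gb in folder_sizes_gb(Path(your_models_folder)).items(): print(f"{gb:8.1f} GB  {name}")` gives a quick list of candidates for deletion.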

Resources