- What is LM Studio?
- Required hardware configuration
- Step 1: Download and install LM Studio
- Step 2: Understanding the LM Studio interface
- Step 3: Choose and download a model
- Step 4: Load a model and start a conversation
- Step 5: Activate the local server (for developers)
- Analyzing documents with LM Studio
- LM Studio vs Ollama: Which should you choose?
- Troubleshooting common issues
  - The model is generating very slowly
  - The model crashes on loading
  - AVX2 error at startup
  - Local API is not responding
- Going further with LM Studio
- Conclusion
What if you could run an artificial intelligence as powerful as ChatGPT directly on your computer, without a subscription, without an internet connection, and without sending any data to a third-party server? That’s exactly the promise of LM Studio, the desktop application that has become the reference in 2026 for running LLMs locally, with no command line required. Whether you’re a developer, researcher, writer, or simply curious, this LM Studio tutorial guides you step by step: installation on Windows, macOS, and Linux, choosing the right model for your hardware configuration, parameter tuning, activating the local OpenAI-compatible server, and advanced tips to get the best out of your private AI.
Don’t forget to explore our directory of AI tools and LLMs!
What is LM Studio?
LM Studio is a free desktop application that lets you download, manage, and use open source language models (LLMs) directly on your machine. Unlike cloud solutions like ChatGPT, Claude, or Gemini, everything happens locally: models are stored on your hard drive, loaded into your RAM or VRAM, and exchanges never leave your computer.
Concretely, LM Studio acts as a universal manager for models in GGUF format (the most widely used quantization format for local LLMs) and MLX (optimized for Apple Silicon chips). It connects directly to the Hugging Face catalog to allow you to search and download models in just a few clicks, without ever touching a terminal.
The concrete advantages of LM Studio
Complete privacy. Your data never leaves your machine. No prompt is sent to an external server, no history is stored in the cloud. It’s the ideal solution for professionals handling sensitive data (HR, legal, medical, proprietary code).
Zero subscription. Once models are downloaded, LM Studio works entirely offline. No message limit, no quota, no cutoff at 20 messages per hour.
Maximum flexibility. Dozens of open source models are available: Llama (Meta), Qwen (Alibaba), Mistral, DeepSeek, Gemma (Google), Phi (Microsoft), and many others. You freely choose the model best suited to your task and hardware.
Local API compatible with OpenAI. LM Studio exposes a local server at http://localhost:1234/v1, compatible with the OpenAI API — which allows you to integrate your local AI into any existing application.
Required hardware configuration
Before installing LM Studio, you need to make sure your machine is compatible. Good news: the requirements are far lower than you might think.
Minimum configuration
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB or more |
| Storage | 10 GB free | 50 GB free (multiple models) |
| CPU | x64 or ARM64 with AVX2 | Recent (4+ cores) |
| GPU | Not mandatory | 6 GB VRAM minimum for speed gains |
| OS | Windows 10/11, macOS 12+, Ubuntu 20.04+ | — |
Which model for which configuration?
The choice of model depends directly on your available RAM and VRAM. Here’s a practical guide:
| Configuration | Recommended models | Performance |
|---|---|---|
| CPU only, 8 GB RAM | Qwen3 4B Q4, Phi-3 mini (3.8B) | Slow but functional |
| CPU only, 16 GB RAM | Llama 3.2 8B Q4_K_M, Mistral 7B Q4 | Acceptable for regular use |
| GPU 6-8 GB VRAM | Llama 3.1 8B Q4_K_M, Qwen3 8B Q4 | Fast, ~ChatGPT 3.5 |
| GPU 12-16 GB VRAM | Qwen3 14B, Gemma 3 12B, DeepSeek-R1 14B | Very capable |
| GPU 24 GB VRAM | Qwen3 30B, Llama 3.3 70B Q4 | GPT-4 level |
Tip: LM Studio automatically displays the amount of RAM/VRAM needed for each variant of a model before you download it. No need to guess.
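The table above boils down to a simple lookup. Here is a minimal Python sketch of that logic; the model names are illustrative examples taken from the table, not an exhaustive list, and the thresholds are the same rough cutoffs:

```python
def suggest_model(ram_gb: int, vram_gb: int = 0) -> str:
    """Rough model suggestion mirroring the table above (approximate cutoffs)."""
    if vram_gb >= 24:
        return "Qwen3 30B Q4_K_M"       # GPT-4 level
    if vram_gb >= 12:
        return "Qwen3 14B"              # very capable
    if vram_gb >= 6:
        return "Llama 3.1 8B Q4_K_M"    # fast, ~ChatGPT 3.5
    if ram_gb >= 16:
        return "Mistral 7B Q4_K_M"      # acceptable on CPU
    return "Qwen3 4B Q4"                # slow but functional

print(suggest_model(16))                # CPU only, 16 GB RAM
print(suggest_model(32, vram_gb=8))     # mid-range GPU
```

In practice you rarely need this by hand, since LM Studio shows the memory requirement per variant, but it helps to know which threshold you are near before downloading.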
Step 1: Download and install LM Studio
On Windows
- Go to lmstudio.ai/download.
- Download the `.exe` file corresponding to Windows.
- Launch the installer and follow the steps (standard “Next / Next / Finish” installation).
- Open LM Studio from the Start menu or the desktop shortcut created automatically.
On macOS
- Download the `.dmg` file from lmstudio.ai/download.
- Open the `.dmg` and drag the LM Studio icon to the Applications folder.
- Launch LM Studio from Launchpad or Spotlight (⌘ + Space, type “LM Studio”).
On Apple Silicon (M1, M2, M3, M4), LM Studio automatically uses the MLX engine for native GPU acceleration. Performance is excellent, even on entry-level MacBook Airs.
On Linux (Ubuntu / Debian)
- Download the `.AppImage` file from the official website.
- Make it executable: `chmod +x LMStudio-*.AppImage`
- Launch it: `./LMStudio-*.AppImage`
LM Studio supports Ubuntu 20.04+ and compatible distributions. It automatically detects CUDA (NVIDIA) and ROCm (AMD) for GPU acceleration.
Step 2: Understanding the LM Studio interface
On first launch, LM Studio presents an interface organized around several main tabs:
- Discover: the model browser connected to Hugging Face. This is where you search and download models.
- Chat: the conversation interface, similar to ChatGPT.
- Developer (or Local Server): the local OpenAI-compatible server for developers.
- My Models: the list of your already downloaded models.
The right sidebar panel in the Chat view provides access to generation parameters: temperature, context length, top-p, repeat penalty, system prompt; everything is adjustable without needing to restart the model.
Step 3: Choose and download a model
This is the step that often confuses beginners: the multitude of models and variants on Hugging Face can seem intimidating. Here’s how to make sense of it.
Open the model browser
Press Ctrl + Shift + M (Windows/Linux) or ⌘ + Shift + M (Mac) to open the model search. LM Studio displays a selection of models recommended by its team (“Staff Picks”) as well as recent releases.
Understanding quantization (GGUF)
The models available in LM Studio are in GGUF format, a compression format that allows you to reduce a model’s size so it fits in memory on consumer hardware. The most common quantization levels:
| Format | Quality | Typical size (7B) | Usage |
|---|---|---|---|
| Q8_0 | Excellent (nearly original) | ~7 GB | GPU 8+ GB VRAM |
| Q6_K | Very good | ~5.5 GB | GPU 6-8 GB VRAM |
| Q4_K_M | Good (recommended) | ~4.5 GB | GPU 4-6 GB VRAM |
| Q3_K_M | Decent | ~3.5 GB | CPU or GPU < 4 GB |
| Q2_K | Degraded | ~2.5 GB | Very limited machines |
The golden rule: choose Q4_K_M as your starting point. It’s the best quality/size compromise for virtually all uses. The difference in quality between Q4_K_M and Q8_0 is almost imperceptible for writing or coding.
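The sizes in the table follow from simple arithmetic: a GGUF file weighs roughly parameters × bits-per-weight / 8. The bits-per-weight values below are approximate averages for each quantization level (an assumption for illustration; actual averages vary slightly per model, and real files add metadata overhead):

```python
# Approximate average bits per weight for common GGUF quantization levels
# (illustrative values, not exact per-model figures).
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.85, "Q3_K_M": 3.9, "Q2_K": 2.6}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Weights-only size estimate in GB; real GGUF files are slightly larger."""
    return params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in ("Q8_0", "Q6_K", "Q4_K_M"):
    print(f"7B {quant}: ~{estimated_size_gb(7, quant):.1f} GB")
```

This is why a 7B model at Q4_K_M lands around 4-5 GB: roughly 7 billion weights at under 5 bits each.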
Recommended models to get started
- Beginner / general use: `Llama 3.2 3B Q4_K_M` (very fast, ~2 GB)
- Daily chat: `Mistral 7B Instruct Q4_K_M` (~4.5 GB)
- Reasoning / analysis: `DeepSeek-R1 8B Q4_K_M` (~5 GB)
- Coding: `Qwen3 Coder 8B Q4_K_M` (~5 GB)
- Maximum performance (16+ GB VRAM): `Qwen3 30B Q4_K_M`
To start the download, simply click the Download button next to the desired variant. LM Studio displays the progress and required disk space. Models are stored in ~/.lmstudio/models/ (macOS/Linux) or C:\Users\[your username]\.lmstudio\models\ (Windows).
Step 4: Load a model and start a conversation
Once the download is complete:
- Go to the Chat tab.
- Click on the dropdown menu at the top of the screen (it displays “Select a model”).
- Choose your model from the My Models list.
- Click Load — a progress bar appears while loading into memory (5 to 15 seconds for a 7B Q4_K_M model).
- Start typing in the input area.
Setting the system prompt
The system prompt defines the AI’s general behavior: its role, tone, constraints. You’ll find it in the right panel under “System Prompt”. For example:
You are an expert assistant in SEO writing. You write clear, well-structured texts optimized for search engines.
Key parameters to know
- Temperature: controls the creativity of responses. Recommended value: 0.7 for writing, 0.2 for code or precise tasks.
- Context Length: the model’s “memory” expressed in tokens. A higher value consumes more VRAM. 4096 tokens is a good starting point; increase as needed.
- GPU Layers: the number of model layers loaded on the GPU. Set this slider to maximum — if the model doesn’t fit entirely in VRAM, LM Studio will automatically move excess layers to CPU.
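To see what the temperature knob actually does, here is a small pure-Python illustration with made-up logits (not taken from a real model): the logits are divided by the temperature before the softmax, so low values sharpen the distribution toward the top token and high values flatten it.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then normalize with a softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
precise = softmax_with_temperature(logits, 0.2)   # code / factual tasks
creative = softmax_with_temperature(logits, 0.7)  # writing

# At T=0.2 the top token dominates; at T=0.7 alternatives keep real probability.
print(precise[0], creative[0])
```

This is why 0.2 is recommended for code (near-deterministic output) and 0.7 for writing (more varied word choices).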
Step 5: Activate the local server (for developers)
This is the most powerful feature of LM Studio for technical profiles. The local server exposes an OpenAI-compatible API at http://localhost:1234/v1, which means any tool or script designed for GPT-4 can be redirected to your local model without modifying code — just by changing the base URL.
Starting the server
- Go to the Developer tab (or Local Server depending on your version).
- Select the model to use.
- Click Start Server.
- The server is active on port 1234.
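Before wiring up a client, you can verify that the server answers using only the Python standard library. This sketch queries the OpenAI-compatible `/v1/models` endpoint; the base URL below is the default, so adjust it if you changed the port:

```python
import json
import urllib.error
import urllib.request

def lmstudio_models(base_url="http://127.0.0.1:1234"):
    """Return the list of model ids the server exposes, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=3) as resp:
            payload = json.load(resp)
        return [model["id"] for model in payload.get("data", [])]
    except (urllib.error.URLError, OSError, ValueError):
        return None

models = lmstudio_models()
if models is None:
    print("Server unreachable: is it started in the Developer tab?")
else:
    print("Server up, models:", models)
```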
Python example
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # fictitious value, LM Studio doesn't require one
)

response = client.chat.completions.create(
    model="lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF",
    messages=[
        {"role": "system", "content": "You are an expert SEO assistant."},
        {"role": "user", "content": "Give me 5 article ideas on generative AI."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Important: the model name in the API call must exactly match the identifier displayed in the Developer tab of LM Studio. Copy it directly from the interface to avoid errors.
Integration with other tools
LM Studio’s local server is compatible with many tools that rely on the OpenAI API:
- Continue (VS Code/JetBrains extension for assisted coding)
- Open WebUI (advanced chat interface)
- n8n / Make (workflow automation)
- Cursor (AI code editor)
- Any Python or Node.js script using the OpenAI SDK
Analyzing documents with LM Studio
LM Studio supports loading PDF, TXT, and Word files to analyze them directly in the conversation. For short documents, the model reads the entire content. For long documents, LM Studio automatically activates a RAG system (Retrieval-Augmented Generation): it extracts only passages relevant to your question, which prevents overwhelming the context window.
To load a document, use the attachment icon in the chat input bar, or drag and drop the file directly into the interface.
LM Studio vs Ollama: Which should you choose?
LM Studio and Ollama are the two most popular tools for local LLMs. They don’t quite address the same profile.
| Criterion | LM Studio | Ollama |
|---|---|---|
| Graphical interface | ✅ Complete interface | ❌ Terminal only |
| Ease of installation | ✅ Standard installer | ✅ Single command |
| Model browser | ✅ Built-in (Hugging Face) | ❌ `ollama pull` command only |
| Local API server | ✅ Port 1234 | ✅ Port 11434 |
| OpenAI compatibility | ✅ Yes | ✅ Yes |
| Terminal-free usage | ✅ Ideal | ❌ Difficult |
| Automation / scripting | ⚠️ Possible via the `lms` CLI | ✅ Native |
| Lightness / services | ⚠️ Heavy application | ✅ Lightweight background service |
In summary: LM Studio is the natural choice for beginners and non-developer profiles who want a smooth and visual experience. Ollama is preferred by developers who want to script, automate, and integrate LLMs into their pipelines. Many advanced users use both: LM Studio for exploring and testing models, Ollama for production integrations.
Troubleshooting common issues
The model is generating very slowly
This is the most common problem for new users. The cause is almost always the same: the model doesn’t fit entirely in VRAM and layers are running on the CPU, which is much slower for this type of computation. Solutions: reduce the model size (go from 7B to 3B), choose a lighter quantization (Q4_K_M instead of Q8_0), or decrease the Context Length in the parameters.
The model crashes on loading
Check that the downloaded GGUF file isn’t corrupted (LM Studio sometimes displays a checksum error). Delete the model from the interface and re-download it. If the problem persists, check that your GPU driver is up to date (CUDA for NVIDIA, ROCm for AMD).
AVX2 error at startup
Your processor doesn’t support the AVX2 instructions required by LM Studio. This is mainly the case on very old machines (before 2013). LM Studio cannot run on these configurations.
Local API is not responding
Make sure a model is properly loaded and active before starting the server. The server cannot run without a model in memory. Also check that port 1234 is not being used by another application.
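A quick way to test the port from Python, using only the standard library (1234 is LM Studio's default; change it if you reconfigured the server):

```python
import socket

def port_open(host="127.0.0.1", port=1234, timeout=1.0):
    # Attempt a TCP connection; success means something is listening on the port.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_open():
    print("Something is listening on 1234 (LM Studio, or another application).")
else:
    print("Nothing is listening on 1234: start the server in the Developer tab.")
```

If the port is open but the API still fails, the conflict case applies: another application may hold port 1234, so stop it or change LM Studio's server port.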
Going further with LM Studio
Over 2025-2026, LM Studio has gained several advanced features that make it a true AI work environment:
LM Studio CLI (`lms`). A command-line interface for users who want to script: `lms get <model>` to download a model, `lms infer` to run inference directly from the terminal, `lms ls` to list installed models.
MCP support (Model Context Protocol). LM Studio can now function as an MCP client, which allows it to use external tools (web access, file system, databases) during conversation — similar to “tools” in the OpenAI API.
Llmster. A no-GUI mode that allows you to deploy LM Studio on Linux servers or in CI/CD environments, without needing a screen.
LM Studio Hub. A space for sharing configurations and presets between users.
Conclusion
LM Studio is today the most accessible tool for anyone who wants to run AI locally, without complex installation, without subscription, and without sacrificing privacy. In just a few clicks, you have access to open source models capable of writing, coding, analyzing documents, and answering complex questions — all from your own machine.
To get started, the simplest path remains: download LM Studio, choose a Llama 3.2 3B or Mistral 7B in Q4_K_M model depending on your available RAM, and start your first conversation. Once comfortable, the local OpenAI-compatible server opens the door to much more powerful integrations.
To go further on this topic
- Uncensored AI in 2026: the complete guide to the best web and local models
- Venice.ai: the uncensored AI that protects your privacy — full review 2026
- LLMs explained: essential AI models in 2026
- Our directory of AI tools