- What is LM Studio?
- Required hardware configuration
- Step 1: Download and install LM Studio
- Step 2: Understanding the LM Studio interface
- Step 3: Choose and download a model
- Step 4: Load a model and start a conversation
- Step 5: Enable the local server (for developers)
- Analyzing documents with LM Studio
- LM Studio vs Ollama: which one to choose?
- Troubleshooting common issues
- The model generates very slowly
- The model crashes when loading
- AVX2 error at startup
- The local API doesn’t respond
- Going further with LM Studio
- Conclusion
What if you could run an artificial intelligence as powerful as ChatGPT directly on your computer, with no subscription, no internet connection, and no data sent to a third-party server? That's exactly the promise of LM Studio, the desktop application that has become the go-to choice in 2026 for running LLMs locally, without typing a single command. Whether you're a developer, researcher, writer, or simply curious, this LM Studio tutorial guides you step by step: installation on Windows, macOS, and Linux, choosing the right model for your hardware configuration, adjusting parameters, enabling the OpenAI-compatible local server, and advanced tips to get the most out of your private AI.
Don’t forget to explore our directory of AI tools and LLMs!
What is LM Studio?
LM Studio is a free desktop application that allows you to download, manage, and use open-source language models (LLMs) directly on your machine. Unlike cloud solutions like ChatGPT, Claude, or Gemini, everything happens locally: models are stored on your hard drive, loaded into your RAM or VRAM, and exchanges never leave your computer.
Concretely, LM Studio plays the role of a universal manager for models in the GGUF format (the most widespread quantization format for local LLMs) and MLX (optimized for Apple Silicon chips). It connects directly to the Hugging Face catalog to allow searching and downloading models in just a few clicks, without ever touching a terminal.
The concrete advantages of LM Studio
Total confidentiality. Your data never leaves your machine. No prompt is sent to an external server, no history is stored in the cloud. It’s the ideal solution for professionals handling sensitive data (HR, legal, medical, proprietary code).
Zero subscription. Once the models are downloaded, LM Studio runs entirely offline. No message limits, no quotas, no cutoffs at 20 messages per hour.
Maximum flexibility. Dozens of open-source models are available: Llama (Meta), Qwen (Alibaba), Mistral, DeepSeek, Gemma (Google), Phi (Microsoft), and many more. You freely choose the model best suited to your task and hardware.
OpenAI-compatible local API. LM Studio exposes a local server at http://localhost:1234/v1, compatible with the OpenAI API — which allows you to integrate your local AI into any existing application.
Required hardware configuration
Before installing LM Studio, you need to make sure your machine is compatible. Good news: the requirements are much lower than you might think.
Minimum configuration
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB or more |
| Storage | 10 GB free | 50 GB free (multiple models) |
| CPU | x64 or ARM64 with AVX2 | Recent (4+ cores) |
| GPU | Not mandatory | 6 GB VRAM minimum for speed gains |
| OS | Windows 10/11, macOS 12+, Ubuntu 20.04+ | — |
Which model for which configuration?
Model choice depends directly on your available RAM and VRAM. Here’s a practical guide:
| Configuration | Recommended Models | Performance |
|---|---|---|
| CPU only, 8 GB RAM | Qwen3 4B Q4, Phi-3 mini (3.8B) | Slow but functional |
| CPU only, 16 GB RAM | Llama 3.2 8B Q4_K_M, Mistral 7B Q4 | Acceptable for regular use |
| GPU 6-8 GB VRAM | Llama 3.1 8B Q4_K_M, Qwen3 8B Q4 | Fast, ~ChatGPT 3.5 |
| GPU 12-16 GB VRAM | Qwen3 14B, Gemma 3 12B, DeepSeek-R1 14B | Very powerful |
| GPU 24 GB VRAM | Qwen3 30B, Llama 3.3 70B Q4 | GPT-4 level |
Tip: LM Studio automatically displays the amount of RAM/VRAM required for each model variant before you download it. No need to guess.
Step 1: Download and install LM Studio
On Windows
- Go to lmstudio.ai/download.
- Download the .exe file for Windows.
- Launch the installer and follow the steps (standard “Next / Next / Finish” installation).
- Open LM Studio from the Start menu or the desktop shortcut created automatically.
On macOS
- Download the .dmg file from lmstudio.ai/download.
- Open the .dmg and drag the LM Studio icon to the Applications folder.
- Launch LM Studio from Launchpad or Spotlight (⌘ + Space, type “LM Studio”).
On Apple Silicon (M1, M2, M3, M4), LM Studio automatically uses the MLX engine for native GPU acceleration. Performance is excellent, even on an entry-level MacBook Air.
On Linux (Ubuntu / Debian)
- Download the .AppImage file from the official website.
- Make it executable: chmod +x LMStudio-*.AppImage
- Launch it: ./LMStudio-*.AppImage
LM Studio supports Ubuntu 20.04+ and compatible distributions. It automatically detects CUDA (NVIDIA) and ROCm (AMD) for GPU acceleration.
Step 2: Understanding the LM Studio interface
On first launch, LM Studio presents an interface organized around several main tabs:
- Discover: the model browser connected to Hugging Face. This is where you search and download models.
- Chat: the conversation interface, similar to ChatGPT.
- Developer (or Local Server): the OpenAI-compatible local server for developers.
- My Models: the list of your already downloaded models.
The right sidebar panel in the Chat view gives access to generation parameters: temperature, context length, top-p, repeat penalty, system prompt — everything is adjustable without restarting the model.
Step 3: Choose and download a model
This is the step that often confuses beginners: the multitude of models and variants on Hugging Face can seem intimidating. Here’s how to navigate it.
Open the model browser
Press Ctrl + Shift + M (Windows/Linux) or ⌘ + Shift + M (Mac) to open the model search. LM Studio displays a selection of models recommended by its team (“Staff Picks”) as well as recent releases.
Understanding quantization (GGUF)
The models available in LM Studio are in the GGUF format, a compression format that reduces the size of a model so it fits in memory on consumer hardware. The most common quantization levels:
| Format | Quality | Typical size (7B) | Usage |
|---|---|---|---|
| Q8_0 | Excellent (quasi-original) | ~7 GB | GPU 8+ GB VRAM |
| Q6_K | Very good | ~5.5 GB | GPU 6-8 GB VRAM |
| Q4_K_M | Good (recommended) | ~4.5 GB | GPU 4-6 GB VRAM |
| Q3_K_M | Correct | ~3.5 GB | CPU or GPU < 4 GB |
| Q2_K | Degraded | ~2.5 GB | Very limited machines |
The golden rule: choose Q4_K_M as a starting point. It’s the best quality/size compromise for virtually all uses. The quality difference between Q4_K_M and Q8_0 is virtually imperceptible for writing or code.
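As a rule of thumb, a GGUF file's size is roughly the parameter count times the average bits per weight. The helper below makes that estimate; the bits-per-weight figures are approximate averages assumed for illustration, not official values, and real files vary by model.

```python
def estimated_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameters x average bits per weight, in GB."""
    size_bytes = params_billion * 1e9 * bits_per_weight / 8
    return size_bytes / 1e9

# Approximate average bits per weight per quantization level
# (assumed values for illustration; actual averages vary by model).
BITS = {"Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 3.0}

for fmt, bits in BITS.items():
    print(f"{fmt}: ~{estimated_size_gb(7, bits):.1f} GB for a 7B model")
```

The numbers land close to the table above, which is why a 7B model in Q4_K_M fits comfortably in 6 GB of VRAM while Q8_0 needs 8 GB or more.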
Recommended models to get started
- Beginner / general use: Llama 3.2 3B Q4_K_M (very fast, ~2 GB)
- Daily chat: Mistral 7B Instruct Q4_K_M (~4.5 GB)
- Reasoning / analysis: DeepSeek-R1 8B Q4_K_M (~5 GB)
- Coding: Qwen3 Coder 8B Q4_K_M (~5 GB)
- Maximum performance (16+ GB VRAM): Qwen3 30B Q4_K_M
To launch the download, simply click the Download button next to the desired variant. LM Studio displays the progress and required disk space. Models are stored in ~/.lmstudio/models/ (macOS/Linux) or C:\Users\[your name]\.lmstudio\models\ (Windows).
Step 4: Load a model and start a conversation
Once the download is complete:
- Go to the Chat tab.
- Click on the dropdown menu at the top of the screen (it displays “Select a model”).
- Choose your model from the My Models list.
- Click Load — a progress bar appears while loading into memory (5 to 15 seconds for a 7B Q4_K_M model).
- Start typing in the input area.
Setting the system prompt
The system prompt defines the general behavior of the AI: its role, tone, and constraints. You’ll find it in the right panel under “System Prompt”. For example:
You are an expert assistant in SEO writing. You write clear, structured texts optimized for search engines.
Key parameters to know
- Temperature: controls the creativity of responses. Recommended value: 0.7 for writing, 0.2 for code or precise tasks.
- Context Length: the “memory” of the model expressed in tokens. A higher value consumes more VRAM. 4096 tokens is a good start; increase as needed.
- GPU Layers: the number of model layers loaded on the GPU. Set this slider to maximum — if the model doesn’t fit entirely in VRAM, LM Studio will automatically switch excess layers to CPU.
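Temperature and response length also apply per request once you use the local server: they map onto standard fields of an OpenAI-style request body. A minimal sketch (the model identifier here is hypothetical; context length itself is a load-time setting, not a per-request field):

```python
import json

# Map the Chat-panel settings onto an OpenAI-style request body.
request_body = {
    "model": "llama-3.2-3b-instruct",  # hypothetical identifier
    "messages": [
        {"role": "system", "content": "You are an expert assistant in SEO writing."},
        {"role": "user", "content": "Draft an outline about local LLMs."},
    ],
    "temperature": 0.2,  # low value: precise, deterministic output (code, facts)
    "max_tokens": 512,   # cap on the length of the generated answer
}

print(json.dumps(request_body, indent=2))
```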
Step 5: Enable the local server (for developers)
This is LM Studio’s most powerful feature for technical users. The local server exposes an API compatible with OpenAI at http://localhost:1234/v1, which means any tool or script designed for GPT-4 can be redirected to your local model without modifying the code — by simply changing the base URL.
Starting the server
- Go to the Developer tab (or Local Server depending on your version).
- Select the model to use.
- Click Start Server.
- The server is active on port 1234.
Python Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"  # Dummy value, LM Studio doesn't require it
)

response = client.chat.completions.create(
    model="lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF",
    messages=[
        {"role": "system", "content": "You are an expert SEO assistant."},
        {"role": "user", "content": "Give me 5 article ideas about generative AI."}
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Important: the model name in the API call must exactly match the identifier displayed in the Developer tab of LM Studio. Copy it directly from the interface to avoid errors.
Integration with other tools
LM Studio’s local server is compatible with many tools that rely on the OpenAI API:
- Continue (VS Code/JetBrains extension for assisted coding)
- Open WebUI (advanced chat interface)
- n8n / Make (workflow automation)
- Cursor (AI code editor)
- Any Python or Node.js script using the OpenAI SDK
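Because the endpoint follows the OpenAI wire format, you don't even need the SDK: the standard library is enough. A minimal sketch, assuming the default port 1234; the actual call at the end is commented out because it requires the server to be running:

```python
import json
import urllib.request

def build_chat_request(model: str, user_prompt: str,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Assemble an OpenAI-compatible /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF",
                         "Give me 5 article ideas about generative AI.")
# with urllib.request.urlopen(req) as resp:  # requires a running server
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```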
Analyzing documents with LM Studio
LM Studio supports loading PDF, TXT, and Word files to analyze them directly in the conversation. For short documents, the model reads the entire content. For long documents, LM Studio automatically activates a RAG (Retrieval-Augmented Generation) system: it extracts only the passages relevant to your question, which prevents saturating the context window.
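The retrieval idea behind RAG can be illustrated in a few lines: split the document into chunks, score each chunk against the question, and pass only the top matches to the model. This toy sketch uses word overlap for scoring, purely for intuition; LM Studio's actual implementation is more sophisticated than this.

```python
def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Toy retrieval: rank chunks by how many question words they share."""
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

chunks = [
    "The contract runs from January to December 2026.",
    "Payment is due within 30 days of invoicing.",
    "The annual fee is 1200 euros, payable in advance.",
]
print(retrieve(chunks, "When is payment due?", k=1))
```

Only the best-matching passage reaches the model, which is how long PDFs can be queried without saturating the context window.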
To load a document, use the attachment icon in the chat input bar, or drag and drop the file directly into the interface.
LM Studio vs Ollama: which one to choose?
LM Studio and Ollama are the two most popular tools for running LLMs locally, but they aren’t aimed at quite the same audience.
| Criterion | LM Studio | Ollama |
|---|---|---|
| Graphical interface | ✅ Full interface | ❌ Terminal only |
| Ease of installation | ✅ Standard installer | ✅ Single command |
| Model browser | ✅ Built-in (Hugging Face) | ❌ ollama pull command |
| Local API server | ✅ Port 1234 | ✅ Port 11434 |
| OpenAI compatibility | ✅ Yes | ✅ Yes |
| Usage without terminal | ✅ Ideal | ❌ Difficult |
| Automation / scripting | ⚠️ Possible via the lms CLI | ✅ Native |
| Lightweight / services | ⚠️ Heavy application | ✅ Light background service |
In summary: LM Studio is the natural choice for beginners and non-developer profiles who want a smooth and visual experience. Ollama is preferred by developers who want to script, automate, and integrate LLMs into their pipelines. Many advanced users use both: LM Studio to explore and test models, Ollama for production integrations.
Troubleshooting common issues
The model generates very slowly
This is the most common problem among new users. The cause is almost always the same: the model doesn’t fit entirely in VRAM and layers are executed on the CPU, which is much slower for this type of computation. Solutions: reduce model size (switching from 7B to 3B), choose a lighter quantization (Q4_K_M instead of Q8_0), or decrease Context Length in the parameters.
The model crashes when loading
Verify that the downloaded GGUF file isn’t corrupted (LM Studio may sometimes display a checksum error). Delete the model from the interface and download it again. If the problem persists, check that your GPU driver is up to date (CUDA for NVIDIA, ROCm for AMD).
AVX2 error at startup
Your processor doesn’t support AVX2 instructions, required by LM Studio. This is mainly the case on very old machines (before 2013). LM Studio cannot run on these configurations.
The local API doesn’t respond
Make sure a model is properly loaded and active before starting the server. The server cannot run without a model in memory. Also verify that port 1234 is not being used by another application.
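To check whether anything is actually listening on port 1234 (or whether another process has grabbed it), a quick probe with Python's standard library works on all three platforms. connect_ex returns 0 when a TCP connection succeeds:

```python
import socket

def port_open(host: str = "localhost", port: int = 1234) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

if port_open():
    print("A server is listening on port 1234.")
else:
    print("Nothing is listening on port 1234: start the LM Studio server.")
```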
Going further with LM Studio
Over 2025-2026, LM Studio has gained several advanced features that make it a true AI working environment:
LM Studio CLI (lms). A command-line interface for users who want to script: lms get <model> to download a model, lms infer to launch inference directly from the terminal, lms ls to list installed models.
MCP support (Model Context Protocol). LM Studio can now function as an MCP client, allowing it to use external tools (web access, file system, databases) during conversation — like the “tools” in the OpenAI API.
Llmster. A no-GUI mode that allows deploying LM Studio on Linux servers or in CI/CD environments, without needing a display.
LM Studio Hub. A space for sharing configurations and presets among users.
Conclusion
LM Studio is today the most accessible tool for anyone who wants to run AI locally, without complex installation, without subscription, and without sacrificing privacy. In just a few clicks, you have access to open-source models capable of writing, coding, analyzing documents, and answering complex questions — all from your own machine.
To get started, the simplest path remains: download LM Studio, choose a Llama 3.2 3B or Mistral 7B model in Q4_K_M depending on your available RAM, and launch your first conversation. Once comfortable, the OpenAI-compatible local server opens the door to much more powerful integrations.
To learn more on this topic
- Uncensored AI in 2026: the complete guide to the best web and local models
- Venice.ai: uncensored AI that protects your privacy — complete review 2026
- LLMs explained: essential AI models in 2026
- Our directory of AI tools