LM Studio: Complete Guide to Installing and Using Local AI (2026)

What if you could run an artificial intelligence as powerful as ChatGPT directly on your computer, with no subscription, no internet connection, and no data sent to a third-party server? That's exactly the promise of LM Studio, the desktop application that has become the go-to choice in 2026 for running LLMs locally, without typing a single command. Whether you're a developer, researcher, writer, or simply curious, this LM Studio tutorial guides you step by step: installation on Windows, macOS, and Linux, choosing the right model for your hardware, adjusting parameters, activating the OpenAI-compatible local server, and advanced tips to get the most out of your private AI.

Don’t forget to explore our directory of AI tools and LLMs!


What is LM Studio?

LM Studio is a free desktop application that allows you to download, manage, and use open-source language models (LLMs) directly on your machine. Unlike cloud solutions like ChatGPT, Claude, or Gemini, everything happens locally: models are stored on your hard drive, loaded into your RAM or VRAM, and exchanges never leave your computer.

Concretely, LM Studio plays the role of a universal manager for models in the GGUF format (the most widespread quantization format for local LLMs) and MLX (optimized for Apple Silicon chips). It connects directly to the Hugging Face catalog to allow searching and downloading models in just a few clicks, without ever touching a terminal.

The concrete advantages of LM Studio

Total confidentiality. Your data never leaves your machine. No prompt is sent to an external server, no history is stored in the cloud. It’s the ideal solution for professionals handling sensitive data (HR, legal, medical, proprietary code).

Zero subscription. Once the models are downloaded, LM Studio runs entirely offline. No message limits, no quotas, no cutoffs at 20 messages per hour.

Maximum flexibility. Dozens of open-source models are available: Llama (Meta), Qwen (Alibaba), Mistral, DeepSeek, Gemma (Google), Phi (Microsoft), and many more. You freely choose the model best suited to your task and hardware.

OpenAI-compatible local API. LM Studio exposes a local server at http://localhost:1234/v1, compatible with the OpenAI API — which allows you to integrate your local AI into any existing application.


Required hardware configuration

Before installing LM Studio, you need to make sure your machine is compatible. Good news: the requirements are much lower than you might think.

Minimum configuration

| Component | Minimum | Recommended |
| --- | --- | --- |
| RAM | 8 GB | 16 GB or more |
| Storage | 10 GB free | 50 GB free (multiple models) |
| CPU | x64 or ARM64 with AVX2 | Recent (4+ cores) |
| GPU | Not mandatory | 6 GB VRAM minimum for speed gains |
| OS | Windows 10/11, macOS 12+, Ubuntu 20.04+ | Same as minimum |
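
To check quickly whether your machine clears these bars, a few lines of Python will do (a minimal sketch; psutil is a third-party package, installed with pip install psutil):

# Check RAM and free disk space against the minimums above
import shutil
import psutil

ram_gb = psutil.virtual_memory().total / 1e9
free_gb = shutil.disk_usage("/").free / 1e9  # use "C:\\" on Windows

print(f"RAM: {ram_gb:.0f} GB -> {'OK' if ram_gb >= 8 else 'below the 8 GB minimum'}")
print(f"Free disk: {free_gb:.0f} GB -> {'OK' if free_gb >= 10 else 'below the 10 GB minimum'}")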

Which model for which configuration?

Model choice depends directly on your available RAM and VRAM. Here’s a practical guide:

| Configuration | Recommended models | Performance |
| --- | --- | --- |
| CPU only, 8 GB RAM | Qwen3 4B Q4, Phi-3 mini (3.8B) | Slow but functional |
| CPU only, 16 GB RAM | Llama 3.1 8B Q4_K_M, Mistral 7B Q4 | Acceptable for regular use |
| GPU 6-8 GB VRAM | Llama 3.1 8B Q4_K_M, Qwen3 8B Q4 | Fast, roughly GPT-3.5 level |
| GPU 12-16 GB VRAM | Qwen3 14B, Gemma 3 12B, DeepSeek-R1 14B | Very powerful |
| GPU 24 GB VRAM | Qwen3 30B, Llama 3.3 70B Q4 | GPT-4 level |

Tip: LM Studio automatically displays the amount of RAM/VRAM required for each model variant before you download it. No need to guess.


Step 1: Download and install LM Studio

On Windows

  1. Go to lmstudio.ai/download.
  2. Download the .exe file corresponding to Windows.
  3. Launch the installer and follow the steps (standard “Next / Next / Finish” installation).
  4. Open LM Studio from the Start menu or the desktop shortcut created automatically.

On macOS

  1. Download the .dmg file from lmstudio.ai/download.
  2. Open the .dmg and drag the LM Studio icon to the Applications folder.
  3. Launch LM Studio from Launchpad or Spotlight (⌘ + Space, type “LM Studio”).

On Apple Silicon (M1, M2, M3, M4), LM Studio automatically uses the MLX engine for native GPU acceleration. Performance is excellent, even on an entry-level MacBook Air.

On Linux (Ubuntu / Debian)

  1. Download the .AppImage file from the official website.
  2. Make it executable: chmod +x LMStudio-*.AppImage
  3. Launch it: ./LMStudio-*.AppImage

LM Studio supports Ubuntu 20.04+ and compatible distributions. It automatically detects CUDA (NVIDIA) and ROCm (AMD) for GPU acceleration.


Step 2: Understanding the LM Studio interface

On first launch, LM Studio presents an interface organized around several main tabs:

  • Discover: the model browser connected to Hugging Face. This is where you search and download models.
  • Chat: the conversation interface, similar to ChatGPT.
  • Developer (or Local Server): the OpenAI-compatible local server for developers.
  • My Models: the list of your already downloaded models.

The right-hand sidebar in the Chat view gives access to the generation parameters: temperature, context length, top-p, repeat penalty, system prompt — everything is adjustable without restarting the model.


Step 3: Choose and download a model

This is the step that often confuses beginners: the multitude of models and variants on Hugging Face can seem intimidating. Here’s how to navigate it.

Open the model browser

Press Ctrl + Shift + M (Windows/Linux) or ⌘ + Shift + M (Mac) to open the model search. LM Studio displays a selection of models recommended by its team (“Staff Picks”) as well as recent releases.

Understanding quantization (GGUF)

The models available in LM Studio are in the GGUF format, a compression format that reduces the size of a model so it fits in memory on consumer hardware. The most common quantization levels:

| Format | Quality | Typical size (7B) | Usage |
| --- | --- | --- | --- |
| Q8_0 | Excellent (near-original) | ~7 GB | GPU 8+ GB VRAM |
| Q6_K | Very good | ~5.5 GB | GPU 6-8 GB VRAM |
| Q4_K_M | Good (recommended) | ~4.5 GB | GPU 4-6 GB VRAM |
| Q3_K_M | Acceptable | ~3.5 GB | CPU or GPU < 4 GB |
| Q2_K | Degraded | ~2.5 GB | Very limited machines |
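
Where do these sizes come from? As a rule of thumb, file size ≈ parameter count × bits per weight ÷ 8. Here's a minimal sketch of that arithmetic (the bits-per-weight figures are approximations; actual GGUF files vary with the tensor mix):

# Rough GGUF size estimate: parameters x bits per weight / 8
# Bits-per-weight values are approximate averages, not exact per-file figures
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 2.9}

def estimate_size_gb(params_billion: float, quant: str) -> float:
    return params_billion * BITS_PER_WEIGHT[quant] / 8

for quant in BITS_PER_WEIGHT:
    print(f"7B in {quant}: ~{estimate_size_gb(7, quant):.1f} GB")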

The golden rule: start with Q4_K_M. It's the best quality/size compromise for virtually all uses, and the quality difference between Q4_K_M and Q8_0 is barely perceptible for writing or code. Some concrete picks by use case:

  • Beginner / general use: Llama 3.2 3B Q4_K_M (very fast, ~2 GB)
  • Daily chat: Mistral 7B Instruct Q4_K_M (~4.5 GB)
  • Reasoning / analysis: DeepSeek-R1 8B Q4_K_M (~5 GB)
  • Coding: Qwen3 Coder 8B Q4_K_M (~5 GB)
  • Maximum performance (16+ GB VRAM): Qwen3 30B Q4_K_M

To launch the download, simply click the Download button next to the desired variant. LM Studio displays the progress and required disk space. Models are stored in ~/.lmstudio/models/ (macOS/Linux) or C:\Users\[your name]\.lmstudio\models\ (Windows).
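
Since the storage location is known, you can inventory what's already on disk with a short script (this assumes the default models folder on macOS/Linux):

# List downloaded GGUF files and their sizes in LM Studio's default folder
from pathlib import Path

models_dir = Path.home() / ".lmstudio" / "models"
for gguf in sorted(models_dir.rglob("*.gguf")):
    print(f"{gguf.relative_to(models_dir)}  ({gguf.stat().st_size / 1e9:.1f} GB)")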


Step 4: Load a model and start a conversation

Once the download is complete:

  1. Go to the Chat tab.
  2. Click on the dropdown menu at the top of the screen (it displays “Select a model”).
  3. Choose your model from the My Models list.
  4. Click Load — a progress bar appears while loading into memory (5 to 15 seconds for a 7B Q4_K_M model).
  5. Start typing in the input area.

Setting the system prompt

The system prompt defines the general behavior of the AI: its role, tone, and constraints. You'll find it in the right panel under "System Prompt". For example:

You are an expert assistant in SEO writing. You write clear, structured texts optimized for search engines.

Key parameters to know

  • Temperature: controls the creativity of responses. Recommended value: 0.7 for writing, 0.2 for code or precise tasks.
  • Context Length: the “memory” of the model expressed in tokens. A higher value consumes more VRAM. 4096 tokens is a good start; increase as needed.
  • GPU Layers: the number of model layers loaded on the GPU. Set this slider to maximum — if the model doesn’t fit entirely in VRAM, LM Studio will automatically switch excess layers to CPU.

Step 5: Enable the local server (for developers)

This is LM Studio’s most powerful feature for technical users. The local server exposes an API compatible with OpenAI at http://localhost:1234/v1, which means any tool or script designed for GPT-4 can be redirected to your local model without modifying the code — by simply changing the base URL.

Starting the server

  1. Go to the Developer tab (or Local Server depending on your version).
  2. Select the model to use.
  3. Click Start Server.
  4. The server is active on port 1234.

Python Example

from openai import OpenAI

# Point the OpenAI client at LM Studio's local server instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"  # Dummy value, LM Studio doesn't require a real key
)

response = client.chat.completions.create(
    model="lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF",  # identifier shown in the Developer tab
    messages=[
        {"role": "system", "content": "You are an expert SEO assistant."},
        {"role": "user", "content": "Give me 5 article ideas about generative AI."}
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)

Important: the model name in the API call must exactly match the identifier displayed in the Developer tab of LM Studio. Copy it directly from the interface to avoid errors.
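
For chat-style applications, you'll usually want to stream tokens as they're generated rather than wait for the full response. Here's a minimal variant of the example above (same client, same local server):

# Stream the response token by token instead of waiting for the full completion
stream = client.chat.completions.create(
    model="lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF",
    messages=[{"role": "user", "content": "Explain quantization in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk carries no content
        print(delta, end="", flush=True)
print()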

Integration with other tools

LM Studio’s local server is compatible with many tools that rely on the OpenAI API:

  • Continue (VS Code/JetBrains extension for assisted coding)
  • Open WebUI (advanced chat interface)
  • n8n / Make (workflow automation)
  • Cursor (AI code editor)
  • Any Python or Node.js script using the OpenAI SDK

Analyzing documents with LM Studio

LM Studio supports loading PDF, TXT, and Word files to analyze them directly in the conversation. For short documents, the model reads the entire content. For long documents, LM Studio automatically activates a RAG (Retrieval-Augmented Generation) system: it extracts only the passages relevant to your question, which prevents saturating the context window.

To load a document, use the attachment icon in the chat input bar, or drag and drop the file directly into the interface.
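
To demystify what RAG does under the hood, here's a toy sketch of the retrieval step, using the /v1/embeddings endpoint of the local server (it assumes an embedding model is loaded in LM Studio; the model identifier and the example chunks are illustrative):

# Toy RAG retrieval: embed the chunks, embed the question, keep the closest chunk
import math
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

chunks = [
    "LM Studio stores models in ~/.lmstudio/models/.",
    "The local server listens on port 1234 by default.",
    "Q4_K_M is the recommended quantization for most users.",
]

def embed(texts):
    # Replace the model name with the embedding model identifier shown in LM Studio
    resp = client.embeddings.create(model="nomic-embed-text", input=texts)
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

question = "Which port does the server use?"
q_vec = embed([question])[0]
best = max(zip(chunks, embed(chunks)), key=lambda cv: cosine(q_vec, cv[1]))
print("Most relevant chunk:", best[0])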


LM Studio vs Ollama: which one to choose?

LM Studio and Ollama are the two most popular tools for running LLMs locally, but they don't target quite the same audience.

| Criterion | LM Studio | Ollama |
| --- | --- | --- |
| Graphical interface | ✅ Full GUI | ❌ Terminal only |
| Ease of installation | ✅ Standard installer | ✅ Single command |
| Model browser | ✅ Built-in (Hugging Face) | ollama pull command |
| Local API server | ✅ Port 1234 | ✅ Port 11434 |
| OpenAI compatibility | ✅ Yes | ✅ Yes |
| Usage without terminal | ✅ Ideal | ❌ Difficult |
| Automation / scripting | ⚠️ Possible via the lms CLI | ✅ Native |
| Footprint | ⚠️ Heavy desktop application | ✅ Light background service |

In summary: LM Studio is the natural choice for beginners and non-developer profiles who want a smooth and visual experience. Ollama is preferred by developers who want to script, automate, and integrate LLMs into their pipelines. Many advanced users use both: LM Studio to explore and test models, Ollama for production integrations.


Troubleshooting common issues

The model generates very slowly

This is the most common problem among new users. The cause is almost always the same: the model doesn’t fit entirely in VRAM and layers are executed on the CPU, which is much slower for this type of computation. Solutions: reduce model size (switching from 7B to 3B), choose a lighter quantization (Q4_K_M instead of Q8_0), or decrease Context Length in the parameters.
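
To put a number on "slow", you can time a streamed response through the local API (a rough benchmark sketch; each chunk corresponds to roughly one token):

# Rough tokens-per-second measurement against the local server
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.time()
n_chunks = 0
stream = client.chat.completions.create(
    model="local-model",  # replace with the identifier shown in LM Studio
    messages=[{"role": "user", "content": "Write one paragraph about local AI."}],
    stream=True,
)
for chunk in stream:
    n_chunks += 1
elapsed = time.time() - start
print(f"~{n_chunks / elapsed:.1f} tokens/s over {elapsed:.1f} s")

As an order of magnitude, a 7B Q4_K_M model fully offloaded to a mid-range GPU typically generates several dozen tokens per second, versus a handful on CPU alone.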

The model crashes when loading

Verify that the downloaded GGUF file isn’t corrupted (LM Studio may sometimes display a checksum error). Delete the model from the interface and download it again. If the problem persists, check that your GPU driver is up to date (CUDA for NVIDIA, ROCm for AMD).

AVX2 error at startup

Your processor doesn’t support AVX2 instructions, required by LM Studio. This is mainly the case on very old machines (before 2013). LM Studio cannot run on these configurations.

The local API doesn’t respond

Make sure a model is properly loaded and active before starting the server. The server cannot run without a model in memory. Also verify that port 1234 is not being used by another application.
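
A quick way to check both conditions at once is to query the /v1/models endpoint of the local server (requests is a third-party package: pip install requests):

# Health check: is the server up, and which models does it expose?
import requests

try:
    r = requests.get("http://localhost:1234/v1/models", timeout=5)
    r.raise_for_status()
    models = [m["id"] for m in r.json()["data"]]
    print("Server up. Available models:", models or "none - load a model first")
except requests.exceptions.ConnectionError:
    print("Nothing listening on port 1234 - start the server in the Developer tab.")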


Going further with LM Studio

Over 2025-2026, LM Studio has gained several advanced features that make it a true AI working environment:

LM Studio CLI (lms). A command-line interface for users who want to script: lms get <model> to download a model, lms infer to launch inference directly from the terminal, lms ls to list installed models.

MCP support (Model Context Protocol). LM Studio can now function as an MCP client, allowing it to use external tools (web access, file system, databases) during conversation — like the “tools” in the OpenAI API.

Llmster. A no-GUI mode that allows deploying LM Studio on Linux servers or in CI/CD environments, without needing a display.

LM Studio Hub. A space for sharing configurations and presets among users.


Conclusion

LM Studio is today the most accessible tool for anyone who wants to run AI locally, without complex installation, without subscription, and without sacrificing privacy. In just a few clicks, you have access to open-source models capable of writing, coding, analyzing documents, and answering complex questions — all from your own machine.

To get started, the simplest path remains: download LM Studio, choose a Llama 3.2 3B or Mistral 7B model in Q4_K_M depending on your available RAM, and launch your first conversation. Once comfortable, the OpenAI-compatible local server opens the door to much more powerful integrations.

