- What is LM Studio?
- Required hardware configuration
- Step 1: Download and install LM Studio
- Step 2: Understanding the LM Studio interface
- Step 3: Choose and download a model
- Step 4: Load a model and start a conversation
- Step 5: Activate the local server (for developers)
- Analyzing documents with LM Studio
- LM Studio vs Ollama: Which should you choose?
- Troubleshooting common issues
  - The model is generating very slowly
  - The model crashes on loading
  - AVX2 error at startup
  - Local API is not responding
- Going further with LM Studio
- Conclusion
What if you could run an artificial intelligence as powerful as ChatGPT directly on your computer, without a subscription, without an internet connection, and without sending any data to a third-party server? That’s exactly the promise of LM Studio, the desktop application that has become the reference in 2026 for running LLMs locally, with no command line required. Whether you’re a developer, researcher, writer, or simply curious, this LM Studio tutorial guides you step by step: installation on Windows, macOS, and Linux, choosing the right model for your hardware configuration, parameter tuning, activating the local OpenAI-compatible server, and advanced tips to get the best out of your private AI.
Don’t forget to explore our directory of AI tools and LLMs!
What is LM Studio?
LM Studio is a free desktop application that lets you download, manage, and use open source language models (LLMs) directly on your machine. Unlike cloud solutions like ChatGPT, Claude, or Gemini, everything happens locally: models are stored on your hard drive, loaded into your RAM or VRAM, and exchanges never leave your computer.
Concretely, LM Studio acts as a universal manager for models in GGUF format (the most widely used quantization format for local LLMs) and MLX (optimized for Apple Silicon chips). It connects directly to the Hugging Face catalog to allow you to search and download models in just a few clicks, without ever touching a terminal.
The concrete advantages of LM Studio
Complete privacy. Your data never leaves your machine. No prompt is sent to an external server, no history is stored in the cloud. It’s the ideal solution for professionals handling sensitive data (HR, legal, medical, proprietary code).
Zero subscription. Once models are downloaded, LM Studio works entirely offline. No message limit, no quota, no cutoff at 20 messages per hour.
Maximum flexibility. Dozens of open source models are available: Llama (Meta), Qwen (Alibaba), Mistral, DeepSeek, Gemma (Google), Phi (Microsoft), and many others. You freely choose the model best suited to your task and hardware.
Local API compatible with OpenAI. LM Studio exposes a local server at http://localhost:1234/v1, compatible with the OpenAI API — which allows you to integrate your local AI into any existing application.
Required hardware configuration
Before installing LM Studio, you need to make sure your machine is compatible. Good news: the requirements are far lower than you might think.
Minimum configuration
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB or more |
| Storage | 10 GB free | 50 GB free (multiple models) |
| CPU | x64 or ARM64 with AVX2 | Recent (4+ cores) |
| GPU | Not mandatory | 6 GB VRAM minimum for speed gains |
| OS | Windows 10/11, macOS 12+, Ubuntu 20.04+ | — |
Which model for which configuration?
The choice of model depends directly on your available RAM and VRAM. Here’s a practical guide:
| Configuration | Recommended models | Performance |
|---|---|---|
| CPU only, 8 GB RAM | Qwen3 4B Q4, Phi-3 mini (3.8B) | Slow but functional |
| CPU only, 16 GB RAM | Llama 3.2 8B Q4_K_M, Mistral 7B Q4 | Acceptable for regular use |
| GPU 6-8 GB VRAM | Llama 3.1 8B Q4_K_M, Qwen3 8B Q4 | Fast, ~ChatGPT 3.5 |
| GPU 12-16 GB VRAM | Qwen3 14B, Gemma 3 12B, DeepSeek-R1 14B | Very capable |
| GPU 24 GB VRAM | Qwen3 30B, Llama 3.3 70B Q4 | GPT-4 level |
Tip: LM Studio automatically displays the amount of RAM/VRAM needed for each variant of a model before you download it. No need to guess.
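The table above boils down to a simple lookup. Here is a minimal Python sketch of that logic; the model names are illustrative examples taken from the table, not an exhaustive list, and the thresholds are the same rough cutoffs:

```python
def suggest_model(ram_gb: int, vram_gb: int = 0) -> str:
    """Rough model suggestion mirroring the table above (approximate cutoffs)."""
    if vram_gb >= 24:
        return "Qwen3 30B Q4_K_M"       # GPT-4 level
    if vram_gb >= 12:
        return "Qwen3 14B"              # very capable
    if vram_gb >= 6:
        return "Llama 3.1 8B Q4_K_M"    # fast, ~ChatGPT 3.5
    if ram_gb >= 16:
        return "Mistral 7B Q4_K_M"      # acceptable on CPU
    return "Qwen3 4B Q4"                # slow but functional

print(suggest_model(16))                # CPU only, 16 GB RAM
print(suggest_model(32, vram_gb=8))     # mid-range GPU
```

In practice you rarely need this by hand, since LM Studio shows the memory requirement per variant, but it helps to know which threshold you are near before downloading.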
Step 1: Download and install LM Studio
On Windows
- Go to lmstudio.ai/download.
- Download the `.exe` file corresponding to Windows.
- Launch the installer and follow the steps (standard “Next / Next / Finish” installation).
- Open LM Studio from the Start menu or the desktop shortcut created automatically.
On macOS
- Download the `.dmg` file from lmstudio.ai/download.
- Open the `.dmg` and drag the LM Studio icon to the Applications folder.
- Launch LM Studio from Launchpad or Spotlight (⌘ + Space, type “LM Studio”).
On Apple Silicon (M1, M2, M3, M4), LM Studio automatically uses the MLX engine for native GPU acceleration. Performance is excellent, even on entry-level MacBook Airs.
On Linux (Ubuntu / Debian)
- Download the `.AppImage` file from the official website.
- Make it executable: `chmod +x LMStudio-*.AppImage`
- Launch it: `./LMStudio-*.AppImage`
LM Studio supports Ubuntu 20.04+ and compatible distributions. It automatically detects CUDA (NVIDIA) and ROCm (AMD) for GPU acceleration.
Step 2: Understanding the LM Studio interface
On first launch, LM Studio presents an interface organized around several main tabs:
- Discover: the model browser connected to Hugging Face. This is where you search and download models.
- Chat: the conversation interface, similar to ChatGPT.
- Developer (or Local Server): the local OpenAI-compatible server for developers.
- My Models: the list of your already downloaded models.
The right sidebar panel in the Chat view provides access to generation parameters: temperature, context length, top-p, repeat penalty, system prompt; everything is adjustable without needing to restart the model.
Step 3: Choose and download a model
This is the step that often confuses beginners: the multitude of models and variants on Hugging Face can seem intimidating. Here’s how to make sense of it.
Open the model browser
Press Ctrl + Shift + M (Windows/Linux) or ⌘ + Shift + M (Mac) to open the model search. LM Studio displays a selection of models recommended by its team (“Staff Picks”) as well as recent releases.
Understanding quantization (GGUF)
The models available in LM Studio are in GGUF format, a compression format that allows you to reduce a model’s size so it fits in memory on consumer hardware. The most common quantization levels:
| Format | Quality | Typical size (7B) | Usage |
|---|---|---|---|
| Q8_0 | Excellent (nearly original) | ~7 GB | GPU 8+ GB VRAM |
| Q6_K | Very good | ~5.5 GB | GPU 6-8 GB VRAM |
| Q4_K_M | Good (recommended) | ~4.5 GB | GPU 4-6 GB VRAM |
| Q3_K_M | Decent | ~3.5 GB | CPU or GPU < 4 GB |
| Q2_K | Degraded | ~2.5 GB | Very limited machines |
The golden rule: choose Q4_K_M as your starting point. It’s the best quality/size compromise for virtually all uses. The difference in quality between Q4_K_M and Q8_0 is almost imperceptible for writing or coding.
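The sizes in the table follow from simple arithmetic: a GGUF file weighs roughly parameters × bits-per-weight / 8. The bits-per-weight values below are approximate averages for each quantization level (an assumption for illustration; actual averages vary slightly per model, and real files add metadata overhead):

```python
# Approximate average bits per weight for common GGUF quantization levels
# (illustrative values, not exact per-model figures).
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.85, "Q3_K_M": 3.9, "Q2_K": 2.6}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Weights-only size estimate in GB; real GGUF files are slightly larger."""
    return params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in ("Q8_0", "Q6_K", "Q4_K_M"):
    print(f"7B {quant}: ~{estimated_size_gb(7, quant):.1f} GB")
```

This is why a 7B model at Q4_K_M lands around 4-5 GB: roughly 7 billion weights at under 5 bits each.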
Recommended models to get started
- Beginner / general use: `Llama 3.2 3B Q4_K_M` (very fast, ~2 GB)
- Daily chat: `Mistral 7B Instruct Q4_K_M` (~4.5 GB)
- Reasoning / analysis: `DeepSeek-R1 8B Q4_K_M` (~5 GB)
- Coding: `Qwen3 Coder 8B Q4_K_M` (~5 GB)
- Maximum performance (16+ GB VRAM): `Qwen3 30B Q4_K_M`
To start the download, simply click the Download button next to the desired variant. LM Studio displays the progress and required disk space. Models are stored in ~/.lmstudio/models/ (macOS/Linux) or C:\Users\[your username]\.lmstudio\models\ (Windows).
Step 4: Load a model and start a conversation
Once the download is complete:
- Go to the Chat tab.
- Click on the dropdown menu at the top of the screen (it displays “Select a model”).
- Choose your model from the My Models list.
- Click Load — a progress bar appears while loading into memory (5 to 15 seconds for a 7B Q4_K_M model).
- Start typing in the input area.
Setting the system prompt
The system prompt defines the AI’s general behavior: its role, tone, constraints. You’ll find it in the right panel under “System Prompt”. For example:
You are an expert assistant in SEO writing. You write clear, well-structured texts optimized for search engines.
Key parameters to know
- Temperature: controls the creativity of responses. Recommended value: 0.7 for writing, 0.2 for code or precise tasks.
- Context Length: the model’s “memory” expressed in tokens. A higher value consumes more VRAM. 4096 tokens is a good starting point; increase as needed.
- GPU Layers: the number of model layers loaded on the GPU. Set this slider to maximum — if the model doesn’t fit entirely in VRAM, LM Studio will automatically move excess layers to CPU.
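To see what the temperature knob actually does, here is a small pure-Python illustration with made-up logits (not taken from a real model): the logits are divided by the temperature before the softmax, so low values sharpen the distribution toward the top token and high values flatten it.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then normalize with a softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
precise = softmax_with_temperature(logits, 0.2)   # code / factual tasks
creative = softmax_with_temperature(logits, 0.7)  # writing

# At T=0.2 the top token dominates; at T=0.7 alternatives keep real probability.
print(precise[0], creative[0])
```

This is why 0.2 is recommended for code (near-deterministic output) and 0.7 for writing (more varied word choices).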
Step 5: Activate the local server (for developers)
This is the most powerful feature of LM Studio for technical profiles. The local server exposes an OpenAI-compatible API at http://localhost:1234/v1, which means any tool or script designed for GPT-4 can be redirected to your local model without modifying code — just by changing the base URL.
Starting the server
- Go to the Developer tab (or Local Server depending on your version).
- Select the model to use.
- Click Start Server.
- The server is active on port 1234.
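Before wiring up a client, you can verify that the server answers using only the Python standard library. This sketch queries the OpenAI-compatible `/v1/models` endpoint; the base URL below is the default, so adjust it if you changed the port:

```python
import json
import urllib.error
import urllib.request

def lmstudio_models(base_url="http://127.0.0.1:1234"):
    """Return the list of model ids the server exposes, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=3) as resp:
            payload = json.load(resp)
        return [model["id"] for model in payload.get("data", [])]
    except (urllib.error.URLError, OSError, ValueError):
        return None

models = lmstudio_models()
if models is None:
    print("Server unreachable: is it started in the Developer tab?")
else:
    print("Server up, models:", models)
```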
Python example
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # fictitious value, LM Studio doesn't require one
)

response = client.chat.completions.create(
    model="lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF",
    messages=[
        {"role": "system", "content": "You are an expert SEO assistant."},
        {"role": "user", "content": "Give me 5 article ideas on generative AI."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Important: the model name in the API call must exactly match the identifier displayed in the Developer tab of LM Studio. Copy it directly from the interface to avoid errors.
Integration with other tools
LM Studio’s local server is compatible with many tools that rely on the OpenAI API:
- Continue (VS Code/JetBrains extension for assisted coding)
- Open WebUI (advanced chat interface)
- n8n / Make (workflow automation)
- Cursor (AI code editor)
- Any Python or Node.js script using the OpenAI SDK
Analyzing documents with LM Studio
LM Studio supports loading PDF, TXT, and Word files to analyze them directly in the conversation. For short documents, the model reads the entire content. For long documents, LM Studio automatically activates a RAG system (Retrieval-Augmented Generation): it extracts only passages relevant to your question, which prevents overwhelming the context window.
To load a document, use the attachment icon in the chat input bar, or drag and drop the file directly into the interface.
LM Studio vs Ollama: Which should you choose?
LM Studio and Ollama are the two most popular tools for local LLMs. They don’t quite address the same profile.
| Criterion | LM Studio | Ollama |
|---|---|---|
| Graphical interface | ✅ Complete interface | ❌ Terminal only |
| Ease of installation | ✅ Standard installer | ✅ Single command |
| Model browser | ✅ Built-in (Hugging Face) | ❌ `ollama pull` command only |
| Local API server | ✅ Port 1234 | ✅ Port 11434 |
| OpenAI compatibility | ✅ Yes | ✅ Yes |
| Terminal-free usage | ✅ Ideal | ❌ Difficult |
| Automation / scripting | ⚠️ Possible via the `lms` CLI | ✅ Native |
| Lightness / services | ⚠️ Heavy application | ✅ Lightweight background service |
In summary: LM Studio is the natural choice for beginners and non-developer profiles who want a smooth and visual experience. Ollama is preferred by developers who want to script, automate, and integrate LLMs into their pipelines. Many advanced users use both: LM Studio for exploring and testing models, Ollama for production integrations.
Troubleshooting common issues
The model is generating very slowly
This is the most common problem for new users. The cause is almost always the same: the model doesn’t fit entirely in VRAM and layers are running on the CPU, which is much slower for this type of computation. Solutions: reduce the model size (go from 7B to 3B), choose a lighter quantization (Q4_K_M instead of Q8_0), or decrease the Context Length in the parameters.
The model crashes on loading
Check that the downloaded GGUF file isn’t corrupted (LM Studio sometimes displays a checksum error). Delete the model from the interface and re-download it. If the problem persists, check that your GPU driver is up to date (CUDA for NVIDIA, ROCm for AMD).
AVX2 error at startup
Your processor doesn’t support the AVX2 instructions required by LM Studio. This is mainly the case on very old machines (before 2013). LM Studio cannot run on these configurations.
Local API is not responding
Make sure a model is properly loaded and active before starting the server. The server cannot run without a model in memory. Also check that port 1234 is not being used by another application.
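A quick way to test the port from Python, using only the standard library (1234 is LM Studio's default; change it if you reconfigured the server):

```python
import socket

def port_open(host="127.0.0.1", port=1234, timeout=1.0):
    # Attempt a TCP connection; success means something is listening on the port.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_open():
    print("Something is listening on 1234 (LM Studio, or another application).")
else:
    print("Nothing is listening on 1234: start the server in the Developer tab.")
```

If the port is open but the API still fails, the conflict case applies: another application may hold port 1234, so stop it or change LM Studio's server port.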
Going further with LM Studio
Over 2025-2026, LM Studio has gained several advanced features that make it a true AI work environment:
LM Studio CLI (`lms`). A command-line interface for users who want to script: `lms get <model>` to download a model, `lms infer` to run inference directly from the terminal, `lms ls` to list installed models.
MCP support (Model Context Protocol). LM Studio can now function as an MCP client, which allows it to use external tools (web access, file system, databases) during conversation — similar to “tools” in the OpenAI API.
Llmster. A no-GUI mode that allows you to deploy LM Studio on Linux servers or in CI/CD environments, without needing a screen.
LM Studio Hub. A space for sharing configurations and presets between users.
Conclusion
LM Studio is today the most accessible tool for anyone who wants to run AI locally, without complex installation, without subscription, and without sacrificing privacy. In just a few clicks, you have access to open source models capable of writing, coding, analyzing documents, and answering complex questions — all from your own machine.
To get started, the simplest path remains: download LM Studio, choose a Llama 3.2 3B or Mistral 7B in Q4_K_M model depending on your available RAM, and start your first conversation. Once comfortable, the local OpenAI-compatible server opens the door to much more powerful integrations.
To go further on this topic
- Uncensored AI in 2026: the complete guide to the best web and local models
- Venice.ai: the uncensored AI that protects your privacy — full review 2026
- LLMs explained: essential AI models in 2026
- Our directory of AI tools