In the ever-expanding world of local AI development, LM Studio has just rolled out version 0.4.0, bringing a host of updates that streamline workflows, boost performance, and cater to both casual users and power developers.
Announced on its official blog, this release focuses on usability, deployment options, and advanced features that make running large language models (LLMs) on your own hardware more intuitive and powerful than ever. Whether you're tinkering with models for fun or building production-grade applications, these changes address pain points and open new possibilities. Let's dive into what's new.
A Fresh, Intuitive Interface Overhaul
One of the standout changes in 0.4.0 is the completely revamped user interface, designed for consistency and ease of use. The redesign simplifies navigation, with updated styles for chat messages, hardware settings, and sidebars.
A key highlight is the new Split View mode, which lets you divide your screen to run and compare multiple chat sessions side by side — perfect for benchmarking responses from different models.
Simply click the Split View icon in the top right, drag chat tabs into place, and you're set. This feature supports up to two panes, making A/B testing of LLMs a breeze.
Working with Model Context Protocol (MCP) servers has also been simplified: MCPs now load only when needed, rather than at startup, reducing overhead.
Additionally, the introduction of permission keys allows finer control over client access to the LM Studio server, enhancing security for shared environments.
For advanced users, the new Developer Mode consolidates previous settings into a single toggle under Settings > Developer.
This mode unlocks hidden options across the app, including in-app documentation for the REST API, CLI commands, and live processing status for loaded LLMs. It's a unified hub for power users, replacing the old multi-mode system.
Other UI tweaks include a resizable model search modal (accessible via Cmd/Ctrl + Shift + M), persistent filter preferences, and new settings such as enforcing a single new empty chat at a time or placing the primary navigation at the top or on the left. Bug fixes address visual glitches like chat duplication and export failures, ensuring a smoother experience.
Parallel Inference: Handling Multiple Requests Like a Pro
A game-changer for high-throughput scenarios is the addition of parallel inference, powered by continuous batching in the llama.cpp engine (version 2.0.0). This allows a single model to process multiple requests simultaneously without queuing, significantly reducing latency and boosting efficiency.
Configure it via the Max Concurrent Predictions slider in the model loader dialog, which defaults to 4 slots with a Unified KV Cache enabled so requests of varying sizes share a single cache instead of multiplying memory use. Note that MLX engine support on Macs is still in development, so Apple users might need to wait for full compatibility.
This feature is ideal for developers building apps that handle concurrent queries, like chatbots or APIs, turning LM Studio into a more robust local server alternative.
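To see the effect in practice, you can fire several requests at the server at once and watch them complete together rather than one after another. The sketch below is illustrative only: it assumes the server is running on the default port 1234 with LM Studio's OpenAI-compatible chat completions endpoint, and the model name is a placeholder for whatever you have loaded.

```bash
# Send four requests in parallel to exercise continuous batching.
# Assumes the local server at http://localhost:1234 and a loaded model;
# "your-loaded-model" is a placeholder identifier.
for i in 1 2 3 4; do
  curl -s http://localhost:1234/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "your-loaded-model",
      "messages": [{"role": "user", "content": "Explain continuous batching in one sentence."}]
    }' > "response_$i.json" &
done
wait  # all four are served concurrently, up to the configured slot count
```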
Easier Deployment with CLI Enhancements
Deployment gets a major upgrade with llmster, a headless daemon for running LM Studio without a GUI — perfect for servers, cloud instances, or terminal-only setups. Installation is straightforward: Use curl scripts for Linux/Mac or PowerShell for Windows. Core commands include `lms daemon up` to start the daemon, `lms get <model>` for downloads, and `lms server start` to spin up a local server.
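As a rough end-to-end sketch, here's what a headless setup might look like once the daemon is installed; the commands are the ones named above, and the model identifier is just an illustrative placeholder.

```bash
# Bring up the headless daemon, pull a model, and start the local server.
lms daemon up                 # start the daemon (no GUI required)
lms get qwen/qwen3-4b         # download a model; identifier is a placeholder
lms server start              # expose the local API server
```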
The new lms chat CLI interface offers an interactive terminal chat experience, complete with slash commands like `/model` to switch models, `/download` for fetching new ones, and `/system-prompt` for custom instructions. It supports large content pasting, thinking highlights, and improved help/logging, making CLI workflows more accessible.
Versioning now uses commit hashes for precision, and commands like `lms runtime update llama.cpp` keep your setup current.
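Putting those pieces together, a quick terminal session might look like the sketch below; note that the slash commands run inside the interactive chat, not in your shell.

```bash
# Keep the bundled llama.cpp runtime current, then drop into the terminal chat.
lms runtime update llama.cpp
lms chat
# Inside the session:
#   /model          switch to another model
#   /download       fetch a new model without leaving the chat
#   /system-prompt  set custom instructions for the conversation
```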
Chat Export and API Expansions
Sharing your work is simpler with new chat export options: Save conversations as PDF (including images), Markdown, or plain text via the chat menu. This is great for documentation, reports, or archiving experiments.
On the API front, a stateful REST API endpoint at `/v1/chat` maintains conversation state using `response_id`, enabling multi-step workflows with detailed stats like token counts and speeds. It supports local MCPs when permission keys are enabled. A new `/api/v1/models/unload` endpoint lets you unload models programmatically, and error formatting has been refined. Breaking changes include renaming `model_instance_id` to `instance_id` in load responses.
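For a feel of the flow, here's a minimal sketch of a stateful exchange followed by an unload. The endpoints, `response_id`, and `instance_id` come from the release notes; everything else, including the port, the request field names, and how the previous response is referenced, is an assumption, so check the in-app API documentation for the exact schema.

```bash
# First turn: the exact request fields (model, input) are assumptions.
curl -s http://localhost:1234/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "your-loaded-model", "input": "What is continuous batching?"}' \
  > first.json

# Pull the response_id out of the reply to carry state into the next turn
# (requires jq; the follow-up field name is also an assumption).
RESPONSE_ID=$(jq -r '.response_id' first.json)
curl -s http://localhost:1234/v1/chat \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"your-loaded-model\", \"input\": \"Give an example.\", \"response_id\": \"$RESPONSE_ID\"}"

# Unload the model when finished; instance_id comes from the load response.
curl -s -X POST http://localhost:1234/api/v1/models/unload \
  -H "Content-Type: application/json" \
  -d '{"instance_id": "your-instance-id"}'
```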
Additional Gems and Under-the-Hood Improvements
Beyond the headliners, 0.4.0 adds support for models like FunctionGemma, MistralAI Ministral (3B, 8B, 13B), and EssentialAI's rnj-1. There's also LFM2 tool call format compatibility, an n_cpu_moe slider for CPU offloading in Mixture of Experts models, and prompt processing progress indicators.
Image handling in chats now includes download, copy, and reveal buttons.
Bug fixes abound, from model indexing issues to settings persistence after updates, and even API image validation without loading models. Hardware info via `lms runtime survey` and GPU support enhancements round out the release.
Wrapping Up: A Step Toward Seamless Local AI
LM Studio 0.4.0 isn't just an update — it's a maturation of the platform, making it more versatile for developers, researchers, and hobbyists alike. With its focus on parallel processing, headless deployment, and user-friendly features, it lowers barriers to entry while empowering advanced use cases.
If you're into local LLMs, head to lmstudio.ai to download and explore. This release solidifies LM Studio's role as a go-to tool in the open-source AI ecosystem, promising even more innovations ahead.

