5 Open-Source Coding LLMs You Can Run Locally in 2025

Open-source coding LLMs are democratizing AI-powered development, and local deployment is at the forefront of this revolution. By running models on your own machine, you gain privacy, eliminate API costs, and unlock deep customization. In 2025, running powerful coding AI locally is no longer a dream—it's a practical reality.

This article will introduce five of the best coding models available today. We'll compare their strengths, show you how to run them with Ollama, and provide practical use cases to get you started. This guide is for developers, hobbyists, and teams looking to harness the power of local AI without sacrificing performance.

1. Qwen3-Coder-480B-A35B-Instruct: The Agentic Powerhouse

Model Overview

  • Developer: Alibaba's Qwen Team
  • Parameters: 480B total, 35B activated (Mixture-of-Experts)
  • Context Window: 256K native, up to 1M with YaRN
  • License: Apache 2.0
  • Specialty: Agentic coding and browser automation

Key Features

  • Agentic Coding Support: Designed to work with frameworks like Qwen Code and CLINE.
  • Long-Context Mastery: Understands repository-scale codebases with its massive context window.
  • Multi-Platform Compatibility: Supports most coding agent platforms.
  • Non-Thinking Mode: Optimized for direct code generation without reasoning blocks.

Performance Highlights

  • Comparable to Claude Sonnet on agentic coding tasks.
  • Excels in complex, multi-step software engineering workflows.
  • Strong performance on browser-based coding tasks.
  • Advanced function calling capabilities for tool use.

How to Run the Model

Running a model of this size locally is a challenge. While a direct ollama pull is not feasible for most users due to its ~200GB size, you can run quantized versions using llama.cpp with MoE offloading. This technique keeps the main model layers on the GPU and offloads the "expert" layers to system RAM.

Command to run Qwen3-Coder with MoE offloading:

./llama.cpp/llama-cli \
--model unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/UD-Q2_K_XL/Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL-00001-of-00004.gguf \
--ctx-size 16384 \
--n-gpu-layers 99 \
-ot ".ffn_.*_exps.=CPU" \
--threads 8 \
--temp 0.7 \
-p "<|im_start|>user\nWrite a Python function to find prime numbers up to N.<|im_end|>\n<|im_start|>assistant\n"

Best Use Cases

  • Complex software engineering projects.
  • Multi-file repository analysis and refactoring.
  • Automated testing and debugging workflows.
  • Research in agentic AI.

2. Qwen2.5-Coder: The Versatile Champion

Model Overview

  • Developer: Alibaba's Qwen Team
  • Parameters: 6 sizes available (0.5B, 1.5B, 3B, 7B, 14B, 32B)
  • Context Window: 32K-128K tokens
  • License: Apache 2.0 (fully permissive)
  • Specialty: Balanced performance across all coding tasks

Key Features

  • Six Size Options: Scales from tiny models for edge devices to a 32B powerhouse.
  • Multi-Language Mastery: Excels across 40+ programming languages.
  • Code Repair Excellence: Outstanding debugging and error-fixing capabilities.
  • Competitive with GPT-4o: The 32B model rivals closed-source leaders.

Performance Highlights

  • HumanEval: 91.0% (32B model, matching GPT-4o).
  • Aider (Code Repair): 73.7% (comparable to GPT-4o).
  • McEval: 65.9 across multiple languages.
  • LiveCodeBench: 43.4% (competitive with top models).

How to Run with Ollama
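
The fastest route is Ollama itself. Tags in the Ollama library can change between releases, but at the time of writing the qwen2.5-coder family covers all six sizes:

# Pull and chat with the 7B instruct variant
ollama run qwen2.5-coder:7b

# Or the 32B flagship if you have the memory for it
ollama run qwen2.5-coder:32b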

Prefer to work with the weights directly? The following Hugging Face Transformers example runs Qwen2.5-Coder-7B-Instruct on a chat-style prompt:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated reply is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Best Use Cases

  • General-purpose coding assistance.
  • Multi-language development projects.
  • Code debugging and repair.
  • Educational programming assistance.

3. StarCoder2: The Transparent Workhorse

Model Overview

  • Developer: BigCode Project
  • Parameters: 3 sizes (3B, 7B, 15B)
  • Context Window: 16K tokens
  • License: Apache 2.0
  • Specialty: Transparent training and code completion

Key Features

  • Transparent Training: Fully open training process and dataset.
  • Multi-Language Coverage: 600+ programming languages.
  • Efficient Architecture: Excellent performance-to-size ratio.
  • Strong Code Completion: Best-in-class fill-in-the-middle capabilities.

Performance Highlights

  • HumanEval FIM: 86.4% (excellent code completion).
  • RepoBench: Strong performance on repository-level tasks.
  • Size Efficiency: The 15B model matches 33B+ models on many tasks.
  • Training Scale: Trained on 4+ trillion tokens for robust understanding.

How to Run with Ollama
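
If Ollama is your runtime, StarCoder2 is available straight from the Ollama library (tag names may change; 3B, 7B, and 15B tags were available at the time of writing):

# Pull and run the 7B model
ollama run starcoder2:7b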

You can also drive StarCoder2 from Python with the Hugging Face Transformers pipeline API:

from transformers import pipeline

# Load StarCoder2-7B with automatic dtype and device placement
pipe = pipeline("text-generation", model="bigcode/starcoder2-7b", torch_dtype="auto", device_map="auto")

# StarCoder2 is a base model: give it the start of the code you want completed
prompt = "def fibonacci(n):"
result = pipe(prompt, max_new_tokens=100)
print(result[0]['generated_text'])
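
StarCoder2 was also trained with a fill-in-the-middle (FIM) objective, so it can complete code between an existing prefix and suffix. Here is a minimal sketch that reuses the pipe object from the snippet above and assumes the StarCoder-style FIM special tokens (<fim_prefix>, <fim_suffix>, <fim_middle>) used by the BigCode models:

# Ask the model to fill in the body between the prefix and the suffix
prefix = "def average(numbers):\n    "
suffix = "\n    return total / len(numbers)\n"
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

completion = pipe(fim_prompt, max_new_tokens=32)
print(completion[0]['generated_text'])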

Best Use Cases

  • IDE integration and real-time code completion.
  • Educational environments requiring transparency.
  • Medium-sized development projects.
  • Scenarios where auditability and reproducibility matter.

4. Codestral: The Speed Demon

Model Overview

  • Developer: Mistral AI
  • Parameters: 22B
  • Context Window: 32K tokens
  • License: Mistral Non-Production License (MNPL)
  • Specialty: Fast, efficient code generation

Key Features

  • Lightning-Fast Inference: Optimized for speed and efficiency.
  • 80+ Language Fluency: Extensive programming language support.
  • Fill-in-the-Middle: Excellent at completing partial code.
  • Production-Ready: Designed for real-world development workflows.

Performance Highlights

  • HumanEval Python: 86.6% (excellent Python performance).
  • Fill-in-the-Middle: 95.3% accuracy.
  • Inference Speed: Among the fastest in its class.
  • Memory Efficiency: 13GB model size for practical deployment.

How to Run with Ollama

Codestral is published in the Ollama model library, so the simplest local setup is a single command (tags may change; the codestral tag pointed at the 22B model at the time of writing):

# Pull the quantized 22B model (~13GB)
ollama pull codestral

# Start an interactive session
ollama run codestral

# Or pass a prompt directly, for example to generate a data-cleaning helper
ollama run codestral "Write a Python function to clean and preprocess a CSV dataset with missing values and outliers"
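
To call the locally running model from a script, the official ollama Python client (pip install ollama) works as well. A minimal sketch, assuming the Ollama server is running and the codestral model has been pulled:

import ollama

# Send a chat message to the local Codestral model and print its reply
response = ollama.chat(
    model="codestral",
    messages=[{"role": "user", "content": "Write a Python function to clean and preprocess a CSV dataset with missing values and outliers"}],
)
print(response["message"]["content"])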

Best Use Cases

  • Real-time code completion in IDEs.
  • Multi-language development environments.
  • Production code generation.
  • Scenarios requiring fast response times.

5. Devstral: The Agentic Specialist

Model Overview

  • Developer: Mistral AI & All Hands AI
  • Parameters: 24B
  • Context Window: 128K tokens
  • License: Apache 2.0
  • Specialty: Software engineering agents and tool use

Key Features

  • Agentic Excellence: Designed specifically for coding agents.
  • Tool Mastery: Excels at using tools to explore and edit codebases.
  • Multi-File Operations: Strong at managing complex project structures.
  • Local Deployment: Runs on a single RTX 4090 or a 32GB Mac.

Performance Highlights

  • SWE-Bench Verified: 46.8% (best open-source performance).
  • Outperforms Larger Models: Beats much larger models such as DeepSeek-V3 and Qwen3 235B-A22B.
  • Tool Usage: Exceptional at file system operations and code navigation.
  • Context Management: Efficient handling of large codebases.

How to Run with Ollama
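
Devstral is available in the Ollama library as well (tag names may change; the devstral tag pointed at the 24B Small model at the time of writing):

# Pull and run the 24B model, quantized to roughly 14GB
ollama run devstral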

Alternatively, you can use Mistral's own mistral_inference tooling to download and run the model:

# In your shell: install Mistral's inference library
pip install mistral_inference --upgrade

# In Python: download the model weights from Hugging Face
from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/Devstral-Small-2505",
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=mistral_models_path
)

# Back in your shell: start an interactive chat session
mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300

Best Use Cases

  • Software engineering agents.
  • Automated code refactoring.
  • Multi-file project management.
  • Development workflow automation.

How the Five Models Compare

Performance Comparison

| Model             | Parameters        | Context | HumanEval | SWE-Bench | Best For               |
|-------------------|-------------------|---------|-----------|-----------|------------------------|
| Qwen3-Coder-480B  | 480B (35B active) | 256K-1M | ~90%      | N/A       | Agentic coding         |
| Qwen2.5-Coder-32B | 32B               | 128K    | 91.0%     | N/A       | All-around performance |
| StarCoder2-15B    | 15B               | 16K     | 86.4%     | N/A       | Code completion        |
| Codestral-22B     | 22B               | 32K     | 86.6%     | N/A       | Speed & efficiency     |
| Devstral-24B      | 24B               | 128K    | N/A       | 46.8%     | Software agents        |

Resource Requirements

| Model             | RAM/VRAM Needed | Disk Space | Inference Speed | Best Hardware       |
|-------------------|-----------------|------------|-----------------|---------------------|
| Qwen3-Coder-480B  | 64GB+           | ~200GB     | Slow            | High-end server     |
| Qwen2.5-Coder-32B | 32GB+           | ~20GB      | Medium          | RTX 4090 / M2 Max   |
| StarCoder2-15B    | 16GB+           | ~9GB       | Fast            | RTX 3090 / M1 Pro   |
| Codestral-22B     | 16GB+           | ~13GB      | Very Fast       | RTX 4080 / M2       |
| Devstral-24B      | 24GB+           | ~14GB      | Medium          | RTX 4090 / 32GB Mac |

Conclusion

The era of local coding AI has arrived. Whether you need a powerful agent for complex projects or a speedy assistant for real-time completion, there is an open-source model you can run on your own hardware. By starting with a model that fits your needs and resources, you can unlock a new level of productivity, privacy, and customization in your development workflow. The best way to begin is to download Ollama, pull a model, and start experimenting today.
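
If you are starting from scratch, the whole setup takes two commands. The install script below is the documented path on Linux; macOS and Windows users can grab the installer from ollama.com instead:

# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model and start chatting
ollama run qwen2.5-coder:7b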

FAQs

What is Qwen3-Coder?
Qwen3-Coder is a high-performance, open-source AI coding model supporting agentic workflows, 256k+ context, and over 100 programming languages, ideal for local and enterprise deployment.

How does Codestral benefit local coding projects?
Codestral delivers fast code generation and fill-in-the-middle completion across 80+ languages, with openly downloadable weights and hardware-friendly inference, making it a good fit for privacy-focused local development.

What makes StarCoder2 stand out?
StarCoder2 excels at code completion, bug fixing, and documentation, and is designed for low-latency operation and compatibility with popular IDEs and toolchains.

Is Devstral suited for agentic coding tasks?
Yes, Devstral’s architecture is specialized for multi-step code planning, automated review, and intelligent code translation for Python, C++, JavaScript, and more.

Can these coding LLMs run on consumer hardware in 2025?
Devstral, StarCoder2, Codestral, and Qwen2.5-Coder all run comfortably on well-equipped desktops and laptops thanks to efficient quantized variants and active open-source support; Qwen3-Coder-480B is the exception and realistically requires workstation- or server-class hardware.