5 Open-Source Coding LLMs You Can Run Locally in 2025

Open-source coding LLMs are democratizing AI-powered development, and local deployment is at the forefront of this revolution. By running models on your own machine, you gain privacy, eliminate API costs, and unlock deep customization. In 2025, running powerful coding AI locally is no longer a dream—it's a practical reality.

This article will introduce five of the best coding models available today. We'll compare their strengths, show you how to run them with Ollama, and provide practical use cases to get you started. This guide is for developers, hobbyists, and teams looking to harness the power of local AI without sacrificing performance.

1. Qwen3-Coder-480B-A35B-Instruct: The Agentic Powerhouse

Model Overview

  • Developer: Alibaba's Qwen Team
  • Parameters: 480B total, 35B activated (Mixture-of-Experts)
  • Context Window: 256K native, up to 1M with YaRN
  • License: Apache 2.0
  • Specialty: Agentic coding and browser automation

Key Features

  • Agentic Coding Support: Designed to work with frameworks like Qwen Code and CLINE.
  • Long-Context Mastery: Understands repository-scale codebases with its massive context window.
  • Multi-Platform Compatibility: Supports most coding agent platforms.
  • Non-Thinking Mode: Optimized for direct code generation without reasoning blocks.

Performance Highlights

  • Comparable to Claude Sonnet on agentic coding tasks.
  • Excels in complex, multi-step software engineering workflows.
  • Strong performance on browser-based coding tasks.
  • Advanced function calling capabilities for tool use.

How to Run the Model

Running a model of this size locally is a challenge. While a direct ollama pull is not feasible for most users due to its ~200GB size, you can run quantized versions using llama.cpp with MoE offloading. This technique keeps the main model layers on the GPU and offloads the "expert" layers to system RAM.

Command to run Qwen3-Coder with MoE offloading:

./llama.cpp/llama-cli \
--model unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/UD-Q2_K_XL/Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL-00001-of-00004.gguf \
--ctx-size 16384 \
--n-gpu-layers 99 \
-ot ".ffn_.*_exps.=CPU" \
--threads 8 \
--temp 0.7 \
-p "<|im_start|>user\nWrite a Python function to find prime numbers up to N.<|im_end|>\n<|im_start|>assistant\n"

Best Use Cases

  • Complex software engineering projects.
  • Multi-file repository analysis and refactoring.
  • Automated testing and debugging workflows.
  • Research in agentic AI.

2. Qwen2.5-Coder: The Versatile Champion

Model Overview

  • Developer: Alibaba's Qwen Team
  • Parameters: 6 sizes available (0.5B, 1.5B, 3B, 7B, 14B, 32B)
  • Context Window: 32K-128K tokens
  • License: Apache 2.0 (fully permissive)
  • Specialty: Balanced performance across all coding tasks

Key Features

  • Six Size Options: Scales from tiny models for edge devices to a 32B powerhouse.
  • Multi-Language Mastery: Excels across 40+ programming languages.
  • Code Repair Excellence: Outstanding debugging and error-fixing capabilities.
  • Competitive with GPT-4o: The 32B model rivals closed-source leaders.

Performance Highlights

  • HumanEval: 91.0% (32B model, matching GPT-4o).
  • Aider (Code Repair): 73.7% (comparable to GPT-4o).
  • McEval: 65.9 across multiple languages.
  • LiveCodeBench: 43.4% (competitive with top models).

How to Run with Ollama
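
The fastest route is Ollama itself. Tags in the Ollama library can change between releases, but at the time of writing the qwen2.5-coder family covers all six sizes:

# Pull and chat with the 7B instruct variant
ollama run qwen2.5-coder:7b

# Or the 32B flagship if you have the memory for it
ollama run qwen2.5-coder:32b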

Prefer to work with the weights directly? The following Hugging Face Transformers example runs Qwen2.5-Coder-7B-Instruct on a chat-style prompt:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated reply is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Best Use Cases

  • General-purpose coding assistance.
  • Multi-language development projects.
  • Code debugging and repair.
  • Educational programming assistance.

3. StarCoder2: The Transparent Workhorse

Model Overview

  • Developer: BigCode Project
  • Parameters: 3 sizes (3B, 7B, 15B)
  • Context Window: 16K tokens
  • License: Apache 2.0
  • Specialty: Transparent training and code completion

Key Features

  • Transparent Training: Fully open training process and dataset.
  • Multi-Language Coverage: 600+ programming languages.
  • Efficient Architecture: Excellent performance-to-size ratio.
  • Strong Code Completion: Best-in-class fill-in-the-middle capabilities.

Performance Highlights

  • HumanEval FIM: 86.4% (excellent code completion).
  • RepoBench: Strong performance on repository-level tasks.
  • Size Efficiency: The 15B model matches 33B+ models on many tasks.
  • Training Scale: Trained on 4+ trillion tokens for robust understanding.

How to Run with Ollama
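
If Ollama is your runtime, StarCoder2 is available straight from the Ollama library (tag names may change; 3B, 7B, and 15B tags were available at the time of writing):

# Pull and run the 7B model
ollama run starcoder2:7b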

You can also drive StarCoder2 from Python with the Hugging Face Transformers pipeline API:

from transformers import pipeline

# Load StarCoder2-7B with automatic dtype and device placement
pipe = pipeline("text-generation", model="bigcode/starcoder2-7b", torch_dtype="auto", device_map="auto")

# StarCoder2 is a base model: give it the start of the code you want completed
prompt = "def fibonacci(n):"
result = pipe(prompt, max_new_tokens=100)
print(result[0]['generated_text'])
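
StarCoder2 was also trained with a fill-in-the-middle (FIM) objective, so it can complete code between an existing prefix and suffix. Here is a minimal sketch that reuses the pipe object from the snippet above and assumes the StarCoder-style FIM special tokens (<fim_prefix>, <fim_suffix>, <fim_middle>) used by the BigCode models:

# Ask the model to fill in the body between the prefix and the suffix
prefix = "def average(numbers):\n    "
suffix = "\n    return total / len(numbers)\n"
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

completion = pipe(fim_prompt, max_new_tokens=32)
print(completion[0]['generated_text'])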

Best Use Cases

  • IDE integration and real-time code completion.
  • Educational environments requiring transparency.
  • Medium-sized development projects.
  • Scenarios where auditability and reproducibility matter.

4. Codestral: The Speed Demon

Model Overview

  • Developer: Mistral AI
  • Parameters: 22B
  • Context Window: 32K tokens
  • License: Mistral Non-Production License (MNPL)
  • Specialty: Fast, efficient code generation

Key Features

  • Lightning-Fast Inference: Optimized for speed and efficiency.
  • 80+ Language Fluency: Extensive programming language support.
  • Fill-in-the-Middle: Excellent at completing partial code.
  • Production-Ready: Designed for real-world development workflows.

Performance Highlights

  • HumanEval Python: 86.6% (excellent Python performance).
  • Fill-in-the-Middle: 95.3% accuracy.
  • Inference Speed: Among the fastest in its class.
  • Memory Efficiency: 13GB model size for practical deployment.

How to Run with Ollama

Codestral is published in the Ollama model library, so the simplest local setup is a single command (tags may change; the codestral tag pointed at the 22B model at the time of writing):

# Pull the quantized 22B model (~13GB)
ollama pull codestral

# Start an interactive session
ollama run codestral

# Or pass a prompt directly, for example to generate a data-cleaning helper
ollama run codestral "Write a Python function to clean and preprocess a CSV dataset with missing values and outliers"
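
To call the locally running model from a script, the official ollama Python client (pip install ollama) works as well. A minimal sketch, assuming the Ollama server is running and the codestral model has been pulled:

import ollama

# Send a chat message to the local Codestral model and print its reply
response = ollama.chat(
    model="codestral",
    messages=[{"role": "user", "content": "Write a Python function to clean and preprocess a CSV dataset with missing values and outliers"}],
)
print(response["message"]["content"])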

Best Use Cases

  • Real-time code completion in IDEs.
  • Multi-language development environments.
  • Production code generation.
  • Scenarios requiring fast response times.

5. Devstral: The Agentic Specialist

Model Overview

  • Developer: Mistral AI & All Hands AI
  • Parameters: 24B
  • Context Window: 128K tokens
  • License: Apache 2.0
  • Specialty: Software engineering agents and tool use

Key Features

  • Agentic Excellence: Designed specifically for coding agents.
  • Tool Mastery: Excels at using tools to explore and edit codebases.
  • Multi-File Operations: Strong at managing complex project structures.
  • Local Deployment: Runs on a single RTX 4090 or a 32GB Mac.

Performance Highlights

  • SWE-Bench Verified: 46.8% (best open-source performance).
  • Outperforms Larger Models: Beats much larger models such as DeepSeek-V3 and Qwen3 235B-A22B.
  • Tool Usage: Exceptional at file system operations and code navigation.
  • Context Management: Efficient handling of large codebases.

How to Run with Ollama
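
Devstral is available in the Ollama library as well (tag names may change; the devstral tag pointed at the 24B Small model at the time of writing):

# Pull and run the 24B model, quantized to roughly 14GB
ollama run devstral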

Alternatively, you can use Mistral's own mistral_inference tooling to download and run the model:

# In your shell: install Mistral's inference library
pip install mistral_inference --upgrade

# In Python: download the model weights from Hugging Face
from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/Devstral-Small-2505",
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=mistral_models_path
)

# Back in your shell: start an interactive chat session
mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300

Best Use Cases

  • Software engineering agents.
  • Automated code refactoring.
  • Multi-file project management.
  • Development workflow automation.

How the Five Models Compare

Performance Comparison

| Model             | Parameters        | Context | HumanEval | SWE-Bench | Best For               |
|-------------------|-------------------|---------|-----------|-----------|------------------------|
| Qwen3-Coder-480B  | 480B (35B active) | 256K-1M | ~90%      | N/A       | Agentic coding         |
| Qwen2.5-Coder-32B | 32B               | 128K    | 91.0%     | N/A       | All-around performance |
| StarCoder2-15B    | 15B               | 16K     | 86.4%     | N/A       | Code completion        |
| Codestral-22B     | 22B               | 32K     | 86.6%     | N/A       | Speed & efficiency     |
| Devstral-24B      | 24B               | 128K    | N/A       | 46.8%     | Software agents        |

Resource Requirements

| Model             | RAM/VRAM Needed | Disk Space | Inference Speed | Best Hardware       |
|-------------------|-----------------|------------|-----------------|---------------------|
| Qwen3-Coder-480B  | 64GB+           | ~200GB     | Slow            | High-end server     |
| Qwen2.5-Coder-32B | 32GB+           | ~20GB      | Medium          | RTX 4090 / M2 Max   |
| StarCoder2-15B    | 16GB+           | ~9GB       | Fast            | RTX 3090 / M1 Pro   |
| Codestral-22B     | 16GB+           | ~13GB      | Very Fast       | RTX 4080 / M2       |
| Devstral-24B      | 24GB+           | ~14GB      | Medium          | RTX 4090 / 32GB Mac |

Conclusion

The era of local coding AI has arrived. Whether you need a powerful agent for complex projects or a speedy assistant for real-time completion, there is an open-source model you can run on your own hardware. By starting with a model that fits your needs and resources, you can unlock a new level of productivity, privacy, and customization in your development workflow. The best way to begin is to download Ollama, pull a model, and start experimenting today.
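
If you are starting from scratch, the whole setup takes two commands. The install script below is the documented path on Linux; macOS and Windows users can grab the installer from ollama.com instead:

# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model and start chatting
ollama run qwen2.5-coder:7b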

FAQs

What is Qwen3-Coder?
Qwen3-Coder is a high-performance, open-source AI coding model supporting agentic workflows, 256k+ context, and over 100 programming languages, ideal for local and enterprise deployment.

How does Codestral benefit local coding projects?
Codestral delivers fast code generation and fill-in-the-middle completion across 80+ languages, with openly downloadable weights and hardware-friendly inference, making it a good fit for privacy-focused local development.

What makes StarCoder2 stand out?
StarCoder2 excels at code completion, bug fixing, and documentation, and is designed for low-latency operation and compatibility with popular IDEs and toolchains.

Is Devstral suited for agentic coding tasks?
Yes, Devstral’s architecture is specialized for multi-step code planning, automated review, and intelligent code translation for Python, C++, JavaScript, and more.

Can these coding LLMs run on consumer hardware in 2025?
Devstral, StarCoder2, Codestral, and Qwen2.5-Coder all run comfortably on well-equipped desktops and laptops thanks to efficient quantized variants and active open-source support; Qwen3-Coder-480B is the exception and realistically requires workstation- or server-class hardware.