Logos · By George Sarris (0xcircuitbreaker) · April 20, 2026 · 11 min read

45 Open-Weight Models Ranked for Fine-Tuning in 2026

Every base model in Logos — 45 across 16 families — ranked by production fit. Llama, Qwen, Mistral, Gemma, Phi, DeepSeek, Nemotron, GLM, Kimi, Hermes, MiniMax.

How to read this list

Choosing a base model is about matching four constraints: the task your fine-tune has to solve, the inference hardware you'll run on in production, the training budget you have, and the licensing your deployment requires. This post ranks all 45 models across 16 families in the Logos catalog with those constraints visible, grouped by size class.

Models are listed with the minimum Logos tier that trains them, the per-model surcharge (transparently shown on the order page — no opaque upsells), the license, and the shortest honest description of what each model is for.

Sub-10B — fast iteration, prototyping, edge deployment

This is the Starter tier's home: these models fit on a consumer GPU, LoRA trains in hours, and most cost $349 flat with no surcharge.
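For a sense of why these runs finish in hours: a rank-16 LoRA touches only a tiny fraction of an 8B model's parameters. A minimal sketch with the Hugging Face peft library — assuming adapters on the attention projections only and a frozen base loaded in 4-bit; Logos runs its own recipe internally, this just illustrates the footprint:

```python
from peft import LoraConfig

# Minimal LoRA setup of the kind that fits an 8B model on one consumer GPU.
# Assumptions (not the Logos recipe): rank-16 adapters on the attention
# projections, frozen base loaded in 4-bit elsewhere in the training script.
lora_config = LoraConfig(
    r=16,                     # adapter rank — trainable params are a tiny fraction of the base
    lora_alpha=32,            # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```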

Qwen 3.5 2B (Starter, no surcharge, Apache 2.0)

The smallest current-generation Qwen. Strong when iteration speed matters more than ceiling quality. Good first stop for prototyping.

Qwen 2.5 3B Instruct (Starter, no surcharge, Qwen Research license)

Compact 3B. Fastest Qwen 2.5 iteration loop, edge-deployment friendly, cheapest starter option in the Qwen family. Note: Qwen Research license — check commercial-use terms.

Qwen 3.5 4B (Starter, no surcharge, Apache 2.0)

Qwen 3.5 Small (March 2026). 4B with 256K context, 201 languages, thinking/non-thinking modes. Edge-ready efficient reasoning.

Gemma 4 4B E4B (Starter, no surcharge, Apache 2.0)

Google's smallest Gemma 4. Multimodal baseline (vision + audio). Efficient enough for consumer GPUs and edge deployment. If you need multimodal at a small size, this is the only realistic pick in the sub-8B class.

Mistral 7B Instruct v0.3 (Starter, no surcharge, Apache 2.0)

The most-fine-tuned open 7B on the planet — recipes and hyperparameters are effectively common knowledge. A proven, safe choice, especially for style-transfer and classification work.

Qwen 2.5 7B Instruct (Starter, no surcharge, Apache 2.0)

Battle-tested prior generation — heavily documented, lots of community recipes. A safer baseline than newer releases if you want predictability over novelty.

Llama 3.1 8B Instruct (Starter, no surcharge, Llama 3.1 license)

The workhorse 8B. General-purpose instruction following, fastest iteration cycle in the catalog. Default answer for "I want a fine-tune I can ship this week." (Llama license — 700M MAU restriction.)

DeepSeek-R1-Distill-Llama-8B (Starter, no surcharge, MIT)

Reasoning-distilled from R1 onto a Llama-8B backbone. Strong chain-of-thought out of the box — fine-tune on top when you need reasoning quality at 8B footprint.

Llama-3.1-Nemotron-Nano-8B (Starter, no surcharge, Llama 3.1 license)

NVIDIA's reasoning-tuned 8B Llama. Competitive with much larger models on agentic tasks. Pick this over vanilla Llama 3.1 8B when tool-calling and multi-step reasoning are the dominant workload.

Qwen3 8B (Starter, no surcharge, Apache 2.0)

Qwen 3 dense with thinking/non-thinking dual-mode reasoning. Apache 2.0 licensed alternative to Llama 3.1 8B when the Llama 700M MAU restriction is a concern.

Qwen 3.5 9B (Starter, no surcharge, Apache 2.0)

Qwen 3.5 Small (March 2026). The 9B sweet spot for sub-12B production agents. 256K context, strong agentic coding and multimodal. Newest model in the 8–9B class.

Hermes 3 Llama 3.1 8B (Starter, no surcharge, Llama 3.1 license)

Nous Research's fine-tune of Llama 3.1 8B — function-calling and agentic behavior pre-trained into the model, minimally filtered, highly steerable. Strong starting point for assistant-style products where you'd otherwise spend your fine-tune just teaching function-calling.

12B to 24B — the production sweet spot

Mistral Nemo 12B Instruct (Studio, no surcharge, Apache 2.0)

Balanced reasoning + long-context baseline. Good middle-of-the-road choice when you want more ceiling than 8B but don't need 32B-class compute.

Gemma 3 12B Instruct (Starter, no surcharge, Gemma license)

Strong safety baseline, efficient on consumer GPUs. Available on Starter — useful when the Google-trained safety profile matches your product.

Phi-4 14B (Studio, no surcharge, MIT)

Top-tier reasoning in a small footprint — punches well above its size class on math, logic, and code benchmarks. Pick Phi-4 when reasoning quality per inference dollar is the primary concern.

Qwen 2.5 14B Instruct (Starter, no surcharge, Apache 2.0)

Mid-size Qwen 2.5 — bridges the gap between 7B and 32B. Heavily documented recipes, strong general-purpose baseline. A Starter-tier 14B is rare; this is the most cost-effective mid-size option.

Qwen3 14B (Starter, no surcharge, Apache 2.0)

Qwen 3 dense 14B — direct competitor to Phi-4 14B with Apache 2.0 licensing and dual-mode reasoning. Newer generation than Qwen 2.5 14B.

Mistral Small 24B Instruct (Studio, no surcharge, Apache 2.0)

Modern dense model sitting between Phi-4 14B and Qwen 32B. Good long-horizon choice for customer-facing agents — the Mistral family's tool-use behavior is predictable and the 24B size is cheap enough to run at scale.

27B to 36B — mid-frontier dense and MoE

Gemma 3 27B Instruct (Studio, no surcharge, Gemma license)

Frontier Gemma of the prior generation. Fills the 27B mid-class gap with Google's training.

Qwen 3.5 27B (Studio, no surcharge, Apache 2.0)

Current Qwen generation in the dense 27B class (released Feb 2026). Pick over Qwen 2.5 32B when you specifically want 3.5's updated training data and safety profile.

Qwen 2.5 32B Instruct (Studio, no surcharge, Apache 2.0)

Mature 30B-class baseline, heavily documented, lots of community recipes. The safer choice if you want 30B-class capability without betting on a new release.

Qwen3 32B (Studio, no surcharge, Apache 2.0)

Qwen 3 dense flagship at 32B — thinking-mode reasoning, Apache 2.0, with a growing body of community recipes. Pick it over Qwen 2.5 32B for the newer generation; pick 2.5 32B when you want the more heavily documented recipes.

DeepSeek-R1-Distill-Qwen-32B (Studio, +$200 surcharge, MIT)

Reasoning-distilled 32B — deepest chain-of-thought of any open model at this size. The surcharge reflects the care needed to preserve CoT traces during fine-tuning.

Qwen3 30B-A3B MoE (Studio, +$300 surcharge, Apache 2.0)

Qwen 3 sparse MoE — 30B total, 3B active per token. Production-efficient inference on consumer GPUs with strong reasoning mode. Surcharge reflects MoE memory overhead during training.

Qwen 3.5 35B-A3B MoE (Studio, +$300 surcharge, Apache 2.0)

Sparse MoE — 35B total, 3B active. Strong reasoning with thinking mode. Surcharge for MoE memory overhead; the full parameter count stays resident even under sparse activation during training.
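To make the MoE overhead concrete, here is a rough back-of-the-envelope sketch. Assumptions: bf16 base weights held frozen for LoRA; adapter and optimizer state ignored because they are small next to the base.

```python
def frozen_base_memory_gb(total_params_billion: float, bytes_per_param: int = 2) -> float:
    """GPU memory to keep the frozen base weights resident (bf16 = 2 bytes/param)."""
    return total_params_billion * bytes_per_param

# All experts stay loaded even though only ~3B parameters are active per token.
print(frozen_base_memory_gb(35))  # ~70 GB for the 35B-A3B MoE
print(frozen_base_memory_gb(3))   # ~6 GB for a 3B dense model with similar active compute
```

Sparse activation cuts compute per token, not resident memory — which is what the surcharge prices in.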

Qwen 3.6 27B (Studio, no surcharge, Apache 2.0)

Latest Qwen dense 27B release, announced in April 2026. Best when you want the newer Qwen 3.6 agentic coding behavior without taking on MoE training overhead or a model surcharge.

Qwen 3.6 35B-A3B MoE (Studio, +$300 surcharge, Apache 2.0)

Latest Qwen MoE, April 2026 release with agentic coding upgrades. Pick this over 3.5 35B-A3B for newer agentic behavior; pick 3.5 if you want the more-documented generation.

Hermes 4.3 36B ByteDance Seed base (Studio, +$300 surcharge, ByteDance Seed + Hermes terms)

Nous Research's August 2025 release — first Hermes on a non-Meta base. 70B-class performance in 36B dense, 512K-token context, hybrid-mode reasoning with strong math and schema-adherent outputs. Surcharge reflects the Seed-base alignment scaffolding that fine-tuning must preserve.

49B — NVIDIA's instruction-tuned flagship

Llama-3.3-Nemotron-Super-49B (Studio, +$300 surcharge, Llama 3.3 license)

NVIDIA's instruction-tuned Llama with exceptional alignment and tool-calling reliability. The surcharge reflects the care required to preserve NVIDIA's alignment scaffolding during fine-tuning — disturb it carelessly and you lose what makes the model interesting.

Gemma 4 frontier — multimodal

Gemma 4 26B MoE (Studio, no surcharge, Apache 2.0)

Gemma 4 sparse MoE — #6 on the Arena AI text leaderboard among open models. Native vision + audio + 256K context. The multimodal choice in the 26B class.

Gemma 4 31B Dense (Studio, no surcharge, Apache 2.0)

Gemma 4 dense flagship — #3 on the Arena AI text leaderboard among open models. Multimodal, 256K context, 140+ languages. Pick over 26B MoE when you want pure text quality at the top of Gemma 4's capabilities without the MoE memory overhead.

70B–141B — production workhorses

Llama 3.3 70B Instruct (Atelier, no surcharge, Llama 3.3 license)

Frontier-class open weights — the production workhorse. The default answer for "I want a 70B I can trust in production." Included at Atelier base price with no surcharge.

Qwen 2.5 72B Instruct (Atelier, no surcharge, Qwen license)

Battle-tested 72B baseline, production-proven across many fine-tune projects. Pick this over Llama 3.3 70B if your task benefits from Qwen's training data (strong multilingual, strong math). Qwen license has a 100M MAU restriction — check if you're at that scale.

Hermes 3 Llama 3.3 70B (Atelier, no surcharge, Llama 3.3 license)

Production-grade Hermes 3 on Llama 3.3 70B. Fine-tune a model that already ships with strong function-calling, roleplay, and user-aligned assistant behavior. Pick over vanilla Llama 3.3 70B when your fine-tune would otherwise spend its capacity teaching assistant basics.

Mixtral 8x22B Instruct (Atelier, no surcharge, Apache 2.0)

141B total, 39B active per token. Production-proven for coding and reasoning — the canonical open MoE. Included at Atelier base price. The sweet spot: MoE efficiency with Apache 2.0 licensing.

Frontier MoE — 122B total parameters and up

This is the Atelier tier with surcharges: multi-day training windows, 8× H200-class clusters, and in some cases multi-node jobs.

Qwen 3.5 122B-A10B MoE (Atelier, +$799 surcharge, Apache 2.0)

Qwen 3.5 Medium (Feb 2026). 122B total, 10B active. Production-ready MoE with strong agentic capability; faster and cheaper than the 397B flagship while still being frontier-class.

MiniMax-M2 230B-A10B MoE (Atelier, +$1,499 surcharge, MIT)

Agent-native frontier MoE — 230B total, 10B active. Optimized for coding and agentic tool-calling; open-sourced Oct 2025. Compact-and-fast alternative to M1 when agentic behavior is the dominant target.

Qwen3 235B-A22B MoE (Atelier, +$1,499 surcharge, Apache 2.0)

Qwen3's frontier MoE from mid-2025 — 235B total, 22B active per token. Surcharge covers 8× H200 minimum and ~12-hour training window.

Llama-3.1-Nemotron-Ultra-253B (Atelier, +$1,799 surcharge, Llama 3.1 license)

NVIDIA's flagship reasoning model — competitive with DeepSeek-R1 at superior throughput and memory efficiency. 253B dense — requires 8× H200 and careful gradient checkpointing for LoRA. Top of the Llama-Nemotron family.

Qwen 3.5 397B-A17B MoE (Atelier, +$2,499 surcharge, Apache 2.0)

Current Qwen flagship MoE — 397B total, 17B active. Native multimodal agent capabilities. Expect multi-day training for full LoRA convergence.

MiniMax-M1 456B-A45.9B MoE (Atelier, +$2,499 surcharge, Apache 2.0)

The first open-weight large-scale hybrid-attention reasoning MoE. 456B total, 45.9B active with lightning attention. Native 1M-token context (8× DeepSeek-R1). Requires a multi-node H200 cluster and careful attention-kernel preservation during training.

GLM-5 744B MoE (Atelier, +$2,999 surcharge, MIT)

Z.ai's flagship MoE — 28.5T pre-training tokens, best-in-class open-source reasoning + coding + agentic performance. The base model that powers the GLM-5.1 series.

GLM-5.1 754B MoE (Atelier, +$3,499 surcharge, MIT)

Z.ai's top-of-leaderboard agentic coding model — #1 on SWE-Bench Pro (58.4), beating GPT-5.4 and Claude Opus 4.6. Optimized for long-horizon agentic engineering, NL2Repo, and Terminal-Bench tasks. The surcharge reflects the multi-day, often multi-node training required for full LoRA convergence.

Kimi K2.5 1.04T MoE (Atelier, +$4,499 surcharge, Modified MIT)

Trillion-parameter frontier MoE with vision and agent swarm modes (up to 100 sub-agents). Coding parity with GPT-5 and Gemini. Moonshot only ships the 1T variant; quantized inference variants (FP8, Q4 GGUF) are available via the Quantization Pack add-on. Multi-node H200 / B200 cluster required — longest training window in the catalog.

How to pick

The practical decision tree most teams end up following (also sketched as code after the list):

  1. "I want the safest production bet at 70B class" → Llama 3.3 70B (Atelier). If you need function-calling out of the box, Hermes 3 Llama 3.3 70B.
  2. "I want 70B class but multilingual or math-heavy" → Qwen 2.5 72B (Atelier).
  3. "I want strong reasoning at small size for inference cost reasons" → Phi-4 14B (Studio) or Qwen3 14B (Starter).
  4. "I need multimodal input (vision / audio)" → Gemma 4 4B, 26B MoE, or 31B Dense.
  5. "I need long-context (128K+)" → Gemma 4 family (256K), Qwen 3.5 family (256K), Hermes 4.3 36B (512K), or MiniMax-M1 (1M) at the frontier.
  6. "I want agentic coding at the frontier, cost is secondary" → GLM-5.1 or Kimi K2.5 (Atelier + surcharge). MiniMax-M2 is the agentic-efficiency alternative.
  7. "I'm prototyping, minimize spend" → Llama 3.1 8B or Qwen3 8B (both Starter).
  8. "My workload is reasoning-heavy and I want to fine-tune the best reasoning distillate" → DeepSeek-R1-Distill-Qwen-32B (Studio +$200).
  9. "I want a model with function-calling already trained in" → Hermes 3 Llama 3.1 8B (Starter) or Hermes 3 Llama 3.3 70B (Atelier).
  10. "I want the longest possible context at the frontier" → MiniMax-M1 (1M tokens, Atelier + $2,499).

Licensing notes

Most models in the catalog ship under permissive open-source licenses (Apache 2.0, MIT) that allow commercial use without meaningful restrictions. A few exceptions worth knowing:

  • Llama 3.1 / 3.3 licenses — permissive for most commercial use, but include an attribution requirement and restrict use by companies with over 700M monthly active users. Applies to vanilla Llama, the NVIDIA Nemotron variants, and the Hermes 3 Llama tunes.
  • Gemma license — Google's custom license; commercial use allowed but with a prohibited-use policy.
  • Qwen license (Qwen 2.5 72B specifically) — restricts use by companies with over 100M monthly active users.
  • Qwen Research license (Qwen 2.5 3B specifically) — check commercial-use terms before deploying.
  • Modified MIT (Kimi K2.5) — broad MIT with minor usage restrictions; check the full text if you're operating at scale.
  • Hermes 4.3 36B — ByteDance Seed base license plus Hermes community terms. Review both.

The Logos model selector surfaces the license per model. If you have licensing constraints, pre-filter by Apache 2.0 or MIT.

What Logos does with your chosen model

Once you pick — one of 45 models across 16 families, one of four tiers, any combination of add-ons — Logos does the training, ships the weights as .safetensors plus a LoRA adapter, and hands you an evaluation report. No hosting, no API, no lock-in. The weights are yours.
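Since the deliverable is plain weights plus an adapter, loading it is a standard Hugging Face workflow. A minimal sketch — the paths are placeholders, and Logos does not dictate a serving stack:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder paths — substitute the base checkpoint and the adapter Logos delivers.
BASE = "./my-base-model"        # .safetensors checkpoint of the chosen base model
ADAPTER = "./my-logos-adapter"  # LoRA adapter directory from the Logos delivery

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the fine-tuned adapter

# Optionally fold the adapter into the base weights for a single deployable checkpoint.
model = model.merge_and_unload()
```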

Open the Logos model selector to see pricing per base model →

For the decision between Logos and the big-name alternatives, see Logos vs OpenAI vs Together vs Replicate. For the product overview, What is Logos?.

Topics: open-weight-models, fine-tuning, llama, qwen, mistral, gemma, phi-4, deepseek, nemotron, glm, kimi, hermes, minimax, model-selection
