rico03/Qwen3.6-35B-Opus-Reasoning-GGUF
Qwen3.6 35B A3B — Opus 4.6 Reasoning Distillation (GGUF)

Fine-tuned version of Qwen/Qwen3.6-35B-A3B — one of the strongest open-weight agentic coding models — on high-quality reasoning traces distilled from Claude Opus 4.6 at maximum reasoning effort.

Qwen3.6-35B-A3B scores 73.4% on SWE-bench Verified and 51.5% on Terminal-Bench 2.0, outperforming Gemma4-31B and Qwen3.5-35B on agentic coding tasks. This fine-tune adds structured Claude-style reasoning on top of an already exceptional base.

Why this model?

  • Base model is SOTA for agentic coding — Qwen3.6 beats Gemma4-31B on SWE-bench by 21 points
  • Reasoning distillation from Claude Opus 4.6 — learns explicit structured thinking before answering
  • MoE efficiency — only 3B parameters active at inference, fast on consumer hardware
  • Multiple quantizations — from 11GB (IQ2_M) to 25GB (Q5_K_M), imatrix-optimized
  • No degradation — fine-tuning preserves base model mathematical capabilities

Benchmark Results

⚠️ These are informal benchmarks run by the author, not official evaluations. Results on 30 samples may not be representative of full dataset performance.

GSM8K (Mathematical Reasoning, 30 samples, IQ4_NL quantization)

| Model | Correct | Accuracy | Notes |
|---|---|---|---|
| Qwen3.6-35B-A3B Base | 29/30 | 96.7% | Limited by 1024-token budget |
| Qwen3.6-35B-A3B Opus (this model) | 29/30 | 96.7% | Limited by 1024-token budget |

Both models were limited by the 1024 token budget due to Qwen3.6's extended thinking mode.

The fine-tune is expected to improve reasoning structure, multi-step planning, and agentic coding tasks: areas GSM8K does not measure.

Available Quantizations

| File | Size | VRAM | Type | Quality |
|---|---|---|---|---|
| Qwen3.6-35B-A3B-Opus-IQ2_M.gguf | 11.66 GB | 16 GB ✅ | imatrix | ★★★☆☆ |
| Qwen3.6-35B-A3B-Opus-IQ3_XXS.gguf | 13.62 GB | 16 GB ✅ | imatrix | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-IQ3_M.gguf | 15.44 GB | 16 GB ✅ | imatrix | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-IQ4_XS.gguf | 18.73 GB | 24 GB | imatrix | ★★★★★ |
| Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf | 19.78 GB | 24 GB | imatrix | ★★★★★ |
| Qwen3.6-35B-A3B-Opus-Q4_K_S.gguf | 19.89 GB | 24 GB | standard | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-Q5_K_M.gguf | 24.73 GB | 32 GB | standard | ★★★★★ |

IQ quantizations use importance matrix calibration (groups_merged.txt) for superior quality at smaller file sizes compared to standard K-quants.

Which quantization should I use?

  • 16 GB VRAM → IQ3_M (best quality that fits) or IQ3_XXS (more context headroom)
  • 24 GB VRAM → IQ4_NL (near-lossless, recommended) or IQ4_XS (slightly smaller)
  • 32 GB VRAM → Q5_K_M (highest quality)
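The recommendations above can be sketched as a small lookup helper. This is purely illustrative (`pick_quant` is not part of the release), and the thresholds simply mirror the table:

```python
# Illustrative helper: maps an available-VRAM budget to the quantization
# recommended in the table above. Not part of the released model files.
def pick_quant(vram_gb: int) -> str:
    """Return the recommended GGUF quantization for a given VRAM budget."""
    if vram_gb >= 32:
        return "Q5_K_M"   # highest quality
    if vram_gb >= 24:
        return "IQ4_NL"   # near-lossless, recommended
    if vram_gb >= 16:
        return "IQ3_M"    # best quality that fits in 16 GB
    return "IQ2_M"        # smallest available quant (11.66 GB file)

print(pick_quant(24))  # → IQ4_NL
```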

Training Details

  • Base model: Qwen/Qwen3.6-35B-A3B (35B total, 3B active, MoE, 256 experts)
  • Method: QLoRA (r=16, alpha=16, nf4 4-bit quantization)
  • Dataset: ~3,046 reasoning traces from Claude Opus 4.6 at maximum reasoning effort
  • Epochs: 1
  • Final loss: ~0.64
  • Hardware: NVIDIA H100 NVL 94GB VRAM
  • Framework: HuggingFace TRL + PEFT
  • Quantization: llama.cpp with imatrix calibration (groups_merged.txt)

Usage with llama-server

```bash
llama-server \
  --model Qwen3.6-35B-A3B-Opus-IQ3_M.gguf \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --jinja
```

Recommended Sampling Parameters

temperature: 1.0
top_p: 0.95
top_k: 20
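llama-server exposes an OpenAI-compatible HTTP API, so the sampling parameters above can be sent with each chat request. A minimal sketch (`build_chat_payload` is an illustrative name; the port and `/v1/chat/completions` path assume the server command above):

```python
import json

# Recommended sampling parameters from this card.
SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 20}

def build_chat_payload(prompt: str) -> dict:
    """Build a request body for llama-server's OpenAI-compatible
    /v1/chat/completions endpoint."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        **SAMPLING,
    }

payload = build_chat_payload("Explain QLoRA in two sentences.")
# POST this as JSON to http://localhost:8080/v1/chat/completions
print(json.dumps(payload, indent=2))
```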

What this fine-tune adds

  • Structured reasoning — explicit <think> blocks before answering, distilled from Claude Opus 4.6 style
  • Multi-step problem decomposition — breaks complex tasks into clear reasoning steps
  • Improved mathematical reasoning — more consistent step-by-step working
  • Better response consistency — more predictable output format
  • No capability degradation — base model performance preserved on standard benchmarks
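Downstream code often needs to separate the reasoning from the final answer. A minimal sketch (`split_reasoning` is an illustrative name; it assumes the model emits a single `<think>…</think>` block before its answer):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer), assuming one
    <think>...</think> block precedes the final answer."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()          # no reasoning block found
    thinking = m.group(1).strip()        # contents of the <think> block
    answer = text[m.end():].strip()      # everything after </think>
    return thinking, answer

demo = "<think>2 + 2 is 4.</think>The answer is 4."
print(split_reasoning(demo))  # → ('2 + 2 is 4.', 'The answer is 4.')
```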

Full precision model

The merged BF16 safetensors (~65GB) are available at rico03/qwen36-35B-opus-reasoning-merged for those who want to run their own quantizations, use with vLLM/Transformers, or perform further fine-tuning.

License

Apache 2.0 — same as base model Qwen/Qwen3.6-35B-A3B.
