Qwen3.6-27B β€” Claude Opus Reasoning Distilled

Qwen3.6-27B fine-tuned with ~14k Claude 4.6 Opus reasoning traces β€” structured, efficient thinking for coding, math, and analytical tasks.

πŸ™ This model was trained following the methodology and pipeline guide by Jackrong, adapted for Qwen3.6-27B and extended with additional datasets and quantization options.

πŸ“¦ Looking for GGUF quantized versions (llama.cpp, Ollama)? β†’ rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF


🎯 Why This Model Exists

Qwen3.6-27B is among the strongest open-weight models at its size: on coding benchmarks it outperforms models roughly 10× larger and approaches closed frontier models. But raw capability alone isn't enough.

The base model has a known weakness: verbose, repetitive reasoning loops on straightforward queries. It over-thinks simple tasks and produces unnecessarily long chains of thought that hurt inference speed and readability.

This fine-tune addresses that directly by distilling the structured, efficient reasoning style of Claude 4.6 Opus into Qwen3.6-27B. The goal is not to change what the model knows, but how it thinks:

  • βœ… Structured <think>...</think> before every response
  • βœ… Concise reasoning on simple tasks, deep analysis on hard ones
  • βœ… Claude-style step-by-step decomposition
  • βœ… Reduced redundant cognitive loops
  • βœ… Preserved base model capabilities
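The `<think>...</think>` wrapper described above is easy to consume programmatically. Below is a minimal, stdlib-only sketch for separating the reasoning block from the final answer; the helper name `split_reasoning` is ours, not part of the model or any library:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer).

    Assumes a single <think>...</think> block precedes the answer,
    as trained; falls back to empty reasoning if the tags are absent.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

raw = "<think>\n1. Parse input.\n2. Check edge cases.\n</think>\n\n42"
thoughts, answer = split_reasoning(raw)
print(answer)  # -> 42
```

When serving through vLLM or SGLang with `--reasoning-parser qwen3` (see Usage below), this splitting is done server-side instead.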

🧠 Learned Reasoning Pattern

The model adopts a Claude-style structured reasoning scaffold:

<think>
Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
</think>

[Final Answer]

πŸ“Š Base Model Benchmarks (Qwen3.6-27B)

Qwen3.6-27B is the base model. These are its official benchmark results β€” the fine-tune inherits this capability while improving reasoning structure.

Benchmark Results

Language & Coding

| Benchmark | Qwen3.5-27B | Qwen3.5-397B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| SWE-bench Verified | 75.0 | 76.2 | 52.0 | 80.9 | 73.4 | 77.2 |
| SWE-bench Pro | 51.2 | 50.9 | 35.7 | 57.1 | 49.5 | 53.5 |
| SWE-bench Multilingual | 69.3 | 69.3 | 51.7 | 77.5 | 67.2 | 71.3 |
| Terminal-Bench 2.0 | 41.6 | 52.5 | 42.9 | 59.3 | 51.5 | 59.3 |
| SkillsBench Avg | 27.2 | 30.0 | 23.6 | 45.3 | 28.7 | 48.2 |
| LiveCodeBench v6 | 80.7 | 83.6 | 80.0 | 84.8 | 80.4 | 83.9 |

Knowledge & Reasoning

| Benchmark | Qwen3.5-27B | Qwen3.5-397B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 86.1 | 87.8 | 85.2 | 89.5 | 85.2 | 86.2 |
| MMLU-Redux | 93.2 | 94.9 | 93.7 | 95.6 | 93.3 | 93.5 |
| GPQA Diamond | 85.5 | 88.4 | 84.3 | 87.0 | 86.0 | 87.8 |
| AIME 2026 | 92.6 | 93.3 | 89.2 | 95.1 | 92.7 | 94.1 |
| HMMT Feb 2026 | 84.3 | 87.9 | 77.2 | 85.3 | 83.6 | 84.3 |
| HLE | 24.3 | 28.7 | 19.5 | 30.8 | 21.4 | 24.0 |

Source: Qwen3.6-27B official release

Fine-tuned Model

| Metric | Value |
|---|---|
| Train Loss (final) | 0.305 |
| Training Duration | 4h 28min |
| Training Hardware | NVIDIA RTX PRO 6000 Blackwell 96GB |
| MMLU-Pro (smoke test) | coming soon |

πŸ—ΊοΈ Training Pipeline

Base Model: Qwen/Qwen3.6-27B (27B dense, multimodal)
        β”‚
        β–Ό
4-bit quantized loading via Unsloth
        β”‚
        β–Ό
LoRA Rank-64 Adapter attached
(q_proj, k_proj, v_proj, o_proj,
 gate_proj, up_proj, down_proj, out_proj)
        β”‚
        β–Ό
SFT β€” Response-Only Training
Masked on: "<|im_start|>assistant\n<think>"
Chat template: qwen3-thinking
        β”‚
        β–Ό
rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled
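The "Response-Only Training" step above can be illustrated with a toy sketch: every label before (and including) the tokenized assistant marker is set to the ignore index, so the loss covers only the assistant's reasoning and answer. Token IDs and the helper name here are invented for demonstration; real trainers such as TRL's `SFTTrainer` apply the same idea over tensors:

```python
IGNORE_INDEX = -100  # labels with this value are excluded from the loss

def mask_prompt_tokens(input_ids: list[int], marker: list[int]) -> list[int]:
    """Return labels where everything up to and including the first
    occurrence of `marker` (e.g. the tokenized
    '<|im_start|>assistant\\n<think>' prefix) is ignored."""
    labels = list(input_ids)
    for start in range(len(input_ids) - len(marker) + 1):
        if input_ids[start:start + len(marker)] == marker:
            end = start + len(marker)
            labels[:end] = [IGNORE_INDEX] * end
            return labels
    # Marker not found: ignore the whole sequence rather than train on prompt text.
    return [IGNORE_INDEX] * len(input_ids)

prompt_and_answer = [5, 6, 7, 100, 101, 42, 43, 44]  # fake token IDs
assistant_marker = [100, 101]                         # fake marker IDs
print(mask_prompt_tokens(prompt_and_answer, assistant_marker))
# -> [-100, -100, -100, -100, -100, 42, 43, 44]
```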

πŸ“š Datasets

| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 3,900 | Filtered high-quality Claude 4.6 Opus reasoning traces |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Large-scale Claude 4.6 Opus distillation data |
| Jackrong/Qwen3.5-reasoning-700x | 700 | Curated step-by-step reasoning, Qwen-specific |

Total: ~14,233 examples after normalization, deduplication, and length filtering (max 8192 tokens).
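The exact filtering script is not published; the sketch below shows one plausible shape for the dedup and length steps, using exact-match dedup on whitespace-normalized text and a crude whitespace-token proxy for the 8192-token cap:

```python
MAX_TOKENS = 8192  # length cap from the dataset description

def filter_examples(examples: list[str], max_tokens: int = MAX_TOKENS) -> list[str]:
    """Keep each distinct, non-empty, length-bounded example once."""
    seen: set[str] = set()
    kept = []
    for text in examples:
        normalized = " ".join(text.split())        # collapse whitespace
        if not normalized or normalized in seen:   # drop empties and duplicates
            continue
        if len(normalized.split()) > max_tokens:   # crude token-count proxy
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept

print(filter_examples(["a  b", "a b", "", "c " * 9000, "d"]))  # -> ['a b', 'd']
```

A production pipeline would tokenize with the model's tokenizer rather than splitting on whitespace, and might use fuzzy (e.g. MinHash) rather than exact deduplication.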


βš™οΈ Training Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Framework | Unsloth + TRL SFTTrainer |
| LoRA rank | 64 |
| LoRA alpha | 64 |
| Target modules | All attention + MLP projections |
| Load precision | 4-bit (training) |
| Export precision | 16-bit merged |
| Effective batch size | 36 (2 Γ— 18 grad accum) |
| Learning rate | 2e-4 |
| LR scheduler | Linear |
| Epochs | 1 |
| Max sequence length | 8192 tokens |
| Optimizer | AdamW 8-bit |
| Supervision | Response-only (assistant turns only) |
| Chat template | qwen3-thinking |
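As a quick sanity check (not the training script), the batch-size row composes as follows, and with ~14,233 examples over one epoch it implies roughly 395 optimizer steps:

```python
# Variable names are illustrative; they mirror common TRL/transformers
# argument names but are not taken from the actual training code.
per_device_train_batch_size = 2
gradient_accumulation_steps = 18

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # -> 36

# Approximate optimizer steps for 1 epoch over ~14,233 examples:
steps = 14_233 // effective_batch_size
print(steps)  # -> 395
```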

πŸ’» Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Implement a binary search tree in Python with insert and search methods."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))

vLLM

pip install vllm

vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --max-model-len 8192 \
  --reasoning-parser qwen3

⚑ Speculative Decoding with MTP (vLLM)

Qwen3.6 supports Multi-Token Prediction (MTP) for significantly faster inference. Community tests show ~90% acceptance rate on this fine-tuned model β€” higher than typical, thanks to the structured reasoning training. Generation throughput reaches 60+ tok/s with MTP enabled, compared to ~25 tok/s standard.

vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --max-model-len 8192 \
  --reasoning-parser qwen3 \
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
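A back-of-envelope estimate of why the quoted acceptance rate matters: with `k` speculative tokens and per-token acceptance probability `a`, each verification pass yields on average `1 + a + a² + … + aᵏ` tokens (a standard speculative-decoding approximation that assumes independent acceptances). The arithmetic below is illustrative only:

```python
# With num_speculative_tokens=2 and ~90% acceptance, each target-model
# forward pass yields ~2.71 tokens on average, roughly in line with the
# observed jump from ~25 to 60+ tok/s once drafting overhead is included.
def expected_tokens_per_step(a: float, k: int) -> float:
    """Mean accepted tokens per verification pass (independence assumption)."""
    return sum(a ** i for i in range(k + 1))

print(round(expected_tokens_per_step(0.9, 2), 2))  # -> 2.71
```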

SGLang

python -m sglang.launch_server \
  --model-path rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --reasoning-parser qwen3

Recommended Sampling Parameters

| Mode | temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general) | 1.0 | 0.95 | 20 | 0.0 |
| Thinking (coding) | 0.6 | 0.95 | 20 | 0.0 |
| Non-thinking | 0.7 | 0.80 | 20 | 1.5 |
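These presets can be captured as plain dictionaries for reuse across clients; the mode names below are ad hoc. Note that `presence_penalty` is a vLLM/OpenAI-API sampling parameter, not an argument to transformers `generate()`, so drop it when calling the model locally:

```python
# Preset names and helper are ours; values come from the table above.
SAMPLING_PRESETS = {
    "thinking_general": dict(temperature=1.0, top_p=0.95, top_k=20, presence_penalty=0.0),
    "thinking_coding":  dict(temperature=0.6, top_p=0.95, top_k=20, presence_penalty=0.0),
    "non_thinking":     dict(temperature=0.7, top_p=0.80, top_k=20, presence_penalty=1.5),
}

def sampling_kwargs(mode: str) -> dict:
    """Return a copy of the preset so callers can tweak it safely."""
    return dict(SAMPLING_PRESETS[mode])

print(sampling_kwargs("thinking_coding")["temperature"])  # -> 0.6
```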

⚠️ Limitations

  • Text-only SFT: vision capabilities of the base model are not fine-tuned
  • Single epoch: trained for only one epoch on ~14k samples
  • Hallucination risk: autoregressive LLM β€” may produce incorrect facts
  • Intended use: coding, math, offline analytical tasks, logic-heavy prompting

πŸ“– Citation

@misc{rico03-qwen36-opus-reasoning,
  title  = {Qwen3.6-27B Claude Opus Reasoning Distilled},
  author = {rico03},
  year   = {2026},
  url    = {https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled}
}

@misc{qwen3.6-27b,
  title  = {{Qwen3.6-27B}: Flagship-Level Coding in a {27B} Dense Model},
  author = {{Qwen Team}},
  month  = {April},
  year   = {2026},
  url    = {https://qwen.ai/blog?id=qwen3.6-27b}
}

πŸ™ Acknowledgements

  • Jackrong β€” fine-tuning guide and pipeline this work is based on
  • Unsloth β€” 2x faster fine-tuning with 70% less VRAM
  • Qwen Team β€” for releasing Qwen3.6-27B under Apache 2.0
  • All dataset contributors

Released for research and personal use. Not intended for production deployment without additional safety evaluation.
