Qwen3.6-27B β€” Claude Opus Reasoning Distilled

Qwen3.6-27B fine-tuned with ~14k Claude 4.6 Opus reasoning traces β€” structured, efficient thinking for coding, math, and analytical tasks.

πŸ™ This model was trained following the methodology and pipeline guide by Jackrong, adapted for Qwen3.6-27B and extended with additional datasets and quantization options.

πŸ“¦ Looking for GGUF quantized versions (llama.cpp, Ollama)? β†’ rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF


🎯 Why This Model Exists

Qwen3.6-27B is among the strongest open-weight models at its size: on coding benchmarks it outperforms models roughly 10× larger and approaches closed frontier models. But raw capability alone isn't enough.

The base model has a known weakness: verbose, repetitive reasoning loops on straightforward queries. It over-thinks simple tasks and produces unnecessarily long chains of thought that hurt inference speed and readability.

This fine-tune addresses that directly by distilling the structured, efficient reasoning style of Claude 4.6 Opus into Qwen3.6-27B. The goal is not to change what the model knows, but how it thinks:

  • βœ… Structured <think>...</think> before every response
  • βœ… Concise reasoning on simple tasks, deep analysis on hard ones
  • βœ… Claude-style step-by-step decomposition
  • βœ… Reduced redundant cognitive loops
  • βœ… Preserved base model capabilities
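The `<think>...</think>` wrapper described above is easy to consume programmatically. Below is a minimal, stdlib-only sketch for separating the reasoning block from the final answer; the helper name `split_reasoning` is ours, not part of the model or any library:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer).

    Assumes a single <think>...</think> block precedes the answer,
    as trained; falls back to empty reasoning if the tags are absent.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

raw = "<think>\n1. Parse input.\n2. Check edge cases.\n</think>\n\n42"
thoughts, answer = split_reasoning(raw)
print(answer)  # -> 42
```

When serving through vLLM or SGLang with `--reasoning-parser qwen3` (see Usage below), this splitting is done server-side instead.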

🧠 Learned Reasoning Pattern

The model adopts a Claude-style structured reasoning scaffold:

<think>
Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
</think>

[Final Answer]

πŸ“Š Base Model Benchmarks (Qwen3.6-27B)

Qwen3.6-27B is the base model. These are its official benchmark results β€” the fine-tune inherits this capability while improving reasoning structure.

Benchmark Results

Language & Coding

| Benchmark | Qwen3.5-27B | Qwen3.5-397B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| SWE-bench Verified | 75.0 | 76.2 | 52.0 | 80.9 | 73.4 | 77.2 |
| SWE-bench Pro | 51.2 | 50.9 | 35.7 | 57.1 | 49.5 | 53.5 |
| SWE-bench Multilingual | 69.3 | 69.3 | 51.7 | 77.5 | 67.2 | 71.3 |
| Terminal-Bench 2.0 | 41.6 | 52.5 | 42.9 | 59.3 | 51.5 | 59.3 |
| SkillsBench Avg | 27.2 | 30.0 | 23.6 | 45.3 | 28.7 | 48.2 |
| LiveCodeBench v6 | 80.7 | 83.6 | 80.0 | 84.8 | 80.4 | 83.9 |

Knowledge & Reasoning

| Benchmark | Qwen3.5-27B | Qwen3.5-397B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 86.1 | 87.8 | 85.2 | 89.5 | 85.2 | 86.2 |
| MMLU-Redux | 93.2 | 94.9 | 93.7 | 95.6 | 93.3 | 93.5 |
| GPQA Diamond | 85.5 | 88.4 | 84.3 | 87.0 | 86.0 | 87.8 |
| AIME 2026 | 92.6 | 93.3 | 89.2 | 95.1 | 92.7 | 94.1 |
| HMMT Feb 2026 | 84.3 | 87.9 | 77.2 | 85.3 | 83.6 | 84.3 |
| HLE | 24.3 | 28.7 | 19.5 | 30.8 | 21.4 | 24.0 |

Source: Qwen3.6-27B official release

Fine-tuned Model

| Metric | Value |
|---|---|
| Train Loss (final) | 0.305 |
| Training Duration | 4h 28min |
| Training Hardware | NVIDIA RTX PRO 6000 Blackwell 96GB |
| MMLU-Pro (smoke test) | coming soon |

πŸ—ΊοΈ Training Pipeline

Base Model: Qwen/Qwen3.6-27B (27B dense, multimodal)
        β”‚
        β–Ό
4-bit quantized loading via Unsloth
        β”‚
        β–Ό
LoRA Rank-64 Adapter attached
(q_proj, k_proj, v_proj, o_proj,
 gate_proj, up_proj, down_proj, out_proj)
        β”‚
        β–Ό
SFT β€” Response-Only Training
Masked on: "<|im_start|>assistant\n<think>"
Chat template: qwen3-thinking
        β”‚
        β–Ό
rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled
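The "Response-Only Training" step above can be illustrated with a toy sketch: every label before (and including) the tokenized assistant marker is set to the ignore index, so the loss covers only the assistant's reasoning and answer. Token IDs and the helper name here are invented for demonstration; real trainers such as TRL's `SFTTrainer` apply the same idea over tensors:

```python
IGNORE_INDEX = -100  # labels with this value are excluded from the loss

def mask_prompt_tokens(input_ids: list[int], marker: list[int]) -> list[int]:
    """Return labels where everything up to and including the first
    occurrence of `marker` (e.g. the tokenized
    '<|im_start|>assistant\\n<think>' prefix) is ignored."""
    labels = list(input_ids)
    for start in range(len(input_ids) - len(marker) + 1):
        if input_ids[start:start + len(marker)] == marker:
            end = start + len(marker)
            labels[:end] = [IGNORE_INDEX] * end
            return labels
    # Marker not found: ignore the whole sequence rather than train on prompt text.
    return [IGNORE_INDEX] * len(input_ids)

prompt_and_answer = [5, 6, 7, 100, 101, 42, 43, 44]  # fake token IDs
assistant_marker = [100, 101]                         # fake marker IDs
print(mask_prompt_tokens(prompt_and_answer, assistant_marker))
# -> [-100, -100, -100, -100, -100, 42, 43, 44]
```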

πŸ“š Datasets

| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 3,900 | Filtered high-quality Claude 4.6 Opus reasoning traces |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Large-scale Claude 4.6 Opus distillation data |
| Jackrong/Qwen3.5-reasoning-700x | 700 | Curated step-by-step reasoning, Qwen-specific |

Total: ~14,233 examples after normalization, deduplication, and length filtering (max 8192 tokens).
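The exact filtering script is not published; the sketch below shows one plausible shape for the dedup and length steps, using exact-match dedup on whitespace-normalized text and a crude whitespace-token proxy for the 8192-token cap:

```python
MAX_TOKENS = 8192  # length cap from the dataset description

def filter_examples(examples: list[str], max_tokens: int = MAX_TOKENS) -> list[str]:
    """Keep each distinct, non-empty, length-bounded example once."""
    seen: set[str] = set()
    kept = []
    for text in examples:
        normalized = " ".join(text.split())        # collapse whitespace
        if not normalized or normalized in seen:   # drop empties and duplicates
            continue
        if len(normalized.split()) > max_tokens:   # crude token-count proxy
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept

print(filter_examples(["a  b", "a b", "", "c " * 9000, "d"]))  # -> ['a b', 'd']
```

A production pipeline would tokenize with the model's tokenizer rather than splitting on whitespace, and might use fuzzy (e.g. MinHash) rather than exact deduplication.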


βš™οΈ Training Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Framework | Unsloth + TRL SFTTrainer |
| LoRA rank | 64 |
| LoRA alpha | 64 |
| Target modules | All attention + MLP projections |
| Load precision | 4-bit (training) |
| Export precision | 16-bit merged |
| Effective batch size | 36 (2 Γ— 18 grad accum) |
| Learning rate | 2e-4 |
| LR scheduler | Linear |
| Epochs | 1 |
| Max sequence length | 8192 tokens |
| Optimizer | AdamW 8-bit |
| Supervision | Response-only (assistant turns only) |
| Chat template | qwen3-thinking |
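As a quick sanity check (not the training script), the batch-size row composes as follows, and with ~14,233 examples over one epoch it implies roughly 395 optimizer steps:

```python
# Variable names are illustrative; they mirror common TRL/transformers
# argument names but are not taken from the actual training code.
per_device_train_batch_size = 2
gradient_accumulation_steps = 18

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # -> 36

# Approximate optimizer steps for 1 epoch over ~14,233 examples:
steps = 14_233 // effective_batch_size
print(steps)  # -> 395
```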

πŸ’» Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Implement a binary search tree in Python with insert and search methods."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))

vLLM

pip install vllm

vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --max-model-len 8192 \
  --reasoning-parser qwen3

⚑ Speculative Decoding with MTP (vLLM)

Qwen3.6 supports Multi-Token Prediction (MTP) for significantly faster inference. Community tests show ~90% acceptance rate on this fine-tuned model β€” higher than typical, thanks to the structured reasoning training. Generation throughput reaches 60+ tok/s with MTP enabled, compared to ~25 tok/s standard.

vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --max-model-len 8192 \
  --reasoning-parser qwen3 \
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
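A back-of-envelope estimate of why the quoted acceptance rate matters: with `k` speculative tokens and per-token acceptance probability `a`, each verification pass yields on average `1 + a + a² + … + aᵏ` tokens (a standard speculative-decoding approximation that assumes independent acceptances). The arithmetic below is illustrative only:

```python
# With num_speculative_tokens=2 and ~90% acceptance, each target-model
# forward pass yields ~2.71 tokens on average, roughly in line with the
# observed jump from ~25 to 60+ tok/s once drafting overhead is included.
def expected_tokens_per_step(a: float, k: int) -> float:
    """Mean accepted tokens per verification pass (independence assumption)."""
    return sum(a ** i for i in range(k + 1))

print(round(expected_tokens_per_step(0.9, 2), 2))  # -> 2.71
```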

SGLang

python -m sglang.launch_server \
  --model-path rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --reasoning-parser qwen3

Recommended Sampling Parameters

| Mode | temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general) | 1.0 | 0.95 | 20 | 0.0 |
| Thinking (coding) | 0.6 | 0.95 | 20 | 0.0 |
| Non-thinking | 0.7 | 0.80 | 20 | 1.5 |
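These presets can be captured as plain dictionaries for reuse across clients; the mode names below are ad hoc. Note that `presence_penalty` is a vLLM/OpenAI-API sampling parameter, not an argument to transformers `generate()`, so drop it when calling the model locally:

```python
# Preset names and helper are ours; values come from the table above.
SAMPLING_PRESETS = {
    "thinking_general": dict(temperature=1.0, top_p=0.95, top_k=20, presence_penalty=0.0),
    "thinking_coding":  dict(temperature=0.6, top_p=0.95, top_k=20, presence_penalty=0.0),
    "non_thinking":     dict(temperature=0.7, top_p=0.80, top_k=20, presence_penalty=1.5),
}

def sampling_kwargs(mode: str) -> dict:
    """Return a copy of the preset so callers can tweak it safely."""
    return dict(SAMPLING_PRESETS[mode])

print(sampling_kwargs("thinking_coding")["temperature"])  # -> 0.6
```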

⚠️ Limitations

  • Text-only SFT: vision capabilities of the base model are not fine-tuned
  • Single epoch: trained for only one epoch on ~14k samples
  • Hallucination risk: autoregressive LLM β€” may produce incorrect facts
  • Intended use: coding, math, offline analytical tasks, logic-heavy prompting

πŸ“– Citation

@misc{rico03-qwen36-opus-reasoning,
  title  = {Qwen3.6-27B Claude Opus Reasoning Distilled},
  author = {rico03},
  year   = {2026},
  url    = {https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled}
}

@misc{qwen3.6-27b,
  title  = {{Qwen3.6-27B}: Flagship-Level Coding in a {27B} Dense Model},
  author = {{Qwen Team}},
  month  = {April},
  year   = {2026},
  url    = {https://qwen.ai/blog?id=qwen3.6-27b}
}

πŸ™ Acknowledgements

  • Jackrong β€” fine-tuning guide and pipeline this work is based on
  • Unsloth β€” 2x faster fine-tuning with 70% less VRAM
  • Qwen Team β€” for releasing Qwen3.6-27B under Apache 2.0
  • All dataset contributors

Released for research and personal use. Not intended for production deployment without additional safety evaluation.
