
khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled-GGUF



(Figure: General Benchmark Comparison Chart)

  • Benchmark: khazarai/Multi-Domain-Reasoning-Benchmark
  • Total Questions: 100
| Model | Score |
|-------|-------|
| khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled | 75.64 |
| Qwen/Qwen3-4B-Thinking-2507 | 73.73 |

This is a reasoning-distilled variant of Qwen3-4B-Thinking, fine-tuned with QLoRA via Unsloth to replicate the advanced reasoning capabilities of the larger Qwen3.6-plus teacher model. The distillation process focuses on reducing the "rambling" and "uncertainty" often found in smaller models during complex tasks, replacing them with concise, structured, and actionable solution paths.

Reasoning Comparison: Base vs. Distilled

The primary improvement in this model is the qualitative leap in reasoning structure. Below is a summary of the differences observed when solving complex graph problems (e.g., Shortest Path with Edge Reversals):

Base Model (Qwen3-4B-Thinking):

  • Style: Stream-of-consciousness, exploratory, and verbose.
  • Behavior: The model often talks to itself ("Hmm, interesting", "Wait, no"), struggles to interpret problem constraints correctly on the first try, and enters loops of self-correction. It mimics a student trying to figure out the problem as they speak.
  • Output: Contains a high noise-to-signal ratio; solution paths are often buried under paragraphs of hesitation.

Distilled Model (Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled):

  • Style: Structured, professional, and report-oriented.
  • Behavior: The model analyzes the problem immediately, separates concerns (Input, Output, Constraints), and formulates a concrete algorithm plan (e.g., State-Space Dijkstra). It proceeds with confidence, avoiding logical dead-ends.
  • Output: Provides a clean breakdown: Problem Analysis -> Intuition -> Algorithm -> Complexity Analysis -> Pseudocode.
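To make the "State-Space Dijkstra" plan concrete, here is a minimal sketch of one common formulation of a shortest-path-with-edge-reversals problem: each directed edge may also be traversed against its direction, but only a limited number of reversals is allowed, so the search state becomes (node, reversals used). The function name, graph encoding, and cost model are illustrative assumptions, not the exact benchmark task.

```python
import heapq
from collections import defaultdict

def min_cost_with_reversals(n, edges, src, dst, max_rev):
    """Shortest path in a directed graph where each edge (u, v, w) may
    also be traversed v -> u ("reversed") at the same weight, using at
    most max_rev reversals. Dijkstra over states (node, reversals_used)."""
    fwd = defaultdict(list)   # traversing an edge in its given direction
    rev = defaultdict(list)   # traversing an edge against its direction
    for u, v, w in edges:
        fwd[u].append((v, w))
        rev[v].append((u, w))

    INF = float("inf")
    dist = [[INF] * (max_rev + 1) for _ in range(n)]
    dist[src][0] = 0
    pq = [(0, src, 0)]  # (cost, node, reversals used)
    while pq:
        d, u, r = heapq.heappop(pq)
        if d > dist[u][r]:
            continue  # stale queue entry
        for v, w in fwd[u]:          # forward move: no reversal spent
            if d + w < dist[v][r]:
                dist[v][r] = d + w
                heapq.heappush(pq, (d + w, v, r))
        if r < max_rev:
            for v, w in rev[u]:      # reversed move: one reversal spent
                if d + w < dist[v][r + 1]:
                    dist[v][r + 1] = d + w
                    heapq.heappush(pq, (d + w, v, r + 1))
    return min(dist[dst])  # best cost over any reversal budget used
```

Augmenting the node with the reversal budget is the standard state-space trick: the graph effectively gains `max_rev + 1` layers, and plain Dijkstra on the layered graph solves the constrained problem.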

Verdict: The distilled model transforms the raw potential of the base model into an engineering-grade tool.

Model Specifications

  • Base Model: Qwen/Qwen3-4B-Thinking-2507
  • Model Type: Reasoning Distillation (QLoRA)
  • Framework: Unsloth
  • Fine-tuning Method: QLoRA (PEFT)
  • Teacher Model: Qwen3.6-plus
  • Distillation Dataset: khazarai/qwen3.6-plus-high-reasoning-500x
    • Total Tokens: 1,739,249
    • Max Sequence Length: 6,500 tokens
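A QLoRA setup along the lines described above can be sketched as the following config fragment; the rank, alpha, and target modules here are assumed values for illustration, not the author's published recipe.

```python
# Hypothetical QLoRA (PEFT) configuration matching the card's description.
# All hyperparameters below are assumptions, not the actual training setup.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # the "Q" in QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                                   # assumed LoRA rank
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

With this pattern, the 4-bit-quantized base model stays frozen and only the low-rank adapter matrices are trained on the teacher's reasoning traces.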

Provided Quants

(sorted by size, not necessarily quality; IQ-quants are often preferable to similar-sized non-IQ quants)

| Type | Size/GB | Notes |
|------|---------|-------|
| Q4_K_1 | 2.3 | |
| Q6_K | 3.3 | very good quality |
| Q8_0 | 4.2 | fast, best quality |
| bf16 | 8.0 | 16 bpw, overkill |
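The sizes above are consistent with simple bits-per-weight arithmetic; as a sanity check, here is that calculation with an assumed parameter count of ~4B and typical bits-per-weight figures for each quant type:

```python
def quant_size_gb(n_params, bits_per_weight):
    """Rough GGUF file size in decimal GB: parameters x bits-per-weight / 8.
    Ignores metadata and per-tensor overhead, so real files differ slightly."""
    return n_params * bits_per_weight / 8 / 1e9

# Bits-per-weight values below are approximate/assumed for each quant type.
N = 4.0e9  # ~4B parameters
print(round(quant_size_gb(N, 16.0), 1))  # bf16 -> 8.0
print(round(quant_size_gb(N, 8.5), 1))   # Q8_0 -> 4.2
print(round(quant_size_gb(N, 6.56), 1))  # Q6_K -> 3.3
```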

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

(image: quant quality comparison graph)

And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
