ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF

GGUF quantizations of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 — the MLP-passthrough variant that defends against the Qwen3.6 think-policy fragility we discovered. Source dtype is BF16; this repo provides the standard bartowski quant ladder (F16 → IQ2_XXS) for llama.cpp.

Source model: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 (BF16 weights, model card with full benchmarks and methodology). NOT a quant of clean Qwen/Qwen3.6-27B — these GGUFs contain the v4 merge.

All quants were made using imatrix with calibration data v5, the same calibration set bartowski uses for the Qwen3.6 base release, so quality fingerprints are directly comparable to bartowski's Qwen_Qwen3.6-27B-GGUF repo.
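For reference, importance matrices of this kind are generated with stock llama.cpp tooling. The sketch below uses placeholder file names and is not the verbatim command used for this repo:

# Sketch: generating an importance matrix with stock llama.cpp
# (file names are illustrative placeholders, not the exact inputs used here)
llama-imatrix \
    -m Qwen3.6-27B-Omnimerge-v4-F16.gguf \
    -f calibration_data_v5.txt \
    -o imatrix.dat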

Why this merge exists

Same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B + 3 Qwen3.6 fine-tunes. Direct successor to ManniX-ITA/Qwen3.5-27B-Omnimerge-v2 on the newer Qwen3.6 base, with mlp.{gate,up,down}_proj copied verbatim from clean Qwen3.6 (the "MLP-passthrough" surgery) to defend against a Qwen3.6-specific reasoning-tag fragility we found during forensic delta inspection. See the v4 model card for the full story, scripts, and benchmark methodology.

Benchmark headline (Q6_K, head-to-head vs Qwen3.6 base + Omnimerge-v2)

All scored under identical llama.cpp + lm_eval conditions (--reasoning-format deepseek --reasoning-budget 8192 --parallel 2, raw /v1/completions, no chat template).

| Benchmark | Qwen3.6 base Q6_K (bartowski) | Omnimerge-v2 (Qwen3.5 base) | Omnimerge-v4-MLP (this) | Δ vs base | Δ vs v2 |
| --- | --- | --- | --- | --- | --- |
| HumanEval pass@1 (164q) | 84.76% | 79.27% | 84.76% | 0.00 pp | +5.49 pp |
| MBPP pass@1 (500q), corrected* | 57.60% | 74.60% | 73.40% | +15.80 pp | −1.20 pp |
| GPQA Diamond pass@1 (flex) | not measured | 69.19% (full 198q) | ≈ 84.75% (partial 177q‡) | n/a | ≈ +15.5 pp |

* MBPP scores are post-<think>-stripping: lm_eval's raw scorer raises SyntaxError on the literal < when it runs exec(prompt + completion + tests). See the v4 model card for the per-model recovery breakdown.
‡ GPQA crashed on the at-budget reasoning tail (aiohttp lifecycle bug in lm_eval); 192/198 responses were cached, 177 matched, and the headline is expected to land in the 82-86% band.
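For orientation, the wiring is roughly as sketched below: llama-server exposes an OpenAI-compatible /v1/completions endpoint and lm-evaluation-harness scores against it in local-completions mode. The port, model alias, tokenizer choice, task list, and harness flags are illustrative assumptions, not the exact commands behind the table above.

# Sketch of the eval wiring (illustrative arguments, not the published run)
llama-server \
    -m Qwen3.6-27B-Omnimerge-v4-Q6_K.gguf \
    -c 32768 -ngl 99 --parallel 2 \
    --reasoning-format deepseek --reasoning-budget 8192 &

lm_eval --model local-completions \
    --model_args model=omnimerge-v4-q6k,base_url=http://127.0.0.1:8080/v1/completions,tokenizer=Qwen/Qwen3.6-27B,num_concurrent=2 \
    --tasks humaneval,mbpp \
    --confirm_run_unsafe_code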

Available Quantizations

All 27 files (F16 + 26 imatrix-quantized tiers, ~417 GB total) are uploaded and ready. imatrix.dat (used for every quant) is in the repo root for audit and reproduction.

| Quantization | File size | Use case |
| --- | --- | --- |
| F16 (full precision) | 50.11 GB | Conversion source / lossless reference |
| Q8_0 | 26.63 GB | Highest fidelity, large |
| Q6_K_L | 21.14 GB | Q6_K with embed/output at Q8_0 |
| Q6_K | 20.57 GB | Recommended high tier — eval methodology used this |
| Q5_K_L | 18.64 GB | Q5_K_M with embed/output at Q8_0 |
| Q5_K_M | 17.91 GB | Strong fidelity, balanced |
| Q5_K_S | 17.40 GB | Slightly smaller K-mix |
| Q4_K_L | 16.29 GB | Q4_K_M with embed/output at Q8_0 |
| Q4_1 | 15.91 GB | Legacy 4-bit, dense |
| Q4_K_M | 15.41 GB | Recommended balanced tier for most users |
| IQ4_NL | 14.72 GB | Importance-aware 4-bit non-linear |
| Q4_K_S | 14.52 GB | K-mix small variant |
| Q4_0 | 14.41 GB | Legacy 4-bit |
| IQ4_XS | 14.05 GB | IQ4 extra-small |
| Q3_K_XL | 13.42 GB | Q3_K_L with embed/output at Q8_0 |
| Q3_K_L | 13.36 GB | 3-bit K-mix large |
| Q3_K_M | 12.39 GB | 3-bit K-mix medium |
| IQ3_M | 11.72 GB | Importance-aware 3-bit medium |
| Q3_K_S | 11.24 GB | 3-bit K-mix small |
| IQ3_XS | 11.15 GB | IQ3 extra-small |
| Q2_K_L | 11.13 GB | Q2_K with embed/output at Q8_0 |
| IQ3_XXS | 10.42 GB | IQ3 extra-extra-small |
| Q2_K | 9.98 GB | 2-bit K-mix |
| IQ2_M | 9.32 GB | Importance-aware 2-bit medium |
| IQ2_S | 8.72 GB | IQ2 small |
| IQ2_XS | 8.47 GB | IQ2 extra-small |
| IQ2_XXS | 7.85 GB | IQ2 extra-extra-small (smallest) |
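If you only need one tier rather than the full ~417 GB set, the Hugging Face CLI can filter by filename pattern. The pattern below is a guess at the exact file naming; check the repo's file list if it does not match:

# Fetch a single tier (pattern is illustrative; verify the exact filename first)
huggingface-cli download ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF \
    --include "*Q4_K_M*" --local-dir ./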

How to Use

With llama.cpp:

# Recommended args for reasoning-tag-emitting models (matches the eval methodology):
llama-server \
    -m Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf \
    -c 32768 -ngl 99 -t 12 --no-warmup \
    --reasoning-format deepseek --reasoning-budget 8192

Swap Q4_K_M for any tier from the table above. Q6_K matches the methodology used in our published evals; Q4_K_M is the typical "balanced" choice for most users.

For multimodal (vision) inference: the mmproj projector is in bartowski/Qwen_Qwen3.6-27B-GGUF and works with this model unchanged (vision tower is preserved verbatim from the base).
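As a rough sketch, assuming a llama.cpp build with multimodal support and using a placeholder name for the projector file, pairing the two looks like this:

# Vision sketch: this repo's GGUF plus the base model's mmproj projector
# (projector filename is a placeholder; use the actual file from the bartowski repo)
llama-server \
    -m Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf \
    --mmproj mmproj-Qwen_Qwen3.6-27B-F16.gguf \
    -c 32768 -ngl 99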

With ollama: point a Modelfile at one of the GGUFs above, or load the GGUF directly from Hugging Face, as sketched below.
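Both routes as a hedged sketch: the model tag is arbitrary, the Modelfile contents are shown as a comment, and the hf.co direct-load form assumes a reasonably recent ollama release.

# Option A: local Modelfile (its single line is shown as a comment here)
#   FROM ./Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf
ollama create omnimerge-v4 -f Modelfile
ollama run omnimerge-v4

# Option B: pull the GGUF straight from Hugging Face
ollama run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M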

imatrix.dat

The imatrix.dat (~14 MB) used to generate every quant in this repo is uploaded alongside the GGUFs at the repo root. Reproducible, auditable.
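That means any tier can be rebuilt from the F16 file with stock llama.cpp; a minimal sketch (file names are placeholders) looks like this:

# Rebuild a tier from the published F16 conversion and the shipped imatrix.dat
llama-quantize --imatrix imatrix.dat \
    Qwen3.6-27B-Omnimerge-v4-F16.gguf \
    Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf Q4_K_M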

Reproducing

See scripts/ on the source v4 model repo:

  • dare_ties_merge.py — main merger (auto-detects Qwen3.6 base via output_gate_type and applies MLP-skip)
  • v4_mlp_passthrough.py — post-process: rebuild merged dir with MLP layers from base
  • quantize_gguf.py — the script that built this repo

For dense (non-Gemma-4-MoE) models, pass --exclude CD-Q6_K,CD-Q5_K_M,CD-Q4_K_M,CD-Q3_K_M,CD-Q2_K to skip ContribDynamic tiers (those require Gemma 4 expert-contribution maps).

License

Apache-2.0 (inherited from Qwen/Qwen3.6-27B and the fine-tune sources).

Acknowledgements
