# cHunter789/Qwen3.6-27B-i1-IQ4_XS-GGUF

Qwen3.6-27B-i1-IQ4_XS (Fully Optimized)
## Motivation
Recent updates in the llama.cpp repository (specifically commit 1dab5f5a44) introduced a hardcoded minimum quantization of q5_K for attn_qkv layers. While this was likely intended to preserve model quality, it causes a noticeable bloat in the final file sizes.
For comparison, the highly efficient Qwen3.5-27B iq4_xs by mradermacher weighed in at 14.7 GB, whereas the equivalent Qwen3.6 i1-GGUF built under the new commit's rules swelled to over 15.1 GB.
## Methodology
To restore the optimal balance of size and performance, I modified the llama.cpp source so that attn_qkv layers are once again quantized as pure IQ4_XS. This mirrors, 1:1, the layer quantization strategy originally used in mradermacher's Qwen3.5-27B release.
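For readers who want to reproduce the patch, the shape of the change looks roughly like the following. This is a minimal sketch, not the literal diff: the helper name `pick_attn_qkv_type` is hypothetical, and the real rule sits inside llama.cpp's per-tensor type selection (`llama_tensor_get_type`), whose exact conditions vary between commits.

```cpp
#include <string>
#include "ggml.h"  // ggml_type / GGML_TYPE_* enums from the llama.cpp tree

// Illustrative sketch only: the real logic lives in llama.cpp's per-tensor
// type selection and its exact conditions differ by commit.
static ggml_type pick_attn_qkv_type(const std::string & tensor_name,
                                    ggml_type requested_type,
                                    bool enforce_q5k_floor) {
    if (enforce_q5k_floor &&
        tensor_name.find("attn_qkv.weight") != std::string::npos &&
        requested_type == GGML_TYPE_IQ4_XS) {
        // Upstream behaviour after commit 1dab5f5a44: bump attn_qkv to q5_K.
        return GGML_TYPE_Q5_K;
    }
    // Reverted behaviour used for this build: keep pure IQ4_XS everywhere,
    // matching the layer map of mradermacher's Qwen3.5-27B release.
    return requested_type;
}
```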
This model was quantized using the imatrix provided by mradermacher: Qwen3.6-27B-i1-GGUF.
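The quantization step itself follows the standard `llama-quantize` flow with the downloaded imatrix; the file names below are placeholders for your local F16 conversion and imatrix files.

```bash
# Quantize the F16 GGUF to IQ4_XS using the external imatrix.
# File names are placeholders; point them at your local copies.
./llama-quantize --imatrix Qwen3.6-27B.imatrix \
    Qwen3.6-27B-F16.gguf Qwen3.6-27B.i1-IQ4_XS.gguf IQ4_XS
```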
## Performance vs. Size Trade-off
Extensive perplexity testing (llama-perplexity on pg19.txt, 65k context, Q8_0 KV cache) confirms that forcing pure IQ4_XS across all layers results in a statistically insignificant quality drop (+0.0039 PPL) while noticeably reducing the memory footprint.
```bash
# Baseline: standard IQ4_XS build (attn_qkv forced to q5_K)
./llama-perplexity -m Qwen3.6-27B.i1-IQ4_XS.gguf -f pg19.txt -c 65536 --chunks 32 -ngl -1 -ctk q8_0 -ctv q8_0 -fa 1 -b 512 -ub 128
# Custom: pure IQ4_XS build (attn_qkv kept at IQ4_XS)
./llama-perplexity -m Qwen3.6-27B.i1-IQ4_XS-attn_qkv-IQ4_XS.gguf -f pg19.txt -c 65536 --chunks 32 -ngl -1 -ctk q8_0 -ctv q8_0 -fa 1 -b 512 -ub 128
```
## 🧠 Intelligence (Perplexity) Comparison
| Model Version | Perplexity (PPL) | Difference / Quality Drop |
|---|---|---|
| Standard IQ4_XS (with q5_K attn_qkv) | 7.3765 ± 0.02760 | Baseline |
| Custom IQ4_XS (pure IQ4_XS, all layers) | 7.3804 ± 0.02762 | +0.0039 (negligible) |
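The "~0.05%" figure quoted in the conclusion follows directly from the table:

$$\frac{7.3804 - 7.3765}{7.3765} = \frac{0.0039}{7.3765} \approx 5.3 \times 10^{-4} \approx 0.05\%$$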
## Conclusion

By using this custom build, you save 375 MiB of active memory and bring the static file size back toward the 14.7 GB mark, with a practically non-existent impact on output quality (~0.05% relative PPL increase).