sphaela/Qwen3.6-27B-AutoRound-GGUF
Qwen3.6-27B GGUF (AutoRound Quantized)
This repository contains GGUF quantized versions of Qwen/Qwen3.6-27B created using Intel's AutoRound quantization method.
Quantization Details
The models were quantized using the various schemes provided by the auto-round tool. For better compatibility and smaller per-file size, the unified multimodal projector (mmproj) is provided separately in F16, BF16, and F32 formats.
Files and Sizes
| File Name | Quant Type | Size | Description |
|---|---|---|---|
| Qwen3.6-27B-Q2_K_S.gguf | Q2_K_S | 8.9 GB | Extremely high compression, significant quality loss. |
| Qwen3.6-27B-Q2_K_MIXED.gguf | Q2_K_MIXED | 16 GB | Recommended high-compression option. Uses Q4 for the KV cache with good quality. Fast inference. |
| Qwen3.6-27B-Q3_K_S.gguf | Q3_K_S | 12 GB | Very high compression, notable quality loss. |
| Qwen3.6-27B-Q3_K_M.gguf | Q3_K_M | 12 GB | Balanced 3-bit quantization. |
| Qwen3.6-27B-Q3_K_L.gguf | Q3_K_L | 12 GB | High-quality 3-bit quantization. |
| Qwen3.6-27B-Q4_0.gguf | Q4_0 | 15 GB | Standard 4-bit quantization, good balance. |
| Qwen3.6-27B-Q4_1.gguf | Q4_1 | 16 GB | Higher-quality 4-bit quantization than Q4_0. |
| Qwen3.6-27B-Q4_K_S.gguf | Q4_K_S | 15 GB | Small 4-bit K-quant, good efficiency. |
| Qwen3.6-27B-Q4_K_M.gguf | Q4_K_M | 15 GB | Recommended 4-bit K-quant, excellent balance. |
| Qwen3.6-27B-Q5_0.gguf | Q5_0 | 18 GB | Standard 5-bit quantization, very high quality. |
| Qwen3.6-27B-Q5_1.gguf | Q5_1 | 19 GB | Higher-quality 5-bit quantization than Q5_0. |
| Qwen3.6-27B-Q5_K_S.gguf | Q5_K_S | 18 GB | Small 5-bit K-quant, very high quality. |
| Qwen3.6-27B-Q5_K_M.gguf | Q5_K_M | 18 GB | Recommended 5-bit K-quant, near-lossless. |
| Qwen3.6-27B-Q6_K.gguf | Q6_K | 21 GB | 6-bit K-quant, virtually indistinguishable from F16. |
| Qwen3.6-27B-Q8_0.gguf | Q8_0 | 29 GB | 8-bit quantization, near-lossless. |
| mmproj-model-f16.gguf | F16 | 885 MB | Unified projector in Float16 format. |
| mmproj-model-bf16.gguf | BF16 | 889 MB | Unified projector in BFloat16 format. |
| mmproj-model-f32.gguf | F32 | 1.8 GB | Unified projector in Float32 format. |
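As a rough sizing aid when choosing a quant: total memory is approximately the model file plus the mmproj file plus the KV cache. The Python sketch below uses the table's sizes for Q4_K_M and the F16 projector; the architecture numbers (layer count, KV heads, head dimension) are hypothetical placeholders, since this card does not state them, and the KV-cache formula is a generic approximation.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # K and V tensors, per layer, per position; fp16 cache by default.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical architecture numbers, for illustration only.
kv = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=8192)

# Q4_K_M file (15 GB) + F16 mmproj (885 MB) + KV cache.
total = 15 * GIB + 0.885 * GIB + kv
print(f"~{total / GIB:.1f} GiB needed")
```

Swap in any row from the table and your real context length to estimate other configurations.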
Generating the Models
The models were generated using Intel's AutoRound with the following command:
```shell
auto-round --model Qwen/Qwen3.6-27B --output_dir ./quantized/ --scheme <SCHEME> --iters 0
```
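To reproduce several schemes in one pass, the invocations can be generated programmatically. This Python sketch only builds the command strings from the flags shown above; actually running them requires the auto-round package and access to the base model, and it assumes the scheme names match the quant types in the table.

```python
import shlex

# Any subset of the quant types from the table above.
schemes = ["Q4_K_M", "Q5_K_M", "Q8_0"]

cmds = [
    shlex.join([
        "auto-round",
        "--model", "Qwen/Qwen3.6-27B",
        "--output_dir", "./quantized/",
        "--scheme", scheme,
        "--iters", "0",  # same flag as the command above
    ])
    for scheme in schemes
]

for cmd in cmds:
    print(cmd)  # e.g. pipe into a shell, or pass to subprocess.run(shlex.split(cmd))
```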
Usage with llama.cpp
These models can be used with llama.cpp. For multimodal usage, you must pass the projector file via the --mmproj flag:
```shell
./llama-cli -m Qwen3.6-27B-Q4_K_M.gguf --mmproj mmproj-model-f16.gguf --image your_image.jpg -p "Describe this image."
```
About AutoRound
AutoRound is a weight-quantization method from Intel that minimizes accuracy loss by treating each weight's up-or-down rounding as a learnable decision and optimizing it (via signed gradient descent on a small calibration set) against the layer's output error, rather than rounding every weight to its nearest grid point in isolation.
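The core idea can be illustrated with a toy NumPy sketch. This is not Intel's implementation: real AutoRound learns the rounding (and clipping ranges) with signed gradient descent over transformer blocks, whereas this demo uses a greedy per-weight search on a single random linear layer. It only shows why optimizing rounding against the layer's *output* can beat plain round-to-nearest (RTN).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16)).astype(np.float32)   # toy calibration activations
W = rng.normal(size=(16, 4)).astype(np.float32)    # toy layer weights

scale = np.abs(W).max() / 7.0  # one shared 4-bit scale (illustrative)

def out_err(q):
    # Mean squared error of the layer OUTPUT, not of the weights themselves.
    return float(np.mean((X @ W - X @ (q * scale)) ** 2))

# RTN baseline: round each weight to its nearest grid point.
q = np.clip(np.round(W / scale), -8, 7)
err_rtn = out_err(q)

# Emulate AutoRound's idea with greedy coordinate search: nudge each
# weight's rounding up or down, keeping any change that lowers output error.
improved = True
while improved:
    improved = False
    for idx in np.ndindex(W.shape):
        for delta in (-1, 1):
            cand = q.copy()
            cand[idx] = np.clip(cand[idx] + delta, -8, 7)
            if out_err(cand) < out_err(q):
                q, improved = cand, True
err_opt = out_err(q)

print(err_rtn, "->", err_opt)  # optimized rounding never does worse than RTN
```

The search starts from RTN and only accepts strict improvements, so the optimized output error is guaranteed to be at most the RTN error.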