kaitchup/Qwen3.6-27B-autoround-nvfp4-linearattn-BF16
This is Qwen/Qwen3.6-27B quantized with AutoRound to NVFP4, with the linear-attention layers kept in 16-bit (BF16). The model is compatible with vLLM (tested with v0.19) and was tested on an RTX Pro 6000. Evaluation is still in progress; a similar quantization of Qwen3.5 performed very well.
- Developed by: The Kaitchup
Instructions
uv pip install vllm
uv pip install git+https://github.com/huggingface/transformers.git
vllm serve kaitchup/Qwen3.6-27B-autoround-nvfp4-linearattn-BF16 --max-model-len 262144 --reasoning-parser qwen3
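Once the server is running, vLLM exposes an OpenAI-compatible API (by default on `http://localhost:8000/v1`). A minimal sketch of a chat-completion request against that endpoint, assuming the default host and port:

```python
import json
import urllib.request

# Assumed default vLLM endpoint; adjust host/port if you changed the serve command.
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "kaitchup/Qwen3.6-27B-autoround-nvfp4-linearattn-BF16",
    "messages": [
        {"role": "user", "content": "Summarize NVFP4 quantization in one sentence."}
    ],
    "max_tokens": 256,
}

# Serialize the request body as the server expects it.
body = json.dumps(payload).encode("utf-8")

def send_request(url: str = BASE_URL) -> dict:
    """Post the chat request and return the parsed JSON response (requires a running server)."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server up, the reply text is at:
# send_request()["choices"][0]["message"]["content"]
```

The same endpoint also works with the official `openai` Python client by pointing `base_url` at the server and passing any string as the API key.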