schuttdev/hipfire-qwen3.6-27b

Qwen3.6-27B for hipfire

Pre-quantized Qwen3.6-27B (DeltaNet hybrid) for hipfire, a Rust-native LLM inference engine for AMD RDNA GPUs.

Refresh of Qwen3.5-27B with a newer training run. Same architecture (DeltaNet + FullAttention hybrid, arch_id=5, 32 layers, 16 attention heads, 4 KV heads, head_dim=256) and same kernel paths, so no engine changes are needed.
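
For orientation, here is what those shape parameters look like as an engine-side config. This is an illustrative Rust sketch; the struct and field names are hypothetical, not hipfire's actual types, and only the values come from the description above.

#[derive(Debug)]
struct ModelConfig {
    arch_id: u32,      // 5 = DeltaNet + FullAttention hybrid
    n_layers: usize,   // 32
    n_heads: usize,    // 16 attention heads
    n_kv_heads: usize, // 4 KV heads
    head_dim: usize,   // 256
}

const QWEN36_27B: ModelConfig = ModelConfig {
    arch_id: 5,
    n_layers: 32,
    n_heads: 16,
    n_kv_heads: 4,
    head_dim: 256,
};

fn main() {
    // 16 query heads over 4 KV heads = 4-way grouped-query attention.
    println!("{:?}, gqa ratio {}", QWEN36_27B, QWEN36_27B.n_heads / QWEN36_27B.n_kv_heads);
}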

Files

| File | Role | Size | Min VRAM | RX 7900 XTX (gfx1100) |
|---|---|---|---|---|
| qwen3.6-27b.mq4 | target | 14.0 GB | 16 GB | 44 tok/s AR / 185 tok/s w/ draft on code |
| qwen36-27b-dflash-mq4.hfq | DFlash draft | 0.92 GB | (paired with target) | |

Decode tok/s figures are steady-state greedy decode on a 7900 XTX with asym3 KV.

Usage

# Install hipfire (Linux + ROCm 6+)
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash

# Pull target + paired draft (DFlash speculative decode on by default)
hipfire pull qwen3.6:27b
hipfire pull qwen3.6:27b-draft

# Run
hipfire run qwen3.6:27b "Write a one-line Python function named square."

The engine auto-discovers the draft when both files are in ~/.hipfire/models/. The filename matters: do not rename qwen36-27b-dflash-mq4.hfq.
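
A minimal sketch of how filename-based pairing like this can work, assuming the models directory layout above; the function names are hypothetical and this is not hipfire's actual discovery code.

use std::path::{Path, PathBuf};

/// Hypothetical draft discovery: look for the paired DFlash draft by its
/// exact filename in the models directory (which is why renaming breaks it).
fn find_draft(models_dir: &Path, draft_filename: &str) -> Option<PathBuf> {
    let candidate = models_dir.join(draft_filename);
    candidate.is_file().then_some(candidate)
}

fn main() {
    // ~/.hipfire/models/ as described in the usage section (Linux only).
    let home = std::env::var("HOME").expect("HOME not set");
    let models_dir = PathBuf::from(home).join(".hipfire/models");
    match find_draft(&models_dir, "qwen36-27b-dflash-mq4.hfq") {
        Some(p) => println!("DFlash draft found: {}", p.display()),
        None => println!("no draft found; falling back to AR decode"),
    }
}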

DFlash draft

DFlash is hipfire's speculative-decode path: a small auxiliary draft network proposes blocks of B candidate tokens that the target model verifies in a single batched forward pass. Acceptance ratio τ (committed tokens per cycle) is what determines wall-clock speedup; typical 27B τ on code prompts is 4-5.
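To make the τ accounting concrete, here is a minimal sketch of one greedy draft-and-verify cycle. The Draft/Target traits and function names are hypothetical stand-ins, not hipfire's API, and the real DFlash draft conditions on extracted target hidden states and mask tokens (per the conversion notes below), which this sketch omits; the accept/commit logic is the standard speculative-decode rule.

/// Hypothetical interfaces for illustration only; not hipfire's actual API.
trait Draft {
    /// Propose a block of `b` candidate tokens continuing `ctx`.
    fn propose(&self, ctx: &[u32], b: usize) -> Vec<u32>;
}

trait Target {
    /// One batched forward pass over ctx + proposed: returns the greedy
    /// next token at each of the proposed.len() + 1 positions.
    fn verify(&self, ctx: &[u32], proposed: &[u32]) -> Vec<u32>;
}

/// One speculative-decode cycle. The return value is this cycle's committed
/// token count, i.e. the per-cycle τ from the text (accepted prefix + 1).
fn decode_cycle(
    draft: &impl Draft,
    target: &impl Target,
    ctx: &mut Vec<u32>,
    b: usize,
) -> usize {
    let proposed = draft.propose(ctx, b);
    let greedy = target.verify(ctx, &proposed);

    // Accept the longest prefix where the draft matched the target's
    // greedy choice; the first mismatch invalidates everything after it.
    let mut accepted = 0;
    for (i, &tok) in proposed.iter().enumerate() {
        if greedy[i] != tok {
            break;
        }
        ctx.push(tok);
        accepted += 1;
    }

    // The target's own prediction at the first mismatch (or just past a
    // fully accepted block) is always safe to commit, so every cycle
    // advances by at least one token.
    ctx.push(greedy[accepted]);
    accepted + 1
}

With block_size=16 and τ in the 4-5 range, each batched verify commits 4-5 tokens for roughly the cost of one AR step, which is where the 44 → 185 tok/s gap in the files table comes from.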

The draft is converted from z-lab/Qwen3.6-27B-DFlash via hipfire's dflash_convert --mq4. It is a 1.73B-param hybrid (sliding_attention + full_attention) with block_size=16, target hidden-state extraction at layers [1, 16, 31, 46, 61], and mask_token_id=248070.
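
For reference, the same conversion parameters as a config sketch; the struct and field names are hypothetical, not dflash_convert's actual schema, and the values are copied from the line above.

/// Hypothetical mirror of the conversion parameters listed above.
struct DflashDraftConfig {
    block_size: usize,          // candidate tokens proposed per cycle
    extract_layers: [usize; 5], // target hidden states tapped for the draft
    mask_token_id: u32,         // from the conversion metadata above
}

const QWEN36_DRAFT: DflashDraftConfig = DflashDraftConfig {
    block_size: 16,
    extract_layers: [1, 16, 31, 46, 61],
    mask_token_id: 248_070,
};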

2026-04-27 refresh

Re-quantized from the latest z-lab safetensors revision (sha 0919688658996800f86b895034249700e9481106, upstream mtime 2026-04-27 04:19 UTC), replacing the prior Apr 24 conversion (sha 1dbb59a5...). Bench delta on a 7900 XTX, old draft vs new draft on the qwen3.6-27b.mq4 target, single run per genre:

| genre | tok/s (old → new) | τ (old → new) |
|---|---|---|
| code (fibonacci) | 101.13 → 105.32 | 4.40 → 4.64 |
| prose (Roman empire) | 41.82 → 46.45 | 1.06 → 1.30 |
| instruct (sky blue) | 85.13 → 84.06 | 3.50 → 3.44 |

Prose shows the largest gain (+11% tok/s, +23% τ). Code +4% tok/s. Instruct is tied within run-to-run noise.

Pairing rule

The draft only accelerates its own target. Do not pair the 3.6 draft with the 3.5-27B target or vice versa; vocabulary and hidden-state projections differ across the refresh. Pairing behavior:

  • hipfire pull qwen3.6:27b (alone) → AR decode, ~44 tok/s.
  • hipfire pull qwen3.6:27b-draft after target → DFlash on by default, ~185 tok/s on code prompts when draft+target alignment is good.
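
As a rough sanity check on those two numbers, here is a simple cost model. It is an assumption for illustration, not a hipfire-published formula: each cycle commits τ tokens for the cost of one verify pass (about one AR step) plus the draft's overhead.

/// Rough speculative-decode speedup model (an assumption, not hipfire's math):
/// committed tokens per cycle over per-cycle cost in AR-step units.
fn est_speedup(tau: f64, draft_overhead: f64) -> f64 {
    tau / (1.0 + draft_overhead)
}

fn main() {
    // τ ≈ 4.5 on code (from the text) with an assumed ~7% draft overhead
    // lands near the observed 185 / 44 ≈ 4.2x AR-to-DFlash ratio above.
    println!("{:.2}x", est_speedup(4.5, 0.07)); // prints "4.21x"
}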

About hipfire

Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4). No Python in the hot path. Single binary install. Source: Kaden-Schutt/hipfire.

License

MIT for the hipfire packaging. Underlying weights inherit the upstream Qwen / z-lab licenses — see those repos for terms.
