
KyleHessling1/Qwopus-GLM-18B-Healed-MLX-4bit


Qwopus-GLM-18B-Healed — MLX 4-bit

Apple Silicon / MLX 4-bit quantization of the healed Qwopus-GLM-18B frankenmerge. Ready to run on Macs with the MLX framework via mlx-lm.

Quickstart

pip install -U "mlx-lm>=0.31.2"

from mlx_lm import load, generate

model, tokenizer = load("KyleHessling1/Qwopus-GLM-18B-Healed-MLX-4bit")
print(generate(model, tokenizer, prompt="The capital of France is", max_tokens=64))
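
For chat-style prompts, format the conversation with the tokenizer's chat template first, following the standard mlx-lm pattern (assumes the repo ships a chat template in its tokenizer config):

from mlx_lm import load, generate

model, tokenizer = load("KyleHessling1/Qwopus-GLM-18B-Healed-MLX-4bit")

# Render the conversation through the repo's chat template.
messages = [{"role": "user", "content": "Explain grouped-query attention in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=128))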

Or from the CLI:

python3 -m mlx_lm generate \
  --model KyleHessling1/Qwopus-GLM-18B-Healed-MLX-4bit \
  --prompt "Write a haiku about Apple Silicon." \
  --max-tokens 128
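
mlx-lm also ships an interactive REPL for multi-turn chat (assumes a recent mlx-lm release that includes the chat subcommand):

python3 -m mlx_lm chat \
  --model KyleHessling1/Qwopus-GLM-18B-Healed-MLX-4bit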

Runs comfortably on a 16–24 GB unified-memory Mac (M-series).

Quantization

Method:             MLX affine quantization (mlx_lm.convert -q)
Bits / weight:      4 (effective 4.502 after non-quantized layers)
Group size:         64
Non-quant dtype:    bfloat16
Output size:        ~8.4 GB (2 safetensors shards)
Quantizer version:  mlx-lm 0.31.2 / mlx 0.31.1
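
The effective bits/weight follows from the affine layout: each group of 64 four-bit weights also carries a scale and a bias, and a few layers stay in bfloat16. A back-of-envelope check, assuming MLX's default of one 16-bit scale and one 16-bit bias per group:

# Rough bits-per-weight for 4-bit affine quantization, group size 64.
# Assumes one 16-bit scale + one 16-bit bias per group of weights.
q_bits, group_size = 4, 64
overhead = (16 + 16) / group_size   # 0.5 bits of scale/bias per weight
print(q_bits + overhead)            # 4.5 bpw for quantized layers; the bf16
                                    # layers nudge the model-wide average to ~4.502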

Reproducible from the BF16 source with:

python3 -m mlx_lm convert \
  --hf-path KyleHessling1/Qwopus-GLM-18B-Healed \
  --mlx-path ./Qwopus-GLM-18B-Healed-MLX-4bit \
  -q --q-bits 4 --q-group-size 64
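
A quick smoke test of the converted output, loading from the local path written above:

from mlx_lm import load, generate

# Load the freshly converted 4-bit weights from disk and generate a few tokens.
model, tokenizer = load("./Qwopus-GLM-18B-Healed-MLX-4bit")
print(generate(model, tokenizer, prompt="Hello", max_tokens=16))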

Base Model

A 64-layer frankenmerge of two of Jackrong's Qwen3.5-9B fine-tunes, healed with a 1,000-step QLoRA fine-tune.

Architecture

Parameters:        ~18B
Layers:            64 (32 + 32)
Hidden size:       4096
Attention heads:   16 (4 KV heads, GQA)
Attention type:    Hybrid (linear + full, every 4th layer)
Context length:    262,144 tokens
Source precision:  BF16
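
These values can be cross-checked against the published config (an illustrative sketch; the key names below follow the common Hugging Face convention and may differ for this architecture):

import json
from huggingface_hub import hf_hub_download

# Download only config.json and print the architecture fields listed above.
path = hf_hub_download("KyleHessling1/Qwopus-GLM-18B-Healed-MLX-4bit", "config.json")
with open(path) as f:
    cfg = json.load(f)
for key in ("num_hidden_layers", "hidden_size", "num_attention_heads",
            "num_key_value_heads", "max_position_embeddings"):
    print(f"{key}: {cfg.get(key)}")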

Capability Suite (from base model)

Beats Qwen 3.6-35B-A3B MoE on a 44-test capability suite while using less than half the VRAM:

                Qwopus-GLM-18B (healed)   Qwen 3.6-35B MoE
Score           40/44 (90.9%)             38/44 (86.4%)
Tool Calling    6/6                       6/6
Agentic         4/4                       4/4
Programming     12/15                     12/15

Frontend stress tests: 62/63 checks passed across 6 complex HTML/CSS/JS generation tasks with perfectly balanced braces/parens and zero garbled output.

Note: benchmarks were measured on the BF16 base / Q4_K_M GGUF. The MLX 4-bit weights are a separate quantization and have not been independently re-benchmarked — expect quality within normal 4-bit quantization variance.

Known Issues

  • The tokenizer emits a Mistral-regex warning on load (inherited from the source repo); it is benign in practice for Qwen tokenization.
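
If the warning clutters scripted runs, Python's standard warnings filter can hide it (a generic sketch; the message pattern below is hypothetical, so match it to the exact text your environment prints):

import warnings

# Hypothetical pattern -- adjust to the actual warning text seen on load.
warnings.filterwarnings("ignore", message=".*[Mm]istral.*")

from mlx_lm import load
model, tokenizer = load("KyleHessling1/Qwopus-GLM-18B-Healed-MLX-4bit")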

Credits

All credit for the source models goes to Jackrong; the healing fine-tune used his published datasets. See the full merge documentation for the complete technical workflow.

MLX quantization by @KyleHessling1 using mlx-lm.

License

Apache 2.0 (inherited from source models)

Contact

Questions, issues, or cool projects? Reach out on X: @KyleHessling1
