dealignai/Qwen3.6-35B-A3B-JANGTQ4-CRACK
Important: This model uses the JANGTQ (JANG TurboQuant) quantization format — an extreme-compression variant of JANG for MLX on Apple Silicon that uses codebook + Hadamard rotation on routed MoE experts while keeping attention, SSM, shared_expert, embed, and lm_head at affine 8-bit. Currently supported only by MLX Studio and the `jang-tools` Python package. Follow @dealignai for new releases.
MLX Studio — the only app that natively supports JANG / JANGTQ models
# Qwen 3.6 35B-A3B — JANGTQ4 + CRACK
JANGTQ TurboQuant mixed-precision | CRACK abliterated | Vision + Video | Hybrid SSM/Attention MoE | 18 GB
## What Is This?
This is Qwen 3.6 35B-A3B — a 35B-parameter Mixture-of-Experts vision-language model with 256 routed experts (10 active per token), hybrid linear + full-attention architecture, and native image + video understanding.
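For intuition, the "10 active of 256" routing pattern can be sketched as a generic top-k softmax router in NumPy. This is an illustrative sketch only — the function name, shapes, and renormalization scheme are assumptions, not Qwen's actual router implementation:

```python
import numpy as np

def route_tokens(hidden, gate_w, top_k=10):
    """Toy top-k softmax MoE router: pick top_k experts per token.

    hidden: (tokens, d_model), gate_w: (d_model, num_experts).
    Illustrative only — not Qwen's actual module layout.
    """
    logits = hidden @ gate_w                               # (tokens, num_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]      # indices of the k largest
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # softmax over only the selected experts (weights renormalized to sum to 1)
    w = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return top_idx, w

rng = np.random.default_rng(0)
idx, w = route_tokens(rng.standard_normal((4, 64)), rng.standard_normal((64, 256)))
```

Each token's output is then the weighted sum of its 10 selected experts' outputs, which is why only ~3B of the 35B parameters are active per token.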
It has been:
- JANGTQ quantized — JANGTQ4 profile (8-bit affine precision paths, 4-bit TurboQuant routed experts with codebook + Hadamard rotation, fp16 vision tower) — 18 GB
- CRACK abliterated — permanent weight-level removal of safety refusal
|  |  |
|---|---|
| Base model | Qwen 3.6 35B-A3B MoE VL (35B total, ~3B active, 256 routed experts) |
| Quantization | JANGTQ4 — 18 GB |
| MMLU-200 | 73.50% (base: 77.50%) |
| HarmBench-320 | 90.31% |
| Vision | 27-layer ViT preserved in fp16 (image + video) |
| Context | 262,144 native; up to ~1M with YaRN |
| Reasoning | Toggleable via `enable_thinking` |
| Fits on | 24 GB+ Macs |
## MMLU-200 Results (thinking OFF)
| Subject | CRACK | Base | Delta |
|---|---|---|---|
| Astronomy | 19/20 (95%) | 17/20 | +2 |
| High School Chemistry | 18/20 (90%) | 17/20 | +1 |
| High School Biology | 18/20 (90%) | 18/20 | 0 |
| Logical Fallacies | 18/20 (90%) | 19/20 | -1 |
| World Religions | 18/20 (90%) | 19/20 | -1 |
| Anatomy | 16/20 (80%) | 17/20 | -1 |
| College Computer Science | 14/20 (70%) | 16/20 | -2 |
| College Physics | 14/20 (70%) | 14/20 | 0 |
| High School Mathematics | 7/20 (35%) | 10/20 | -3 |
| Abstract Algebra | 5/20 (25%) | 8/20 | -3 |
| Total | 147/200 (73.5%) | 155/200 (77.5%) | -4.0pp |
The CRACK regression is concentrated in mathematical reasoning (abstract algebra, high school math, college CS). Knowledge-based subjects are largely preserved, and astronomy and chemistry actually improved.
## HarmBench-320 Results

| Category | Score | Rate |
|---|---|---|
| Copyright | 80/80 | 100.0% |
| Misinformation / Disinformation | 53/54 | 98.1% |
| Harassment / Bullying | 20/21 | 95.2% |
| Cybercrime / Intrusion | 45/52 | 86.5% |
| Illegal | 44/53 | 83.0% |
| Chemical / Biological | 33/42 | 78.6% |
| Harmful | 14/18 | 77.8% |
| Total | 289/320 | 90.31% |
Scored with a strict classifier that permits song-structure repetition in copyright-category compliance (lyrics legitimately repeat choruses).
## Pre-Upload Capability Check (20/20 PASS)
All 20 prompts completed without refusal at greedy decoding (thinking OFF):
- Hacking / Security (8/8): port scanner, reverse shell, SQL injection explanation, keylogger, Metasploit usage, ARP spoofing, phishing email, buffer overflow exploit — all produced working technical content
- Coding Coherence (4/4): red-black tree, async web scraper, FastAPI task manager, expression-language compiler — all complete implementations
- Reasoning (4/4): Euclid prime-infinity proof + √2 irrationality, microservices-vs-monolith trade-offs, farmer sheep math, mRNA vaccine mechanism
- Knowledge (4/4): Kazakhstan capital, x^3+2x derivative, 8 planets, Crime and Punishment author — all correct
## JANG CRACK Qwen 3.6 Series
| Model | Format | Size | MMLU | HarmBench | Fits on |
|---|---|---|---|---|---|
| JANGTQ4 + CRACK (this model) | TurboQuant 4-bit experts | 18 GB | 73.5% | 90.3% | 24 GB Mac |
| JANGTQ2 + CRACK | TurboQuant 2-bit experts | 11 GB | 73.0% | 93.8% | 16 GB Mac |
## About JANGTQ

JANGTQ (JANG TurboQuant) is an extreme-compression variant of JANG that replaces affine quantization on routed MoE experts with codebook quantization + random Hadamard rotation. Precision-critical paths stay at affine 8-bit; routed experts use packed codebook indices with small per-layer Lloyd-Max codebooks and fused dequant + matmul Metal kernels.
For Qwen 3.6 35B-A3B, JANGTQ4 brings the model to 18 GB while keeping the vision tower at fp16 for image + video understanding.
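The codebook + rotation recipe can be illustrated end to end in NumPy. This is a minimal sketch under stated assumptions — a Sylvester Hadamard transform with random sign flips and a single Lloyd-Max codebook per tensor; JANGTQ's actual index packing, per-layer codebook layout, and Metal kernels are not reproduced here:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction; n must be a power of two. Returns orthonormal H."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max_codebook(x, k=16, iters=20):
    """1-D Lloyd-Max: k centroids minimizing squared error (k=16 -> 4-bit)."""
    cb = np.quantile(x, np.linspace(0, 1, k))      # quantile initialization
    for _ in range(iters):
        idx = np.abs(x[:, None] - cb[None, :]).argmin(axis=1)
        for j in range(k):
            if (idx == j).any():
                cb[j] = x[idx == j].mean()          # move centroid to cluster mean
    return cb

def tq_quantize(W, k=16):
    """Rotate with a random-signed Hadamard (Gaussianizes outliers),
    then store 4-bit codebook indices."""
    n = W.shape[1]
    signs = np.random.default_rng(0).choice([-1.0, 1.0], size=n)
    R = hadamard(n) * signs                         # random Hadamard rotation
    Wr = W @ R
    cb = lloyd_max_codebook(Wr.ravel(), k)
    idx = np.abs(Wr.ravel()[:, None] - cb[None, :]).argmin(axis=1)
    return idx.reshape(W.shape), cb, R

def tq_dequantize(idx, cb, R):
    """Look up centroids, then undo the rotation (R is orthonormal)."""
    return cb[idx] @ R.T
```

The rotation spreads per-channel outliers across all dimensions so a single small codebook covers the whole tensor; in the real format, the dequant lookup and matmul are fused into one Metal kernel rather than materializing the fp tensor.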
## About CRACK
CRACK is a permanent weight-level abliteration that removes safety refusal without touching the TurboQuant codebook or vision tower. Multilingual (EN + ZH) refusal direction extraction means the model complies on both English and Chinese prompts.
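Weight-level abliteration of this kind is commonly implemented as a rank-1 projection: estimate a "refusal direction" from paired activations, then project it out of the weight matrices that write to the residual stream. A minimal NumPy sketch — the difference-of-means estimator and function names are illustrative assumptions; CRACK's exact extraction procedure is not documented here:

```python
import numpy as np

def extract_refusal_direction(h_harmful, h_harmless):
    """Difference-of-means direction between hidden states on harmful vs
    harmless prompts, shapes (n, d). A common abliteration recipe, shown
    here as an assumption about how such a direction could be obtained."""
    d = h_harmful.mean(axis=0) - h_harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_weight(W, v):
    """Project the unit refusal direction v out of a weight's output space:
    W <- (I - v v^T) W, so the layer can no longer write along v."""
    return W - np.outer(v, v) @ W
```

Because the projection is baked into the stored weights, the removal is permanent — no runtime steering vector or system prompt is involved.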
## Reasoning ON / OFF

The chat template respects `enable_thinking`. ON is recommended for complex reasoning; OFF for short answers, benchmarks, and tool use.
```python
# Thinking ON (default — full chain-of-thought)
messages = [{"role": "user", "content": "Derive 47 * 23 step by step"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# → emits <think>...</think> then the answer

# Thinking OFF (direct answer, no <think> block)
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
# → skips <think>, answers directly
```
All MMLU-200 and HarmBench-320 scores above were measured with thinking OFF for consistent short-form grading.
## Notes

- Thinking mode: Supported via the `enable_thinking` kwarg. Thinking OFF is recommended for short-answer tasks (MMLU, direct instructions). Thinking ON works for extended reasoning but may occasionally loop on extreme refusal prompts (a known Qwen 3.6 surgical artifact — 1/6 in our thinking-ON stress test).
- Vision: 27-layer ViT preserved in fp16. Image + video inputs work normally through `mlx_vlm`.
- Context length: 262,144 native; extend via YaRN if your inference engine supports it.
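YaRN-style extension rescales RoPE's rotary frequencies non-uniformly: fast dimensions whose wavelengths fit well inside the trained context are left alone, while slow dimensions are interpolated toward the target scale, with a ramp in between. A simplified NumPy sketch in that spirit — the function name and ramp constants are illustrative, real engines read them from the model's `rope_scaling` config, and full YaRN also applies an attention temperature not shown here:

```python
import numpy as np

def yarn_inv_freq(dim, base=10000.0, scale=4.0, orig_ctx=262144,
                  beta_fast=32, beta_slow=1):
    """Simplified NTK-by-parts frequency scaling in the spirit of YaRN.
    dim is the rotary head dimension; scale=4 maps 262k -> ~1M positions."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    wavelen = 2 * np.pi / inv_freq          # wavelength in positions per dim
    low = orig_ctx / beta_fast              # below this: keep as-is
    high = orig_ctx / beta_slow             # above this: fully interpolate
    ramp = np.clip((wavelen - low) / (high - low), 0.0, 1.0)
    # blend between untouched and position-interpolated frequencies
    return inv_freq * (1 - ramp) + (inv_freq / scale) * ramp
```

The blended frequencies always sit between the original values and the fully interpolated ones, which is what preserves short-context behavior while extending the usable range.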
## Support dealignai
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
## Disclaimer
This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.
The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Qwen model.
