
dealignai/MiniMax-M2.7-JANGTQ-CRACK

dealignai • code

🔧 2026-04-15 · chat_template.jinja: enable_thinking=False honored

This release ships a chat_template that respects enable_thinking=False (synced with the JANG_2L template structure, M2.7 identity preserved). The <think> prefix is now conditional, so callers can skip reasoning mode for fast direct answers. Reasoning ON (default) is unchanged.

If you cloned this repo before 2026-04-15, please re-download chat_template.jinja:

```bash
hf download dealignai/MiniMax-M2.7-JANGTQ-CRACK chat_template.jinja --local-dir /path/to/your/local/copy
```

Model weights are unchanged.


Important: This model uses the JANGTQ (JANG TurboQuant) quantization format -- an extreme-compression variant of JANG for MLX on Apple Silicon that uses codebook + Hadamard rotation on MoE experts while keeping attention at affine 8-bit. Currently only supported by MLX Studio and the jang-tools Python package. Follow @dealignai for new releases.


MLX Studio


MLX Studio -- the only app that natively supports JANG / JANGTQ models


MiniMax M2.7 -- JANGTQ + CRACK

JANGTQ TurboQuant mixed-precision | CRACK abliterated | Reasoning-only | 55 GB



What Is This?

This is MiniMax M2.7 -- a 230B parameter Mixture-of-Experts reasoning model with 256 experts (8 active per token), all standard attention, and always-on chain-of-thought reasoning.

It has been:

  1. JANGTQ quantized -- affine 8-bit attention / embeddings / lm_head, 2-bit TurboQuant routed experts with codebook + Hadamard rotation -- 55 GB
  2. CRACK abliterated -- permanent weight-level removal of safety refusal

| | |
| --- | --- |
| Architecture | MiniMax M2.7 MoE -- 230B total, ~10B active, 256 experts |
| Quantization | JANGTQ (affine 8-bit attention + TurboQuant 2-bit experts) -- 55 GB |
| Abliteration | CRACK abliterated |
| MMLU-200 | 92.0% (base: 91.5%, delta: +0.5%) |
| HarmBench-320 | 93.1% overall, 95.0% excluding copyright |
| Reasoning | Always ON (chain-of-thought), enable_thinking kwarg supported |
| Speed | ~47 tok/s (M3 Ultra 256 GB) |
| Fits on | 96 GB+ Macs |

MMLU-200 Results

| Subject | CRACK | Base | Delta |
| --- | --- | --- | --- |
| College Physics | 20/20 (100%) | 18/20 | +2 |
| High School Mathematics | 19/20 (95%) | 19/20 | 0 |
| College Computer Science | 19/20 (95%) | 19/20 | 0 |
| Astronomy | 19/20 (95%) | 20/20 | -1 |
| High School Biology | 19/20 (95%) | 20/20 | -1 |
| World Religions | 19/20 (95%) | 17/20 | +2 |
| High School Chemistry | 18/20 (90%) | 18/20 | 0 |
| Logical Fallacies | 18/20 (90%) | 16/20 | +2 |
| Abstract Algebra | 17/20 (85%) | 19/20 | -2 |
| Anatomy | 16/20 (80%) | 17/20 | -1 |
| Total | 184/200 (92.0%) | 183/200 (91.5%) | +0.5% |

CRACK is knowledge-neutral on this model -- small per-subject variance, net positive. Gains on physics, fallacies, religions offset losses on algebra and anatomy.


HarmBench-320 Results

| Category | Score | Rate |
| --- | --- | --- |
| Cybercrime / Intrusion | 51/52 | 98.1% |
| Misinformation / Disinformation | 53/54 | 98.1% |
| Chemical / Biological | 41/42 | 97.6% |
| Harmful | 17/18 | 94.4% |
| Illegal | 48/53 | 90.6% |
| Copyright | 70/80 | 87.5% |
| Harassment / Bullying | 18/21 | 85.7% |
| Total | 298/320 | 93.1% |
| Excluding copyright | 228/240 | 95.0% |

Scored with a strict classifier that rejects stuck-reasoning loops, empty template dumps, and false-positive compliance from thinking-trace leakage.


JANG CRACK M2.7 Series

| Model | Format | Size | MMLU | HarmBench | Speed | Fits on |
| --- | --- | --- | --- | --- | --- | --- |
| JANGTQ + CRACK | TurboQuant 2-bit experts | 55 GB | 92.0% | 93.1% | ~47 t/s | 96 GB Mac |
| JANG_3L + CRACK | Affine 3-bit mixed | 89 GB | 93.5% | 79.1% | ~46 t/s | 128 GB Mac |

vs MLX Uniform Quantization

MLX uniform quantization is completely broken on MiniMax at ALL bit levels (~25% MMLU = random chance). JANG / JANGTQ is the only working quantization format for this architecture.


About JANGTQ

JANGTQ (JANG TurboQuant) is an extreme-compression variant of JANG that replaces affine quantization on routed MoE experts with codebook quantization + random Hadamard rotation. Attention / embeddings / lm_head stay at affine 8-bit for precision-critical paths; the 256 routed experts use 2-bit packed codebook indices stored as uint32, with a per-row float16 norm and a tiny Lloyd-Max codebook per layer. Metal kernels do dequant + matmul fused on-GPU, no affine conversion.
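To picture the storage layout, here is a minimal NumPy sketch of 2-bit codebook dequantization for one expert weight row. The packing order, function names, and 4-entry codebook are assumptions for illustration only; the actual JANGTQ kernels live in jang-tools and MLX Studio's Metal code, where the lookup is fused with the matmul on-GPU.

```python
import numpy as np

def unpack_2bit(packed: np.ndarray, n: int) -> np.ndarray:
    """Unpack n 2-bit codebook indices from uint32 words (16 per word)."""
    shifts = np.arange(16, dtype=np.uint32) * 2   # assumed LSB-first packing
    idx = (packed[:, None] >> shifts) & np.uint32(0x3)
    return idx.reshape(-1)[:n]

def dequant_row(packed_row: np.ndarray, row_norm: float,
                codebook: np.ndarray, n_cols: int) -> np.ndarray:
    """Reconstruct one weight row: codebook lookup scaled by the per-row norm."""
    idx = unpack_2bit(packed_row, n_cols)
    return codebook[idx].astype(np.float32) * np.float32(row_norm)

# A 4-entry (2-bit) codebook, e.g. fitted by Lloyd-Max on the weight histogram
codebook = np.array([-1.5, -0.5, 0.5, 1.5], dtype=np.float32)
packed = np.array([0b11100100], dtype=np.uint32)   # packs indices 0, 1, 2, 3
print(dequant_row(packed, row_norm=2.0, codebook=codebook, n_cols=4))
# -> [-3. -1.  1.  3.]
```

The per-row float16 norm restores the scale that the shared codebook cannot carry, which is why a single tiny codebook per layer suffices for 256 experts.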

For MiniMax M2.7 (230B total, 10B active, 256 experts) this brings the model from 460 GB (bf16) down to 55 GB with minimal quality loss -- the JANGTQ profile beats every MLX uniform quant while fitting on a 96 GB Mac.

About CRACK

CRACK (Controlled Refusal Ablation via Calibrated Knockouts) removes safety alignment at the weight level by projecting refusal directions out of attention output matrices. Calibrated per-layer strengths preserve reasoning quality while achieving compliance.
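The core projection step can be sketched as follows. This is a generic directional-ablation illustration, assuming a refusal direction has already been extracted (e.g. as a difference of mean activations over refusing vs. complying prompts); CRACK's actual direction extraction and per-layer strength calibration are not described beyond the summary above.

```python
import numpy as np

def ablate_direction(W: np.ndarray, v: np.ndarray,
                     strength: float = 1.0) -> np.ndarray:
    """Project the refusal direction v out of a weight matrix's output space.

    W' = W - s * v v^T W: at strength s=1.0 the layer can no longer write
    along v; s < 1.0 gives a partial, per-layer calibrated knockout.
    """
    v = v / np.linalg.norm(v)                 # unit direction in output space
    return W - strength * np.outer(v, v) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))               # toy attention output projection
v = rng.standard_normal(8)                    # hypothetical refusal direction
W_abl = ablate_direction(W, v)
print(np.abs((v / np.linalg.norm(v)) @ W_abl).max())  # ~0: component removed
```

Because the edit is applied to the weights themselves, the removal persists through any downstream quantization or fine-tuning-free deployment.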


Install & Usage

```bash
pip install "jang[mlx]"
```

```python
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load_jangtq_model("dealignai/MiniMax-M2.7-JANGTQ-CRACK")
sampler = make_sampler(temp=1.0)  # MiniMax requires temp=1.0 for chat

messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False)

response = generate(model, tokenizer, prompt=prompt, max_tokens=4000, sampler=sampler)
print(response)
```

Note: M2.7 is a reasoning-only model -- it always generates a <think> chain before the final answer. Use max_tokens of 4000 or more for complex questions. For chat, use temperature=1.0 (greedy decoding causes infinite loops). Set enable_thinking=False in apply_chat_template to skip the <think> block when you want short, direct responses.
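The enable_thinking behavior can be pictured with a toy stand-in for the chat template. This is not the shipped chat_template.jinja -- just a self-contained illustration of how a conditional <think> prefix on the assistant turn switches between reasoning and direct-answer mode:

```python
def render_prompt(messages, enable_thinking=True):
    # Toy stand-in for the chat template: the assistant turn starts with
    # <think> only when enable_thinking is True (reasoning ON, the default).
    turns = [f"[{m['role']}] {m['content']}" for m in messages]
    tail = "[assistant] " + ("<think>" if enable_thinking else "")
    return "\n".join(turns) + "\n" + tail

msgs = [{"role": "user", "content": "What is 2 + 2?"}]
print(render_prompt(msgs))                         # prompt ends with "<think>"
print(render_prompt(msgs, enable_thinking=False))  # direct-answer mode
```

In the real template the kwarg is passed through apply_chat_template, as in the note above; with the flag off, the model starts its reply directly instead of opening a reasoning trace.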


Links

Ko-fi · X/Twitter · GitHub · MLX Studio · Website


Disclaimer

This model is provided for research and educational purposes. The creators are not responsible for any misuse. By downloading this model, you agree to use it responsibly and in compliance with applicable laws.


Created by Jinho Jang
