
prism-ml/Ternary-Bonsai-8B-mlx-2bit


Prism ML Website  |  White Paper  |  Demo & Examples  |  Discord

Ternary-Bonsai-8B-mlx-2bit

Ternary (1.58-bit) language model for Apple Silicon

7.1x smaller than FP16 | 5.2x faster on M4 Pro | 27 tok/s on iPhone | runs on Mac, iPhone, iPad

Highlights

  • 2.15 GiB (2.30 GB) packed 2-bit size (down from 16.38 GB FP16) — runs comfortably on any Mac or iPhone
  • Ternary weights {-1, 0, +1} across embeddings, attention projections, MLP projections, and LM head
  • 75.5 avg benchmark score across 6 categories — competitive with full-precision 8B models at 1/9th the size
  • 5-point improvement over our earlier 1-bit Bonsai 8B (70.5) at only ~0.6 GB additional footprint
  • MLX-native format with group size 128 and FP16 scaling


Resources

  • White Paper
  • Demo repo — examples for serving, benchmarking, and integrating Bonsai
  • Discord — community support and updates
  • Kernels: MLX (Apple Silicon) · mlx-swift (iOS/macOS) — 2-bit format is supported out of the box

Model Overview

| Item | Specification |
|---|---|
| Base model | Qwen3-8B |
| Parameters | 8.19B (~6.95B non-embedding) |
| Architecture | GQA (32 query / 8 KV heads), SwiGLU MLP, RoPE, RMSNorm |
| Layers | 36 Transformer decoder blocks |
| Context length | 65,536 tokens |
| Vocab size | 151,936 |
| Weight format | Ternary g128: {-1, 0, +1} with FP16 group-wise scaling |
| Packed 2-bit size | 2.15 GiB (2.30 GB) |
| Ternary coverage | Embeddings, attention projections, MLP projections, LM head |
| License | Apache 2.0 |

Quantization Format: Ternary g128

Each weight takes a value from {-1, 0, +1}, with one shared FP16 scale per group of 128 weights:

w_i = scale_g * t_i,    t_i in {-1, 0, +1}

The information-theoretic cost is log2(3) ≈ 1.585 bits per weight, plus FP16 group scales (16 bits per 128 weights), for a theoretical minimum of ~1.71 bits/weight. This release uses the MLX 2-bit format, which stores each ternary value in 2 bits plus group scales, for an effective ~2.125 bits/weight.
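The bits-per-weight arithmetic above can be checked in a few lines of Python (an illustrative calculation, not part of the MLX toolchain):

```python
import math

GROUP_SIZE = 128   # weights sharing one FP16 scale
SCALE_BITS = 16    # one FP16 scale per group

# Information-theoretic cost of a ternary weight: log2(3) bits.
ternary_bits = math.log2(3)               # ~1.585

# Amortized cost of the per-group FP16 scale.
scale_overhead = SCALE_BITS / GROUP_SIZE  # 0.125

theoretical = ternary_bits + scale_overhead  # ~1.71 bits/weight
mlx_2bit = 2 + scale_overhead                # 2.125 bits/weight

print(f"theoretical minimum: {theoretical:.2f} bits/weight")
print(f"MLX 2-bit format:    {mlx_2bit:.3f} bits/weight")
```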

The addition of a zero value compared to binary (1-bit) provides more expressive weight representations, allowing better preservation of model quality under extreme compression.
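The storage scheme can be sketched in pure Python (an illustrative toy, not the actual MLX kernel; the 0/1/2 code-to-ternary mapping is an assumption for demonstration). Four 2-bit codes are packed per byte, and dequantization recovers `w_i = scale_g * t_i` per the formula above:

```python
# Illustrative sketch of 2-bit ternary packing and dequantization.
# The code<->ternary mapping below is assumed, not the MLX layout.
CODE_TO_TERNARY = {0: -1, 1: 0, 2: 1}
TERNARY_TO_CODE = {-1: 0, 0: 1, 1: 2}

def pack(ternary):
    """Pack ternary values {-1, 0, +1} as 2-bit codes, 4 per byte."""
    out = bytearray()
    for i in range(0, len(ternary), 4):
        byte = 0
        for j, t in enumerate(ternary[i:i + 4]):
            byte |= TERNARY_TO_CODE[t] << (2 * j)
        out.append(byte)
    return bytes(out)

def dequantize(packed, scales, n, group_size=128):
    """Recover w_i = scale_g * t_i from the packed codes."""
    weights = []
    for i in range(n):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        weights.append(scales[i // group_size] * CODE_TO_TERNARY[code])
    return weights

ternary = [-1, 0, 1, 1, 0, -1]
packed = pack(ternary)                            # 6 values -> 2 bytes
w = dequantize(packed, scales=[0.02], n=len(ternary))
```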

Memory

| Format | Size | Reduction | Ratio |
|---|---|---|---|
| FP16 | 16.38 GB | -- | 1.0x |
| MLX 2-bit g128 | 2.15 GiB (2.30 GB) | 86.0% | 7.1x |
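The reduction and ratio follow directly from the two sizes (a quick sanity check, using the decimal-GB figures above):

```python
fp16_gb = 16.38    # FP16 checkpoint size
packed_gb = 2.30   # MLX 2-bit g128 packed size

ratio = fp16_gb / packed_gb                  # compression ratio vs FP16
reduction = (1 - packed_gb / fp16_gb) * 100  # percent saved

print(f"{ratio:.1f}x smaller, {reduction:.1f}% reduction")
```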

Quickstart

MLX (Python)

Install the package:

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("prism-ml/Ternary-Bonsai-8B-mlx-2bit")

response = generate(
    model,
    tokenizer,
    prompt="Explain quantum computing in simple terms.",
    max_tokens=256,
)
print(response)
```

MLX Swift (iOS / macOS)

Ternary Bonsai 8B runs natively on iPhone and iPad via MLX Swift at 27 tok/s on iPhone 17 Pro Max. The 2-bit format is supported out of the box.

Throughput (MLX / Apple Silicon)

| Platform | Backend | PP512 (tok/s) | TG128 (tok/s) | FP16 TG (tok/s) | Speedup |
|---|---|---|---|---|---|
| M4 Pro 48 GB | MLX (Python) | 460 | 83 | 16 | 5.2x |

iPhone 17 Pro Max (MLX Swift)

| Platform | Backend | PP512 (tok/s) | TG128 (tok/s) | 4-bit TG (tok/s) | Speedup |
|---|---|---|---|---|---|
| iPhone 17 Pro Max | MLX Swift | 363 | 27 | 14 | 1.9x |

Benchmarks

Evaluated with EvalScope v1.4.2 + vLLM 0.15.1 on NVIDIA H100 under identical infrastructure, generation parameters, and scoring. All models are in the 6B-9B parameter range.

| Model | Size | Avg | MMLU-R | MuSR | GSM8K | HE+ | IFEval | BFCL |
|---|---|---|---|---|---|---|---|---|
| Qwen 3 8B | 16.38 GB | 79.3 | 83 | 55 | 93 | 82.3 | 81.5 | 81 |
| Ternary Bonsai 8B | 1.75 GB | 75.5 | 72.6 | 56.2 | 91 | 77.4 | 81.8 | 73.9 |
| 1-bit Bonsai 8B (prior) | 1.15 GB | 70.5 | 65.7 | 50 | 88 | 73.8 | 79.8 | 65.7 |
| RNJ 8B | 16.63 GB | 73.1 | 75.5 | 50.4 | 93.7 | 84.2 | 73.8 | 61.1 |
| Ministral3 8B | 16.04 GB | 71.0 | 68.9 | 53.8 | 87.9 | 72.6 | 67.4 | 75.4 |
| Olmo 3 7B | 14.60 GB | 70.9 | 72 | 56.1 | 92.5 | 79.3 | 87.1 | 38.4 |

Ternary Bonsai 8B ranks 2nd among all compared models despite being 1/9th the size.

Intelligence Density

density = -ln(1 - score/100) / size_GB

| Model | Size | Intelligence Density (1/GB) |
|---|---|---|
| Ternary Bonsai 8B | 1.75 GB | 0.803 |
| 1-bit Bonsai 8B (prior) | 1.15 GB | 1.062 |
| Qwen 3 8B | 16.38 GB | 0.096 |
| RNJ 8B | 16.62 GB | 0.079 |
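The table values can be reproduced from the formula (an illustrative script; sizes are the decimal-GB figures listed above):

```python
import math

def intelligence_density(score, size_gb):
    """density = -ln(1 - score/100) / size_GB"""
    return -math.log(1 - score / 100) / size_gb

models = {
    "Ternary Bonsai 8B": (75.5, 1.75),
    "1-bit Bonsai 8B (prior)": (70.5, 1.15),
    "Qwen 3 8B": (79.3, 16.38),
    "RNJ 8B": (73.1, 16.62),
}
for name, (score, size) in models.items():
    print(f"{name}: {intelligence_density(score, size):.3f} /GB")
```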

Limitations

  • Only the MLX 2-bit format is available at the initial release; formats for other backends are coming soon
  • Mobile power measurement is estimated rather than hardware-metered
  • The full-precision frontier continues to advance; however, the ternary methodology is architecture-agnostic and can be reapplied to newer base models

Citation

@techreport{ternarybonsai,
    title   = {Ternary Bonsai: 1.58-bit Language Models at 8B, 4B, and 1.7B Scale},
    author  = {Prism ML},
    year    = {2026},
    month   = {April},
    url     = {https://prismml.com}
}

Contact

For questions, feedback, or collaboration inquiries: contact@prismml.com
