Thump604/DeepSeek-V4-Flash-MLX-Q2-mixed-gs128-affine

DeepSeek V4 Flash MLX Q2 Mixed

This is an MLX conversion of deepseek-ai/DeepSeek-V4-Flash.

Source

  • Base model: deepseek-ai/DeepSeek-V4-Flash
  • Source revision: 6e763230a9d263eca2023f1d4a5ce1bfe126cf48
  • Architecture: DeepseekV4ForCausalLM
  • Model type: deepseek_v4

Conversion Recipe

  • Tooling branch: Thump604/mlx-lm, branch deepseek-v4-support-fixes
  • Minimum tooling commit for generation: 9c990f4
  • Output path during conversion: /Volumes/Lexar/mlx_models/DeepSeek-V4-Flash-MLX-Q2-mixed-gs128-affine
  • Quantization recipe: mixed_2_6
  • Quantization mode: affine
  • Group size: 128
  • Effective bits per weight reported by MLX: 2.992
  • Shards: 23
  • Indexed MLX tensor size: 106,355,393,628 bytes

The mixed recipe applies 2-bit affine quantization to the lower-risk routed expert paths and 6-bit affine quantization to the sensitive paths: embeddings, LM head, attention projections, compressed-attention/indexer components, shared experts, and selected down projections.
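As a rough sanity check on the reported 2.992 effective bits per weight, the sketch below estimates what share of weights would need to sit at 6 bits. It assumes each group of 128 weights stores one 16-bit scale and one 16-bit bias; that overhead is an assumption, not something this card states, and the helper names are illustrative only. Unquantized tensors, if MLX counts them in its figure, would shift the result.

```python
# Back-of-envelope check of the reported effective bits per weight.
# Assumption (not from the model card): every quantization group stores
# one 16-bit scale and one 16-bit bias alongside the packed weights.
GROUP_SIZE = 128
SCALE_BIAS_BITS = 16 + 16  # assumed per-group overhead in bits


def affine_bits_per_weight(bits: int, group_size: int = GROUP_SIZE) -> float:
    """Storage cost per weight, including amortized scale/bias overhead."""
    return bits + SCALE_BIAS_BITS / group_size


def high_precision_fraction(effective: float, low_bits: int, high_bits: int) -> float:
    """Share of weights at high_bits needed to reach the effective average."""
    low = affine_bits_per_weight(low_bits)
    high = affine_bits_per_weight(high_bits)
    return (effective - low) / (high - low)


low_cost = affine_bits_per_weight(2)   # 2 + 32/128 = 2.25
high_cost = affine_bits_per_weight(6)  # 6 + 32/128 = 6.25
fraction_6bit = high_precision_fraction(2.992, 2, 6)

print(f"2-bit path cost: {low_cost} bits/weight")
print(f"6-bit path cost: {high_cost} bits/weight")
print(f"implied 6-bit share: {fraction_6bit:.3f}")
```

Under these assumptions, somewhat under one fifth of the weights would be in 6-bit paths, which is consistent with the routed experts holding most of the parameters.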

Validation

  • Conversion completed successfully.
  • Lazy MLX load completed successfully on a 128GB Mac Studio.
  • A raw-prompt generation smoke test completed successfully with --max-tokens 2 --max-kv-size 1024.
  • Observed smoke-test numbers: 54.59s real time, 74.5GB max RSS, 106.94GB peak footprint, zero swaps.
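The exact smoke-test command is not recorded here. The sketch below assembles one plausible invocation, assuming the `mlx_lm.generate` command-line entry point from the tooling branch, `--ignore-chat-template` to send the prompt raw, and macOS `/usr/bin/time -l` to collect the timing and memory figures. Only `--max-tokens 2` and `--max-kv-size 1024` come from this card; the prompt text and the `build_smoke_command` helper are hypothetical.

```python
import shlex


def build_smoke_command(model_path: str, prompt: str = "Hello") -> list[str]:
    """Assemble a two-token raw-prompt smoke test (hypothetical helper)."""
    return [
        "/usr/bin/time", "-l",      # assumed source of RSS/footprint numbers
        "mlx_lm.generate",
        "--model", model_path,
        "--prompt", prompt,         # placeholder prompt, not from the card
        "--ignore-chat-template",   # assumed meaning of "raw prompt"
        "--max-tokens", "2",        # from the card
        "--max-kv-size", "1024",    # from the card
    ]


cmd = build_smoke_command("Thump604/DeepSeek-V4-Flash-MLX-Q2-mixed-gs128-affine")
print(shlex.join(cmd))
# To actually run the test: subprocess.run(cmd, check=True)
```

Building the argument list separately keeps the command easy to inspect before committing to a load that peaks above 100GB.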

This artifact is a low-bit local fallback. It is not quality-qualified for production writing or coding lanes. Treat quality, long-context behavior, and sparse compressed-attention parity as open until evaluated with a real task suite.

Notes

DeepSeek V4 support in MLX is still under active development. This artifact was produced with local DeepSeek V4 support fixes, including FP4/FP8 checkpoint handling, F8_E8M0 scale metadata reinterpretation as raw uint8 exponent bytes before sanitizer decode, attention sink dtype handling, and quantized grouped output projection support.
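To illustrate the F8_E8M0 handling mentioned above: in the OCP Microscaling convention, an E8M0 value is a single unsigned exponent byte with bias 127 and no sign or mantissa, and 0xFF encodes NaN. The sketch below decodes such bytes into power-of-two scales using only the standard library. It is not the code from the support-fixes branch, and `decode_e8m0` is a hypothetical name.

```python
import math

E8M0_BIAS = 127   # exponent bias in the E8M0 format
E8M0_NAN = 0xFF   # reserved encoding for NaN


def decode_e8m0(raw: bytes) -> list[float]:
    """Interpret each byte as a raw uint8 exponent and return 2**(byte - 127)."""
    scales = []
    for byte in raw:  # iterating over bytes yields ints in 0..255
        if byte == E8M0_NAN:
            scales.append(math.nan)
        else:
            scales.append(math.ldexp(1.0, byte - E8M0_BIAS))
    return scales


example = decode_e8m0(bytes([126, 127, 128, 255]))
print(example)  # [0.5, 1.0, 2.0, nan]
```

Reading the buffer as raw bytes matters because treating the same bits as a conventional float type would apply the wrong exponent and mantissa layout.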
