
nsparks/DeepSeek-V4-Flash-FP4-FP8-GGUF

DeepSeek-V4-Flash native FP4 / FP8 GGUF

A 1:1 conversion of deepseek-ai/DeepSeek-V4-Flash from the original safetensors into a single GGUF file that preserves the model's native low-precision weights:

  • Dense weights: FP8 E4M3 (F8_E4M3_B128, 128-element blocks with one E8M0 scale)
  • MoE expert weights: MXFP4 (FP4 E2M1, 32-element blocks with one E8M0 scale)

This file is not derived from a higher-precision intermediate; the FP4 and FP8 codes from the upstream checkpoint are written directly into the GGUF.
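
For reference, here is a minimal NumPy sketch (not the llama.cpp kernels) of how the two block formats decode back to float, assuming the layouts described above: F8_E4M3_B128 as 128 FP8 E4M3 codes plus one shared E8M0 scale, and MXFP4 as the OCP MX layout of 32 FP4 E2M1 codes plus one shared E8M0 scale.

```python
import numpy as np

def decode_e8m0(scale_byte: int) -> float:
    # E8M0 is a bare 8-bit exponent: value = 2**(e - 127); no sign, no mantissa.
    return float(2.0 ** (int(scale_byte) - 127))

def decode_fp8_e4m3(byte: int) -> float:
    # OCP FP8 E4M3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    s = -1.0 if byte & 0x80 else 1.0
    e = (byte >> 3) & 0x0F
    m = byte & 0x07
    if e == 0:                      # subnormal range
        return s * (m / 8.0) * 2.0 ** -6
    if e == 0x0F and m == 0x07:     # E4M3 encodes NaN but no infinities
        return float("nan")
    return s * (1.0 + m / 8.0) * 2.0 ** (e - 7)

# FP4 E2M1 has only 16 code points, so decoding is a table lookup.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
                    dtype=np.float32)

def dequant_f8_e4m3_b128(codes, scale_byte):
    # One 128-element FP8 block -> float32, times the shared E8M0 scale.
    vals = np.array([decode_fp8_e4m3(int(b)) for b in codes], dtype=np.float32)
    return vals * decode_e8m0(scale_byte)

def dequant_mxfp4(nibbles, scale_byte):
    # One 32-element MXFP4 block: each 4-bit code indexes the E2M1 table.
    return FP4_E2M1[np.asarray(nibbles)] * np.float32(decode_e8m0(scale_byte))
```

Because both scale types are pure powers of two (E8M0), dequantization never multiplies by an arbitrary float; on GPU it reduces to an exponent adjustment plus a small table lookup.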

File

File                                    Size      Quant
DeepSeek-V4-Flash-FP4-FP8-native.gguf   ~146 GB   F8_E4M3 + MXFP4

Loading

This GGUF requires a llama.cpp build with native F8_E4M3_B128 and MXFP4 support and the DeepSeek V4 Flash architecture. Stock upstream llama.cpp cannot load this file.

Reference (WIP) build that can both produce and run this GGUF:

https://github.com/nisparks/llama.cpp/tree/wip/deepseek-v4-support

That branch adds:

  • GGML_TYPE_F8_E4M3_B128 (ggml type 42)
  • LLAMA_FTYPE_MOSTLY_F8_E4M3_MXFP4 (ftype 41, exposed as F8_E4M3_MXFP4 / moe-f8-e4m3-mxfp4)
  • CUDA dequant / MMVQ kernels for F8_E4M3_B128
  • Loader / converter / gguf-py support
  • Custom DeepSeek V4 Flash model graph

The branch is an active WIP; expect rough edges.
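
To sanity-check what is actually inside the file, the branch's gguf-py can enumerate tensor types. A minimal sketch, assuming the reference branch's gguf-py is installed (stock gguf-py will reject the unknown type id 42):

```python
from collections import Counter
from gguf import GGUFReader

reader = GGUFReader("DeepSeek-V4-Flash-FP4-FP8-native.gguf")

# Count tensors per ggml quantization type; expect the dense weights on
# type 42 (F8_E4M3_B128) and the MoE expert weights on MXFP4.
counts = Counter(int(t.tensor_type) for t in reader.tensors)
for type_id, n in sorted(counts.items()):
    print(f"ggml type {type_id:3d}: {n} tensors")
```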

Notes

  • DeepSeek V4 Flash is a custom architecture (MoE + sliding-window attention + compressor + indexer). The runtime in the reference branch implements that graph as a custom model path.
  • To match the HF implementation's activation behavior, the runtime also applies HF's blockwise FP8 / FP4 fake activation quantization to the attention KV and the indexer Q/KV after the Hadamard rotation (a sketch of this step follows below).
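
The following is a minimal, illustrative NumPy sketch of blockwise fake activation quantization, not the HF code itself: each block of activations is scaled into the E4M3 range, snapped to the nearest representable FP8 value, and scaled back, so the tensor stays in float but carries FP8 rounding error. The block size of 128 and the E4M3 max of 448 mirror the weight format above; the exact blocking and rounding rules in the HF code may differ, and the FP4 variant would snap to the E2M1 grid instead.

```python
import numpy as np

def _e4m3_grid() -> np.ndarray:
    # All finite non-negative E4M3 magnitudes (bias 7, 3 mantissa bits).
    vals = [(m / 8.0) * 2.0 ** -6 for m in range(8)]          # subnormals
    vals += [(1.0 + m / 8.0) * 2.0 ** (e - 7)
             for e in range(1, 16) for m in range(8)
             if not (e == 15 and m == 7)]                     # skip the NaN code
    return np.array(sorted(vals), dtype=np.float32)

_GRID = _e4m3_grid()  # largest entry is 448.0

def fake_quant_fp8_blockwise(x: np.ndarray, block: int = 128) -> np.ndarray:
    # Quantize-dequantize in float; assumes x.size is a multiple of `block`.
    flat = x.astype(np.float32).reshape(-1, block)
    out = np.empty_like(flat)
    for i, b in enumerate(flat):
        amax = float(np.abs(b).max())
        scale = amax / 448.0 if amax > 0.0 else 1.0
        a = np.abs(b) / scale
        # Snap each magnitude to the nearest point on the E4M3 grid.
        idx = np.searchsorted(_GRID, a).clip(1, len(_GRID) - 1)
        lo, hi = _GRID[idx - 1], _GRID[idx]
        q = np.where(a - lo <= hi - a, lo, hi)
        out[i] = np.sign(b) * q * scale
    return out.reshape(x.shape)
```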

