Back to Models
RedHatAI logo

RedHatAI/gemma-4-31B-it-speculator.dflash

RedHatAIgeneral

RedHatAI/gemma-4-31B-it-speculator.dflash

This is a preliminary (and subject to change) DFlash speculator model for google/gemma-4-31B-it.

It was trained using the Speculators library on a combination of the Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered dataset and the train_sft split of the HuggingFaceH4/ultrachat_200k dataset. Training data used Magpie + UltraChat with responses from the gemma-4-31B-it model (no reasoning).

This model should be used with the google/gemma-4-31b-it chat template, specifically through the /chat/completions endpoint.

Note:

It was validated on Nvidia H100, other hardware validation pending.

We are continuing to train this model and will update with more evaluations and new weights in the future.

Deployment

Deploy with vLLM (main/nightly) using the speculator as a draft model.

First install vllm nightly
uv pip install -U vllm \
    --torch-backend=auto \
    --extra-index-url https://wheels.vllm.ai/nightly

Then run:

vllm serve -tp 2 RedHatAI/gemma-4-31B-it-speculator.dflash

It can also be deployed with a quantized verifier for even better speedups:

vllm serve RedHatAI/gemma-4-31B-it-FP8-block   --tensor-parallel-size 2  --speculative-config '{
    "model": "RedHatAI/gemma-4-31B-it-speculator.dflash",
    "num_speculative_tokens": 8,
    "method": "dflash"
  }'

Preliminary Evaluations

Evaluation command:

vllm bench serve --backend openai-chat --endpoint /v1/chat/completions \
  --dataset-name hf --tokenizer google/gemma-4-31B-it \
  --dataset-path "philschmid/mt-bench" --num-prompts 80 \
  --max-concurrency 1 --model RedHatAI/gemma-4-31B-it-speculator.dflash \
  --hf-output-len 2048 \
  --temperature 0 --save-result --save-detailed

Per-Position Acceptance Rate

DatasetPos 0Pos 1Pos 2Pos 3Pos 4Pos 5Pos 6Pos 7Avg. Length
HumanEval85.8%72.1%60.3%50.4%41.8%34.3%26.9%19.6%4.91
math_reasoning88.7%76.1%64.8%54.9%45.5%36.5%28.8%21.5%5.17
qa67.5%41%23.8%13.8%8.1%4.5%2.6%1.3%2.63
question75.1%51.1%34.7%24.5%17.9%13%9.4%6.5%3.32
rag76.1%54.8%39.8%28.7%19.9%12.9%7%3.8%3.43
summarization67.3%39.9%22.3%12%6.4%3.1%1.5%0.7%2.53
tool_call65.7%45.7%31.6%21.7%15%9.6%6.2%3.6%2.99
translation73.4%51.4%35.3%23.6%15.6%9.3%5.4%2.6%3.17
writing75.3%51.6%35.1%24.5%17.8%13%9.4%6.5%3.33
Visit Website

0 reviews

5
0
4
0
3
0
2
0
1
0
Likes22
Downloads
📝

No reviews yet

Be the first to review RedHatAI/gemma-4-31B-it-speculator.dflash!

Model Info

ProviderRedHatAI
Categorygeneral
Reviews0
Avg. Rating / 5.0

Community

Likes22
Downloads

Rating Guidelines

★★★★★Exceptional
★★★★Great
★★★Good
★★Fair
Poor