# Motif-Video-2B GGUF
GGUF quantized variants of Motif-Video-2B, a 2-billion parameter text-to-video diffusion transformer.
These files are intended for use with the diffusers library and allow you to run Motif-Video with reduced VRAM requirements by loading a quantized transformer while keeping the rest of the pipeline in the original precision.
## Quality Comparison
Same prompt and seed across all variants (1280x736, 121 frames, 50 steps, NVIDIA H200). BF16 baseline at top, quantized variants paired below (4-bit → 8-bit). Each video is rendered at 1/2 resolution (640x368 per cell) at the original 24 fps.

## Available Files

| File | Quantization | Size |
|---|---|---|
| motifv-2b-dev-Q4_0.gguf | Q4_0 | 1.1 GB |
| motifv-2b-dev-Q4_1.gguf | Q4_1 | 1.2 GB |
| motifv-2b-dev-Q4_K_M.gguf | Q4_K_M | 1.1 GB |
| motifv-2b-dev-Q5_0.gguf | Q5_0 | 1.3 GB |
| motifv-2b-dev-Q5_1.gguf | Q5_1 | 1.4 GB |
| motifv-2b-dev-Q5_K_M.gguf | Q5_K_M | 1.3 GB |
| motifv-2b-dev-Q6_K.gguf | Q6_K | 1.6 GB |
| motifv-2b-dev-Q8_0.gguf | Q8_0 | 2.0 GB |
| motifv-2b-dev-BF16.gguf | BF16 | 3.7 GB |
Recommendation: Q5_K_M and Q6_K offer a good balance between quality and file size. Q8_0 is the closest to the original BF16 quality. Q4_K_M is the most memory-efficient option for constrained environments.
## Installation

Prerequisites: PyTorch with CUDA support must be installed first. See pytorch.org for your CUDA version.
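For example, a CUDA 12.4 build can be installed from the official PyTorch wheel index (a sketch; substitute the index URL that matches your local CUDA toolkit):

```bash
# Example for CUDA 12.4; adjust the cuXXX suffix to your CUDA version.
pip install torch --index-url https://download.pytorch.org/whl/cu124
```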
pip install "transformers>=5.5.4" accelerate ftfy einops sentencepiece regex Pillow imageio imageio-ffmpeg gguf
pip install git+https://github.com/waitingcheung/diffusers.git@feat/motif-video
## Usage

```python
import torch
from diffusers import (
    AdaptiveProjectedGuidance,
    DPMSolverMultistepScheduler,
    GGUFQuantizationConfig,
    MotifVideoPipeline,
    MotifVideoTransformer3DModel,
)
from diffusers.utils import export_to_video
from huggingface_hub import hf_hub_download


# DPMSolver++ subclass that ignores pipeline-supplied sigmas and builds its own
# flow-matching schedule.
class FlowDPMSolver(DPMSolverMultistepScheduler):
    def set_timesteps(self, num_inference_steps=None, device=None,
                      sigmas=None, mu=None, timesteps=None):
        # Derive the step count from the pipeline's sigmas, then discard them so
        # the parent class rebuilds the schedule from its own flow sigmas.
        if sigmas is not None and num_inference_steps is None:
            num_inference_steps = len(sigmas)
        super().set_timesteps(
            num_inference_steps=num_inference_steps, device=device, timesteps=timesteps
        )


# Adaptive projected guidance lets a high guidance scale be used without the
# over-saturation that plain CFG produces at this strength.
guider = AdaptiveProjectedGuidance(
    guidance_scale=8.0,
    adaptive_projected_guidance_rescale=12.0,
    adaptive_projected_guidance_momentum=0.1,
    use_original_formulation=True,
    normalization_dims="spatial",
)

variant = "Q4_K_M"  # options: Q4_0, Q4_1, Q4_K_M, Q5_0, Q5_1, Q5_K_M, Q6_K, Q8_0, BF16
ckpt_path = hf_hub_download(
    "Motif-Technologies/Motif-Video-2B-GGUF",
    filename=f"motifv-2b-dev-{variant}.gguf",
)

# Load only the transformer from the quantized GGUF file; weights are
# dequantized to bfloat16 on the fly at compute time.
transformer = MotifVideoTransformer3DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    config="Motif-Technologies/Motif-Video-2B",
    revision="diffusers-integration",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

# The remaining components (VAE, text encoder, scheduler) come from the base
# repository in BF16; the quantized transformer is swapped in.
pipe = MotifVideoPipeline.from_pretrained(
    "Motif-Technologies/Motif-Video-2B",
    revision="diffusers-integration",
    torch_dtype=torch.bfloat16,
    guider=guider,
    transformer=transformer,
)

# Replace the default Euler scheduler with DPMSolver++ (flow matching).
flow_shift = 15.0  # bias sampling toward earlier (high-noise) sigmas
pipe.scheduler = FlowDPMSolver(
    num_train_timesteps=pipe.scheduler.config.get("num_train_timesteps", 1000),
    algorithm_type="dpmsolver++",
    solver_order=2,
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
    flow_shift=flow_shift,
)
pipe.enable_model_cpu_offload()

prompt = (
    "A woman standing in a sunlit field as flower petals swirl around her in slow motion. "
    "Each petal floats gently through the golden light, casting tiny shadows. "
    "Her hair moves like water, and time seems to stand still."
)
negative_prompt = (
    "text overlay, graphic overlay, watermark, logo, subtitles, timestamp, "
    "broadcast graphics, UI elements, random letters, frozen pose, rigid, static expression, "
    "jerky motion, mechanical motion, discontinuous motion, flat framing, depthless, dull lighting, "
    "monotone, crushed shadows, blown-out highlights, shifting background, fading background, "
    "poor continuity, identity drift, deformation, flickering, ghosting, smearing, duplication, "
    "mutated proportions, inconsistent clothing, flat colors, desaturated, tonally compressed, "
    "poor background separation, exposure shift, uneven brightness, color balance shift"
)

generator = torch.Generator(device="cuda").manual_seed(42)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=736,
    width=1280,
    num_frames=121,
    num_inference_steps=50,
    generator=generator,
    frame_rate=24,
    use_linear_quadratic_schedule=False,
)
export_to_video(output.frames[0], "output.mp4", fps=24)
```
## Benchmark
Measured on NVIDIA H200, 1280x736, 121 frames, 50 steps:
| Variant | Speed (s/it) | Peak alloc (GB) | Peak rsv (GB) | Total (s) | VRAM saved vs BF16 (rsv) |
|---|---|---|---|---|---|
| BF16 | 23.22 | 14.78 | 24.93 | 1176.1 | — |
| Q8_0 | 23.24 | 13.10 | 23.14 | 1177.0 | 1.79 |
| Q6_K | 23.34 | 12.62 | 22.72 | 1181.7 | 2.21 |
| Q5_K_M | 23.37 | 12.39 | 22.45 | 1183.0 | 2.48 |
| Q5_1 | 23.35 | 12.47 | 22.66 | 1182.4 | 2.27 |
| Q5_0 | 23.35 | 12.37 | 22.55 | 1181.9 | 2.38 |
| Q4_K_M | 23.34 | 12.19 | 22.22 | 1181.5 | 2.71 |
| Q4_1 | 23.29 | 12.26 | 22.26 | 1179.2 | 2.67 |
| Q4_0 | 23.31 | 12.14 | 22.18 | 1179.8 | 2.75 |
- Peak alloc = peak GPU memory occupied by live tensors (model weights + activations), via `torch.cuda.max_memory_allocated`.
- Peak rsv = peak GPU memory reserved by PyTorch's caching allocator (allocations plus cached free blocks), via `torch.cuda.max_memory_reserved`. Use this as the effective VRAM footprint when planning headroom.
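Both peaks can be captured around a generation run with PyTorch's built-in counters; a minimal sketch, reusing `pipe` and the arguments from the Usage example above:

```python
import time

import torch

# Reset the counters so the peaks reflect only this run.
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

start = time.perf_counter()
output = pipe(prompt=prompt, negative_prompt=negative_prompt, height=736, width=1280,
              num_frames=121, num_inference_steps=50, generator=generator, frame_rate=24)
elapsed = time.perf_counter() - start

print(f"total:      {elapsed:.1f} s (~{elapsed / 50:.2f} s/it incl. encode/decode)")
print(f"peak alloc: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
print(f"peak rsv:   {torch.cuda.max_memory_reserved() / 1024**3:.2f} GB")
```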
Key findings:
- Speed is near-identical across all quantizations (~23.2-23.4 s/it); dequantization adds no measurable overhead.
- VRAM savings scale with quantization level: Q4 saves ~2.7 GB and Q8 saves ~1.8 GB of reserved memory versus BF16.
## Notes

- The non-transformer components (VAE, text encoder, scheduler) are loaded in BF16 from the base model Motif-Technologies/Motif-Video-2B at revision="diffusers-integration".
- All inference is performed on CUDA. CPU inference is not supported.
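If the reserved footprint is still too large for your GPU, diffusers exposes further offload and tiling hooks. A hedged sketch; whether MotifVideoPipeline supports each hook depends on the integration branch:

```python
# More aggressive than enable_model_cpu_offload(): streams submodules to the GPU
# one at a time (slower, lowest VRAM). Compute still runs on CUDA; this does not
# enable CPU inference.
pipe.enable_sequential_cpu_offload()

# Tiled VAE decoding bounds the decode-time activation peak, if the VAE supports it.
if hasattr(pipe.vae, "enable_tiling"):
    pipe.vae.enable_tiling()
```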