# Motif-Video-2B GGUF
GGUF quantized variants of Motif-Video-2B, a 2-billion parameter text-to-video diffusion transformer.
These files are intended for use with the diffusers library and allow you to run Motif-Video with reduced VRAM requirements by loading a quantized transformer while keeping the rest of the pipeline in the original precision.
## Quality Comparison
Same prompt and seed across all variants (1280x736, 121 frames, 50 steps, NVIDIA H200). BF16 baseline at top, quantized variants paired below (4-bit → 8-bit). Each video is rendered at 1/2 resolution (640x368 per cell) at the original 24 fps.

## Available Files

| File | Quantization | Size |
|---|---|---|
| motifv-2b-dev-Q4_0.gguf | Q4_0 | 1.1 GB |
| motifv-2b-dev-Q4_1.gguf | Q4_1 | 1.2 GB |
| motifv-2b-dev-Q4_K_M.gguf | Q4_K_M | 1.1 GB |
| motifv-2b-dev-Q5_0.gguf | Q5_0 | 1.3 GB |
| motifv-2b-dev-Q5_1.gguf | Q5_1 | 1.4 GB |
| motifv-2b-dev-Q5_K_M.gguf | Q5_K_M | 1.3 GB |
| motifv-2b-dev-Q6_K.gguf | Q6_K | 1.6 GB |
| motifv-2b-dev-Q8_0.gguf | Q8_0 | 2.0 GB |
| motifv-2b-dev-BF16.gguf | BF16 | 3.7 GB |
Recommendation: Q5_K_M and Q6_K offer a good balance between quality and file size. Q8_0 is the closest to the original BF16 quality. Q4_K_M is the most memory-efficient option for constrained environments.
## Installation

Prerequisites: PyTorch with CUDA support must be installed first. See pytorch.org for your CUDA version.
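For example, a CUDA 12.4 build can be installed from the official PyTorch wheel index (a sketch; substitute the index URL that matches your local CUDA toolkit):

```bash
# Example for CUDA 12.4; adjust the cuXXX suffix to your CUDA version.
pip install torch --index-url https://download.pytorch.org/whl/cu124
```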
pip install "transformers>=5.5.4" accelerate ftfy einops sentencepiece regex Pillow imageio imageio-ffmpeg gguf
pip install git+https://github.com/waitingcheung/diffusers.git@feat/motif-video
## Usage

```python
import torch
from diffusers import (
    AdaptiveProjectedGuidance,
    DPMSolverMultistepScheduler,
    GGUFQuantizationConfig,
    MotifVideoPipeline,
    MotifVideoTransformer3DModel,
)
from diffusers.utils import export_to_video
from huggingface_hub import hf_hub_download


# DPMSolver++ subclass that ignores pipeline-supplied sigmas and builds its own
# flow-matching schedule.
class FlowDPMSolver(DPMSolverMultistepScheduler):
    def set_timesteps(self, num_inference_steps=None, device=None,
                      sigmas=None, mu=None, timesteps=None):
        # Derive the step count from the pipeline's sigmas, then discard them so
        # the parent class rebuilds the schedule from its own flow sigmas.
        if sigmas is not None and num_inference_steps is None:
            num_inference_steps = len(sigmas)
        super().set_timesteps(
            num_inference_steps=num_inference_steps, device=device, timesteps=timesteps
        )


# Adaptive projected guidance lets a high guidance scale be used without the
# over-saturation that plain CFG produces at this strength.
guider = AdaptiveProjectedGuidance(
    guidance_scale=8.0,
    adaptive_projected_guidance_rescale=12.0,
    adaptive_projected_guidance_momentum=0.1,
    use_original_formulation=True,
    normalization_dims="spatial",
)

variant = "Q4_K_M"  # options: Q4_0, Q4_1, Q4_K_M, Q5_0, Q5_1, Q5_K_M, Q6_K, Q8_0, BF16
ckpt_path = hf_hub_download(
    "Motif-Technologies/Motif-Video-2B-GGUF",
    filename=f"motifv-2b-dev-{variant}.gguf",
)

# Load only the transformer from the quantized GGUF file; weights are
# dequantized to bfloat16 on the fly at compute time.
transformer = MotifVideoTransformer3DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    config="Motif-Technologies/Motif-Video-2B",
    revision="diffusers-integration",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

# The remaining components (VAE, text encoder, scheduler) come from the base
# repository in BF16; the quantized transformer is swapped in.
pipe = MotifVideoPipeline.from_pretrained(
    "Motif-Technologies/Motif-Video-2B",
    revision="diffusers-integration",
    torch_dtype=torch.bfloat16,
    guider=guider,
    transformer=transformer,
)

# Replace the default Euler scheduler with DPMSolver++ (flow matching).
flow_shift = 15.0  # bias sampling toward earlier (high-noise) sigmas
pipe.scheduler = FlowDPMSolver(
    num_train_timesteps=pipe.scheduler.config.get("num_train_timesteps", 1000),
    algorithm_type="dpmsolver++",
    solver_order=2,
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
    flow_shift=flow_shift,
)
pipe.enable_model_cpu_offload()

prompt = (
    "A woman standing in a sunlit field as flower petals swirl around her in slow motion. "
    "Each petal floats gently through the golden light, casting tiny shadows. "
    "Her hair moves like water, and time seems to stand still."
)
negative_prompt = (
    "text overlay, graphic overlay, watermark, logo, subtitles, timestamp, "
    "broadcast graphics, UI elements, random letters, frozen pose, rigid, static expression, "
    "jerky motion, mechanical motion, discontinuous motion, flat framing, depthless, dull lighting, "
    "monotone, crushed shadows, blown-out highlights, shifting background, fading background, "
    "poor continuity, identity drift, deformation, flickering, ghosting, smearing, duplication, "
    "mutated proportions, inconsistent clothing, flat colors, desaturated, tonally compressed, "
    "poor background separation, exposure shift, uneven brightness, color balance shift"
)

generator = torch.Generator(device="cuda").manual_seed(42)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=736,
    width=1280,
    num_frames=121,
    num_inference_steps=50,
    generator=generator,
    frame_rate=24,
    use_linear_quadratic_schedule=False,
)
export_to_video(output.frames[0], "output.mp4", fps=24)
```
## Benchmark
Measured on NVIDIA H200, 1280x736, 121 frames, 50 steps:
| Variant | Speed (s/it) | Peak alloc (GB) | Peak rsv (GB) | Total (s) | VRAM saved vs BF16 (rsv) |
|---|---|---|---|---|---|
| BF16 | 23.22 | 14.78 | 24.93 | 1176.1 | — |
| Q8_0 | 23.24 | 13.10 | 23.14 | 1177.0 | 1.79 |
| Q6_K | 23.34 | 12.62 | 22.72 | 1181.7 | 2.21 |
| Q5_K_M | 23.37 | 12.39 | 22.45 | 1183.0 | 2.48 |
| Q5_1 | 23.35 | 12.47 | 22.66 | 1182.4 | 2.27 |
| Q5_0 | 23.35 | 12.37 | 22.55 | 1181.9 | 2.38 |
| Q4_K_M | 23.34 | 12.19 | 22.22 | 1181.5 | 2.71 |
| Q4_1 | 23.29 | 12.26 | 22.26 | 1179.2 | 2.67 |
| Q4_0 | 23.31 | 12.14 | 22.18 | 1179.8 | 2.75 |
- Peak alloc = peak GPU memory occupied by live tensors (model weights + activations), via `torch.cuda.max_memory_allocated`.
- Peak rsv = peak GPU memory reserved by PyTorch's caching allocator (allocations plus cached free blocks), via `torch.cuda.max_memory_reserved`. Use this as the effective VRAM footprint when planning headroom.
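Both peaks can be captured around a generation run with PyTorch's built-in counters; a minimal sketch, reusing `pipe` and the arguments from the Usage example above:

```python
import time

import torch

# Reset the counters so the peaks reflect only this run.
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

start = time.perf_counter()
output = pipe(prompt=prompt, negative_prompt=negative_prompt, height=736, width=1280,
              num_frames=121, num_inference_steps=50, generator=generator, frame_rate=24)
elapsed = time.perf_counter() - start

print(f"total:      {elapsed:.1f} s (~{elapsed / 50:.2f} s/it incl. encode/decode)")
print(f"peak alloc: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
print(f"peak rsv:   {torch.cuda.max_memory_reserved() / 1024**3:.2f} GB")
```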
Key findings:
- Speed is near-identical across all quantizations (~23.2-23.4 s/it); dequantization adds no measurable overhead.
- VRAM savings scale with quantization level: Q4 saves ~2.7 GB and Q8 saves ~1.8 GB of reserved memory versus BF16.
## Notes

- The non-transformer components (VAE, text encoder, scheduler) are loaded in BF16 from the base model Motif-Technologies/Motif-Video-2B at revision="diffusers-integration".
- All inference is performed on CUDA. CPU inference is not supported.
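If the reserved footprint is still too large for your GPU, diffusers exposes further offload and tiling hooks. A hedged sketch; whether MotifVideoPipeline supports each hook depends on the integration branch:

```python
# More aggressive than enable_model_cpu_offload(): streams submodules to the GPU
# one at a time (slower, lowest VRAM). Compute still runs on CUDA; this does not
# enable CPU inference.
pipe.enable_sequential_cpu_offload()

# Tiled VAE decoding bounds the decode-time activation peak, if the VAE supports it.
if hasattr(pipe.vae, "enable_tiling"):
    pipe.vae.enable_tiling()
```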