
nvidia/PixelDiT-1300M-1024px


PixelDiT: Pixel Diffusion Transformers for Image Generation

Yongsheng Yu¹,²   Wei Xiong¹,†   Weili Nie¹   Yichen Sheng¹   Shiqiu Liu¹   Jiebo Luo²

¹ NVIDIA   ² University of Rochester
† Project Lead and Main Advising

   

Key Features

  • VAE-free: diffusion runs directly in pixel space, with no latent autoencoder
  • Dual-level architecture: Patch-level DiT + Pixel-level DiT
  • MM-DiT text-image fusion: Joint attention between text and image tokens
  • Text encoder: Gemma-2-2B-IT
  • Multi-aspect-ratio: Supports various aspect ratios at 1024px

Usage

Installation

pip install -r requirements.txt

Inference

# See the full inference script at: https://github.com/NVlabs/PixelDiT
cd t2i/
python inference.py \
  --config configs/PixelDiT_1024px_pixel_diffusion_stage3.yaml \
  --model_path PixelDiT-T2I-v1.pth \
  --txt_file prompts.txt \
  --custom_height 1024 --custom_width 1024 \
  --cfg_scale 2.75 --seed 2025 \
  --negative_prompt "low quality, worst quality, over-saturated, blurry, deformed, watermark" \
  --work_dir "."

Inference Parameters

Parameter            Default          Description
--cfg_scale          3.5              Classifier-free guidance scale
--step               50               Number of sampling steps (25 for fast, 50 for quality)
--seed               0                Random seed
--negative_prompt    ""               Negative prompt for CFG
--interval_guidance  [0, 1]           CFG application interval
--sampling_algo      flow_dpm-solver  Sampling algorithm
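
To make the interaction between --cfg_scale, --negative_prompt, and --interval_guidance concrete, here is a minimal sketch of the standard classifier-free guidance formula restricted to an interval. It is not taken from the PixelDiT repository: the function name `apply_cfg`, the assumption that sampling progress `t` is normalized to [0, 1], and the fallback to the plain conditional prediction outside the interval are all illustrative assumptions.

```python
def apply_cfg(cond, uncond, cfg_scale, t, interval=(0.0, 1.0)):
    """Blend conditional and unconditional predictions (hypothetical helper).

    cond      : model output for the text prompt
    uncond    : model output for the negative (or empty) prompt
    cfg_scale : guidance strength; 1.0 leaves the conditional output unchanged
    t         : sampling progress, ASSUMED to be normalized to [0, 1]
    interval  : guidance is applied only while t lies inside this range
    """
    lo, hi = interval
    if lo <= t <= hi:
        # Standard CFG: extrapolate from uncond in the direction of cond.
        return uncond + cfg_scale * (cond - uncond)
    # Outside the interval, return the conditional prediction as-is
    # (an assumption about how the interval is handled).
    return cond


# Scalars keep the demo dependency-free; the same arithmetic works
# elementwise on NumPy arrays or tensors.
inside = apply_cfg(2.0, 0.0, cfg_scale=3.5, t=0.5)
outside = apply_cfg(2.0, 0.0, cfg_scale=3.5, t=0.9, interval=(0.0, 0.8))
print(inside, outside)
```

With the default interval of [0, 1], guidance is active at every step, so the interval only matters when it is narrowed.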

Model Architecture

Component                    Value
Parameters                   1.3B
Patch size                   16
Hidden size                  1536
Attention heads              24
Patch-level depth            14
Pixel-level depth            2
Pixel hidden size            16
Pixel attention hidden size  1152
Text embedding dim           2304
Text max length              300
Text encoder                 Gemma-2-2B-IT
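
A few quantities follow directly from the values above, and the short sketch below works them out. The helper `patch_token_count` is a hypothetical name, the 3-channel RGB input is an assumption, and the non-square size is only an illustration of the arithmetic, not a statement about which resolutions the model supports.

```python
# Values copied from the architecture table.
PATCH_SIZE = 16
HIDDEN_SIZE = 1536
NUM_HEADS = 24
RGB_CHANNELS = 3  # assumption: the model operates on 3-channel RGB pixels


def patch_token_count(height, width, patch_size=PATCH_SIZE):
    """Number of patch-level tokens for an image of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of the patch size")
    return (height // patch_size) * (width // patch_size)


tokens_square = patch_token_count(1024, 1024)      # 64 * 64 patches
tokens_wide = patch_token_count(768, 1344)         # hypothetical non-square size
head_dim = HIDDEN_SIZE // NUM_HEADS                # per-head width in the patch-level DiT
pixels_per_patch = PATCH_SIZE * PATCH_SIZE         # pixel positions inside one patch
values_per_patch = pixels_per_patch * RGB_CHANNELS # raw pixel values inside one patch

print(tokens_square, tokens_wide, head_dim, pixels_per_patch, values_per_patch)
```

At 1024×1024 this gives 4096 patch tokens, each covering 256 pixels, which is why a separate pixel-level stage with a small hidden size handles the per-pixel detail.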

Citation

@inproceedings{yu2026pixeldit,
      title={PixelDiT: Pixel Diffusion Transformers for Image Generation},
      author={Yongsheng Yu and Wei Xiong and Weili Nie and Yichen Sheng and Shiqiu Liu and Jiebo Luo},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2026},
}

License

This model is released under the NSCLv1 License. The work and any derivative works may only be used for non-commercial (research or evaluation) purposes.
