apple/CADD-Base-7B
CADD-Base-7B is a masked diffusion language model for code generation. It is augmented with Continuously Augmented Discrete Diffusion (CADD), a continuous flow-matching signal that guides the discrete denoising process.
Key idea: At each diffusion step, a continuous embedding `z_continuous` is added to the masked-token embeddings. This embedding follows a linear flow-matching trajectory from noise to the clean embeddings. The augmentation is orthogonal to the discrete unmasking strategy, so any masked diffusion model (MDM) algorithm can be combined with CADD.
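The linear trajectory described above can be illustrated with a small, dependency-free sketch. It is not the model's implementation: the function names `linear_flow_point` and `augment_masked` are hypothetical, and the time convention (noise at `t = 0`, clean embedding at `t = 1`) is an assumption made only for this example.

```python
def linear_flow_point(noise, clean, t):
    """Point on the straight line between `noise` and `clean`.

    Assumed convention (not confirmed by the model card):
    t = 0 gives pure noise, t = 1 gives the clean embedding.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * n + t * c for n, c in zip(noise, clean)]


def augment_masked(token_embeddings, is_masked, z_continuous):
    """Add the continuous signal at masked positions only (hypothetical helper)."""
    augmented = []
    for emb, masked, z in zip(token_embeddings, is_masked, z_continuous):
        if masked:
            augmented.append([e + zi for e, zi in zip(emb, z)])
        else:
            # Unmasked positions keep their token embedding unchanged.
            augmented.append(list(emb))
    return augmented
```

Because the augmentation only changes what is added to masked positions, it leaves the choice of which positions to unmask at each step entirely to the discrete algorithm.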
Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "apple/CADD-Base-7B"

# trust_remote_code=True loads the custom modeling code shipped with the checkpoint.
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to("cuda").eval()

prompt = "def fibonacci(n):\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Run diffusion-based generation with CADD enabled.
output = model.diffusion_generate(
    input_ids,
    max_new_tokens=512,
    steps=512,
    temperature=0.1,
    alg="entropy",
    alg_temp=0.0,
    use_cadd=True,
    cadd_sampling_mode="weighted",
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
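The call above selects `alg="entropy"`. As a rough illustration of what an entropy-based unmasking strategy can look like, the sketch below ranks masked positions by the entropy of their predicted distribution and unmasks the most confident ones first. This is an assumed reading of the option, not the code that `diffusion_generate` runs, and `pick_positions_to_unmask` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_positions_to_unmask(logits_per_position, masked_positions, k):
    """Return the k masked positions with the lowest predictive entropy.

    Hypothetical helper: lower entropy is treated as higher confidence.
    """
    scored = [(entropy(softmax(logits_per_position[i])), i) for i in masked_positions]
    scored.sort()
    return [i for _, i in scored[:k]]
```

In this sketch a position with a sharply peaked distribution is unmasked before one with a near-uniform distribution.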
CADD Sampling Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `use_cadd` | bool | `True` | Enable CADD continuous augmentation |
| `cadd_sampling_mode` | str | `"argmax"` | How to estimate `z_0` from logits: `"weighted"` or `"argmax"` |
| `alg` | str | `"origin"` | Unmasking strategy: `"entropy"`, `"origin"`, `"maskgit_plus"`, `"topk_margin"` |
| `temperature` | float | `1.0` | Sampling temperature for token prediction |
| `steps` | int | `512` | Number of diffusion steps |
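To make the two `cadd_sampling_mode` values more concrete, here is a minimal sketch of one plausible interpretation: `"argmax"` takes the embedding of the most likely token, while `"weighted"` takes the probability-weighted average of all token embeddings. Both the interpretation and the names `estimate_z0` and `embedding_table` are assumptions for illustration; the model's own code may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_z0(logits, embedding_table, mode="argmax"):
    """Estimate a clean embedding from logits for one position (hypothetical helper).

    embedding_table[i] is the embedding vector of token i.
    """
    if mode == "argmax":
        # Hard choice: embedding of the single most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return list(embedding_table[best])
    if mode == "weighted":
        # Soft choice: expected embedding under the predicted distribution.
        probs = softmax(logits)
        dim = len(embedding_table[0])
        return [sum(p * row[d] for p, row in zip(probs, embedding_table)) for d in range(dim)]
    raise ValueError(f"unknown mode: {mode!r}")
```

Under this reading, the weighted estimate keeps information about uncertain predictions, whereas the argmax estimate commits to one token per position.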
More details:
- Paper: Continuously Augmented Discrete Diffusion Model for Categorical Generative Modeling (ICLR 2026)
- GitHub: https://github.com/apple/ml-CADD
Citation
```bibtex
@article{zheng2025continuously,
  title={Continuously augmented discrete diffusion model for categorical generative modeling},
  author={Zheng, Huangjie and Gong, Shansan and Zhang, Ruixiang and Chen, Tianrong and Gu, Jiatao and Zhou, Mingyuan and Jaitly, Navdeep and Zhang, Yizhe},
  journal={arXiv preprint arXiv:2510.01329},
  year={2025}
}
```
Acknowledgment
This Hugging Face model release builds on and improves DiffuCoder, and reuses Dream's modeling architecture and generation utilities.