Open-dLLM: Open Diffusion Large Language Model

Fred Zhangzhi Peng, Shuibai Zhang, Alex Tong

Duke University, UW-Madison, Aithyra

TL;DR: the first and most open release of a diffusion large language model.

Open-dLLM

The field of diffusion LLMs is still young, and the answers aren’t clear. Pioneering efforts like Gemini-Diffusion, Seed Diffusion, and Mercury sparked excitement, but they remain closed APIs. You can use them, but you can’t study how they were built.

Open projects like LLaDA and Dream pushed things further by releasing weights and inference code. But they stopped short of what researchers need most: training pipelines, data recipes, and reproducible evaluation.

That’s why we built Open-dLLM: the first full-stack open diffusion language model project.

👉 Our first release is Open-dCoder, focused on code generation. It includes:

🏋️ Pretraining pipeline + data
⚡ Inference code
📊 Evaluation suite (HumanEval, MBPP, infilling benchmarks)
📦 Checkpoints on Hugging Face

With Open-dLLM, you can go from raw data → training → checkpoints → evaluation → inference, all in one repo.

Project	Data	Training Code	Inference	Evaluation	Weights
Open-dLLM (ours)	✅	✅	✅	✅	✅
LLaDA	❌	❌	✅	⚠️ limited	✅
Dream	❌	❌	✅	⚠️ limited	✅
Gemini-Diffusion	❌	❌	❌	❌	❌ (API only)
Seed Diffusion	❌	❌	❌	❌	❌ (API only)
Mercury	❌	❌	❌	❌	❌ (API only)

Demo

Here’s our Open-dLLM generating a QuickSort algorithm from scratch:

YouTube video (please play it, I really want you to enjoy the music :) )
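
If you can't play the video, here is a hand-written reference for the kind of program the demo targets. It is a minimal QuickSort sketch in Python, not a transcript of the model's output; the function name `quicksort` and the middle-element pivot are our own assumptions for illustration.

```python
from typing import List


def quicksort(items: List[int]) -> List[int]:
    """Return a new sorted list; the input list is left unchanged."""
    # Lists of length 0 or 1 are already sorted.
    if len(items) <= 1:
        return list(items)

    # Pick the middle element as the pivot (an arbitrary, illustrative choice).
    pivot = items[len(items) // 2]

    # Partition into three groups relative to the pivot.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]

    # Recursively sort the outer groups and stitch the result together.
    return quicksort(smaller) + equal + quicksort(larger)


if __name__ == "__main__":
    print(quicksort([5, 2, 9, 1, 5, 6]))  # prints [1, 2, 5, 5, 6, 9]
```

This version trades memory for readability by building new lists at each step; an in-place partition would be the usual choice when memory matters.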

