FINAL-Bench/Darwin-4B-Genesis
World's first Transformer × Mamba evolutionary cross-architecture FFN breeding | CLIcK 92% | MuSR 70% | A 4B model outperforming 27B | CMA-ES 42-dimensional genome search | Hybrid Vigor demonstrated | Apache 2.0


What Is This?

Darwin-4B-Genesis is the 3rd-generation Darwin model and the world's first model to successfully crossbreed FFN layers across different architectures, Transformer (Gemma4) and Mamba (Qwen3.5 GatedDeltaNet), using evolutionary optimization.

The father's Attention layers (Gemma4 Transformer) are preserved at 100%, while the mother's FFN knowledge (Qwen3.5 Mamba) is transplanted at layer-specific optimal ratios discovered automatically by CMA-ES across 42 dimensions.

The result: the child outperforms both parents on every benchmark, a phenomenon known as Hybrid Vigor.



Why This Matters

1. World First

Existing hybrid models (Jamba, Nemotron-H, Granite 4.0) are all designed and trained from scratch. Darwin-4B-Genesis takes two already-trained models from different architecture families and breeds them evolutionarily, with zero additional training.

2. Hybrid Vigor Demonstrated

| Benchmark | David (Father) | Qwen3.5-4B (Mother) | Genesis (Child) |
|---|---|---|---|
| CLIcK | 90% | ~50% (est.) | 92% ✅ |
| MuSR | 65% | ~55% (est.) | 70% ✅ |

The child surpasses both parents. This is the first demonstration of Hybrid Vigor in AI model breeding.

3. Manual vs Evolution

| Method | CLIcK | MuSR |
|---|---|---|
| Manual 50% blend | ~23% | n/a |
| Manual 30% selective blend | 62% | 45% |
| CMA-ES 42D automatic search | 92% | 70% |

Human-chosen ratios fail. Evolutionary search succeeds.


Benchmarks

| Benchmark | Genesis | David (Gen2) | K-AI #1 (27B) |
|---|---|---|---|
| CLIcK (Korean culture) | 92% | 90% | 79.4% |
| MuSR (multi-step reasoning) | 70% | 65% | 60.4% |
| GPQA (deep reasoning) | ~60% | ~60% | n/a |

A 4B model dominates the K-AI leaderboard's #1 model (27B) on both CLIcK and MuSR.


How It Works

Cross-Architecture FFN Breeding

Father: Darwin-4B-David (Gemma4 Transformer, hidden=2560, 42 layers)
Mother: Qwen/Qwen3.5-4B (GatedDeltaNet/Mamba, hidden=2560, 32 layers)

Key insight: hidden_size matches (2560) → direct FFN replacement possible
Method: Attention 100% from Father, FFN blended at per-layer optimal ratios
Optimizer: CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
Genome: 42 dimensions (one ratio per layer)
Fitness: CLIcK 60% + MuSR 40% composite score
Frozen layers: L15, L16, L22, L23, L24, L25 (Korean language preservation)
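The recipe above can be sketched numerically. The snippet below is an illustrative toy, not the project's actual code: `blend_ffn_weights` is a hypothetical helper, and random matrices stand in for real FFN weights. Each layer's child FFN is a linear interpolation between the parents at the ratio the genome assigns, with the frozen Korean-preservation layers pinned to the father.

```python
import numpy as np

def blend_ffn_weights(father_ffn, mother_ffn, genome, frozen_layers):
    """Blend per-layer FFN weight matrices at ratios given by the genome.

    father_ffn / mother_ffn: list of weight arrays, one per layer
    genome: one blend ratio in [0, 1] per layer (fraction taken from the mother)
    frozen_layers: layer indices whose ratio is forced to 0 (father kept intact)
    """
    child = []
    for i, (w_f, w_m) in enumerate(zip(father_ffn, mother_ffn)):
        r = 0.0 if i in frozen_layers else float(genome[i])
        child.append((1.0 - r) * w_f + r * w_m)  # per-layer linear interpolation
    return child

rng = np.random.default_rng(0)
n_layers, hidden = 42, 8            # toy sizes; the real model uses hidden=2560
father = [rng.normal(size=(hidden, hidden)) for _ in range(n_layers)]
mother = [rng.normal(size=(hidden, hidden)) for _ in range(n_layers)]
genome = rng.uniform(0.0, 0.3, size=n_layers)
frozen = {15, 16, 22, 23, 24, 25}   # Korean-preservation layers from the card

child = blend_ffn_weights(father, mother, genome, frozen)
assert np.allclose(child[15], father[15])  # frozen layer is pure father
```

In the real model the genome holds 42 ratios, one per Gemma4 layer, and CMA-ES searches that 42-dimensional space directly.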

Optimal Genome Discovered by CMA-ES

L00: 0.206  ██████████░  21% Qwen
L07: 0.000  ░░░░░░░░░░░  Auto-protected by CMA-ES
L15: 0.000  ░░░░░░░░░░░  Frozen (Korean)
L22: 0.000  ░░░░░░░░░░░  Frozen (Korean)
L29: 0.291  ██████████████░  29% Qwen (maximum)
L31: 0.244  ████████████░  24% Qwen
L32: 0.273  █████████████░  27% Qwen

Key finding: CMA-ES applied the most aggressive Qwen blending to the final layers (L29-32), which govern output quality. The algorithm determined that "Qwen's generation quality exceeds Darwin's" for those specific layers, while simultaneously protecting critical layers (L7, L18, L28) by driving their ratios to zero.
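The card does not publish its search code, so as a rough sketch here is a minimal (μ, λ)-style evolution strategy standing in for CMA-ES (which additionally adapts a full covariance matrix), optimizing a 42-dimensional genome against the stated CLIcK 60% + MuSR 40% composite fitness. The `click` and `musr` evaluators below are mocks, not the real benchmarks.

```python
import numpy as np

def fitness(genome, evaluate_click, evaluate_musr):
    """Composite score used to rank candidates: CLIcK 60% + MuSR 40%."""
    return 0.6 * evaluate_click(genome) + 0.4 * evaluate_musr(genome)

def evolve(dim=42, generations=30, popsize=16, sigma=0.1, seed=0,
           evaluate_click=None, evaluate_musr=None):
    """Simplified evolution strategy: sample, rank by fitness, recombine elites."""
    rng = np.random.default_rng(seed)
    mean = np.full(dim, 0.15)                       # start near a mild blend
    for _ in range(generations):
        pop = np.clip(mean + sigma * rng.normal(size=(popsize, dim)), 0.0, 1.0)
        scores = [fitness(g, evaluate_click, evaluate_musr) for g in pop]
        elite = pop[np.argsort(scores)[-popsize // 4:]]  # keep top quarter
        mean = elite.mean(axis=0)                   # recombine elites into new mean
    return mean

# Mock benchmark evaluators (stand-ins for real CLIcK / MuSR runs):
target = np.linspace(0.0, 0.3, 42)                  # pretend optimum genome
click = lambda g: -np.sum((g - target) ** 2)
musr = lambda g: -np.sum(np.abs(g - target))
best = evolve(evaluate_click=click, evaluate_musr=musr)
```

In the real search each fitness evaluation means assembling a candidate model and running the benchmarks, which is why evaluation, not training, dominates the 155-minute cost.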

Training Cost

| | This Model | Typical Hybrid |
|---|---|---|
| GPU | H100 × 1 | Hundreds to thousands |
| Time | 155 minutes | Weeks to months |
| Training data | 0 tokens | Trillions of tokens |
| Training compute | Fitness evaluation only | Full pre-training |

Genealogy

google/gemma-4-E4B-it × TeichAI/Claude-Opus-Distill-E4B
    → Darwin-4B-Opus (Gen 1, DARE-TIES merge)

Darwin-4B-Opus × DavidAU/DECKARD-Expresso-Universe
    → Darwin-4B-David (Gen 2, MRI-guided merge, CLIcK 90%)

Darwin-4B-David × Qwen/Qwen3.5-4B
    → Darwin-4B-Genesis (Gen 3, Cross-Arch FFN Breeding, CLIcK 92%) ★

DNA Composition

Gemma4 Transformer (skeleton, Attention)  ~50%
Claude Opus Distill (reasoning patterns)  ~20%
DECKARD Universe (Korean, creativity)     ~15%
Qwen3.5 GatedDeltaNet (Mamba FFN)         ~15%

What Is FFN Breeding?

AI models have two main components:

  • Attention = the brain (decides what to focus on, reasoning chains)
  • FFN = the muscles (stores knowledge, processes patterns)

Darwin-4B-Genesis keeps the brain from the father (Transformer) and blends in muscles from the mother (Mamba) at optimal ratios. As long as the FFN input/output dimensions match (hidden_size=2560), the swap works, like a USB-C port that accepts any compatible charger.
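That "USB-C" condition reduces to a shape check on the FFN's boundary matrices: whatever happens inside the block, it must map hidden_size → hidden_size. A hedged sketch follows; the 6144 inner width for the mother is invented for illustration, and only hidden_size=2560 and the father's intermediate size of 10240 come from the card.

```python
import numpy as np

HIDDEN = 2560  # shared hidden_size of both parents

def ffn_compatible(w_in, w_out, hidden=HIDDEN):
    """An FFN is swappable if it maps hidden -> hidden, whatever its inner width."""
    return w_in.shape[0] == hidden and w_out.shape[1] == hidden

# Father (Transformer) FFN: hidden -> 10240 -> hidden
f_in, f_out = np.zeros((HIDDEN, 10240)), np.zeros((10240, HIDDEN))
# Mother's FFN with a different (hypothetical) inner width still plugs in
m_in, m_out = np.zeros((HIDDEN, 6144)), np.zeros((6144, HIDDEN))

assert ffn_compatible(f_in, f_out) and ffn_compatible(m_in, m_out)
```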


Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-4B-Genesis",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-4B-Genesis",
    dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)

# Build a chat prompt, generate greedily, and print only the newly generated tokens
messages = [{"role": "user", "content": "Explain how hybrid vigor works in genetics."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Hardware Requirements

| Setup | VRAM | Status |
|---|---|---|
| NVIDIA RTX 4090 | 24 GB | BF16 fits |
| NVIDIA RTX 3090 | 24 GB | BF16 fits |
| NVIDIA H100 | 93 GB | Comfortable |
| Mac M3 Max | 36 GB | Comfortable |

Dense 4B model: runs on a single consumer GPU.
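The "fits in 24 GB" rows follow from simple arithmetic: a dense 4B model in BF16 needs roughly params × 2 bytes for weights, with the remaining VRAM as headroom for activations and KV cache. A quick check:

```python
def weight_memory_gib(params_billion, bytes_per_param=2):
    """Rough weight footprint: parameter count x bytes per parameter (BF16 = 2)."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

bf16 = weight_memory_gib(4)  # ~7.5 GiB of weights for a 4B model
```

So the weights alone use well under half of a 24 GB card, which matches the table.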


Model Specifications

| Spec | Value |
|---|---|
| Architecture | Gemma4 Dense (Transformer Attention + Mamba FFN hybrid) |
| Effective Parameters | 4B (8B total with PLE) |
| Hidden Size | 2560 |
| Intermediate Size | 10240 |
| Layers | 42 |
| Context Length | 32,768 |
| License | Apache 2.0 |
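These numbers can be sanity-checked against the 4B effective parameter count. Assuming a gated (SwiGLU-style) FFN with three hidden × intermediate projections per layer (an assumption, not stated on the card), the FFN stack alone accounts for most of the parameters:

```python
hidden, intermediate, layers = 2560, 10240, 42   # from the spec table
per_layer = 3 * hidden * intermediate            # gate, up, down projections (assumed)
ffn_params = per_layer * layers
print(f"{ffn_params / 1e9:.2f}B FFN parameters")  # ~3.30B of the 4B effective total
```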

How This Differs from Prior Work

| | Existing Hybrids | Darwin-4B-Genesis |
|---|---|---|
| Examples | Jamba, Nemotron-H, Granite 4.0 | This model |
| Method | Design → train from scratch | Breed trained models → zero training |
| Cost | Thousands of GPU·hours | H100 × 1, 2.6 hours |
| Data | Trillions of tokens | 0 tokens (fitness eval only) |
| Ratio selection | Manual architecture design | CMA-ES 42D automatic search |
| Hybrid Vigor | Not tested | Benchmarked and confirmed |

Future Work

  • Cross-breeding with RWKV-7, xLSTM, and other architectures
  • Scaling to 31B/35B models with the same technique
  • Paper: "Cross-Architecture FFN Breeding with Evolutionary Optimization"
  • Patents: Methods for selective FFN transplantation across architectures

Acknowledgements

  • Korean Government: GPU Support Program research grant
  • Google: Gemma4 E4B architecture
  • Alibaba Qwen Team: Qwen3.5-4B GatedDeltaNet
  • TeichAI: Claude Opus Distill model
  • DavidAU: DECKARD-Expresso-Universe model
  • Jackrong: Claude 4.6 Opus Reasoning Distilled

Citation

@misc{vidraft_darwin_4b_genesis,
  title        = {Darwin-4B-Genesis: World's First Cross-Architecture FFN Breeding},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis}}
}