Crypti

CryptoAIM

AI & ML interests

None yet

Organizations

None yet

liked 2 datasets 1 day ago

nvidia/Nemotron-Cascade-2-SFT-Data

Viewer • Updated Mar 19 • 15.9M • 11.5k • 65

Modotte/CodeX-2M-Thinking

Viewer • Updated Feb 10 • 2.19M • 5.84k • 93

liked 2 models 2 days ago

armand0e/Qwen3.5-9B-Agent

Image-Text-to-Text • 10B • Updated about 2 hours ago • 230 • 2

armand0e/Qwen3.5-9B-Agent-GGUF

9B • Updated about 2 hours ago • 2.39k • 1

reacted to SeaWolf-AI's post with 🔥 2 days ago

🧬 Darwin Family: Zero Gradient Steps, GPQA Diamond 88.89%

How far can we push LLM reasoning *without* training?

Our team at VIDRAFT submitted this paper to Daily Papers yesterday, and it's
currently #3. Huge thanks to everyone who upvoted — sharing the core ideas below.

🔗 Paper: Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning (2605.14386)
🔗 arXiv: https://arxiv.org/abs/2605.14386
🔗 Model: FINAL-Bench/Darwin-28B-REASON
🔗 Model: FINAL-Bench/Darwin-28B-Opus

---

TL;DR

Darwin Family is a training-free evolutionary merging framework.
By recombining the weight spaces of existing LLM checkpoints — with zero
gradient-based training — it reaches frontier-level reasoning.

- 🏆 Darwin-28B-Opus: GPQA Diamond 88.89%
- 💸 Zero gradient steps: no B200 or H200 training hours needed
- 🧬 Consistent gains across 4B → 35B scale
- 🔀 Cross-architecture breeding between Transformer and Mamba families
- 🔁 Stable recursive multi-generation evolution
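
To make "zero gradient steps" concrete, here is a minimal sketch of a gradient-free evolutionary search over merge ratios. It is not the Darwin implementation: the names `merge` and `evolve`, the Gaussian mutation, the greedy selection rule, and the toy fitness function are all assumptions made for illustration. In the setting the post describes, fitness would presumably come from scoring each merged model on a benchmark, which needs inference but no backward pass.

```python
import random

def merge(parent_a, parent_b, ratios):
    """Per-parameter linear interpolation: ratio 0 keeps A, ratio 1 keeps B."""
    return [(1 - r) * a + r * b for a, b, r in zip(parent_a, parent_b, ratios)]

def evolve(parent_a, parent_b, fitness, generations=30, population=16,
           sigma=0.1, seed=0):
    """Hypothetical hill-climbing search over merge ratios (no gradients)."""
    rng = random.Random(seed)
    best = [0.5] * len(parent_a)  # start from an even blend
    best_score = fitness(merge(parent_a, parent_b, best))
    for _generation in range(generations):
        for _ in range(population):
            # Mutate the current best ratios and clamp them to [0, 1].
            candidate = [min(1.0, max(0.0, r + rng.gauss(0, sigma))) for r in best]
            score = fitness(merge(parent_a, parent_b, candidate))
            if score > best_score:  # keep only strict improvements
                best, best_score = candidate, score
    return best, best_score

# Toy stand-ins: three scalar "weights" per parent and a made-up target.
parent_a = [0.0, 0.0, 0.0]
parent_b = [1.0, 1.0, 1.0]
target = [0.2, 0.9, 0.5]

def toy_fitness(weights):
    """Higher is better: negative squared distance to the made-up target."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

best_ratios, best_score = evolve(parent_a, parent_b, toy_fitness)
```

The only signal the loop consumes is a scalar score per candidate, so the same structure works with any black-box evaluator.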

Three Core Mechanisms

① 14-dim Adaptive Merge Genome — fine-grained recombination at both
component level (Attention / FFN / MLP / LayerNorm / Embedding) and block
level, expanding the prior evolutionary-merge search space.
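
As a rough illustration of component-level plus block-level recombination, the sketch below merges two toy checkpoints with a small "genome" of ratios. Everything here is assumed: the post does not spell out the 14 genome dimensions, so the genome layout (`default`, `component`, `block_offset`), the parameter naming scheme (`blocks.<index>.<component>...`), and the additive way the two levels combine are hypothetical.

```python
COMPONENTS = ("attention", "ffn", "layernorm", "embedding")

def effective_ratio(name, genome):
    """Combine a component-level ratio with an optional block-level offset."""
    component = next((c for c in COMPONENTS if c in name), None)
    ratio = genome["component"].get(component, genome["default"])
    parts = name.split(".")
    if parts[0] == "blocks":  # hypothetical naming: blocks.<index>.<component>...
        ratio += genome["block_offset"].get(int(parts[1]), 0.0)
    return min(1.0, max(0.0, ratio))  # clamp to a valid interpolation weight

def merge_checkpoints(ckpt_a, ckpt_b, genome):
    """Interpolate each tensor (a flat list here) with its own ratio."""
    merged = {}
    for name, weights_a in ckpt_a.items():
        r = effective_ratio(name, genome)
        merged[name] = [(1 - r) * a + r * b
                        for a, b in zip(weights_a, ckpt_b[name])]
    return merged

# Toy checkpoints with made-up parameter names and values.
ckpt_a = {"blocks.0.attention.w": [0.0, 0.0], "blocks.1.ffn.w": [0.0], "embedding.w": [2.0]}
ckpt_b = {"blocks.0.attention.w": [1.0, 1.0], "blocks.1.ffn.w": [1.0], "embedding.w": [4.0]}
genome = {
    "default": 0.5,                                 # used when no component ratio is set
    "component": {"attention": 0.25, "ffn": 0.75},  # component-level genes
    "block_offset": {1: 0.25},                      # block-level gene for block 1
}
merged = merge_checkpoints(ckpt_a, ckpt_b, genome)
```

With this genome, attention weights stay close to parent A, the FFN in block 1 is taken entirely from parent B, and the embedding falls back to an even blend.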

② MRI-Trust Fusion — we diagnose each layer's reasoning contribution
via an **MRI (Model Reasoning Importance)** signal and fuse it with
evolutionary search through a **learnable trust parameter**. Trust the
diagnostic too much and search collapses; ignore it and search becomes
inefficient — Darwin learns the balance from data.
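
One way to picture the trust trade-off is a convex blend between a diagnostic prior and the ratios found by search. This is a sketch under assumptions, not the paper's formulation: the function name `fuse`, the linear blend, and the example numbers are hypothetical, and in the post the trust value is learned rather than set by hand.

```python
def fuse(mri_prior, evolved, trust):
    """Blend per-layer ratios: trust=1 follows the prior, trust=0 follows search."""
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    return [trust * p + (1 - trust) * e for p, e in zip(mri_prior, evolved)]

# Made-up per-layer merge ratios for two layers.
mri_prior = [0.2, 0.8]  # suggested by a hypothetical importance diagnostic
evolved = [0.6, 0.4]    # proposed by a hypothetical evolutionary search

full_trust = fuse(mri_prior, evolved, 1.0)  # search has no influence
no_trust = fuse(mri_prior, evolved, 0.0)    # diagnostic is ignored
balanced = fuse(mri_prior, evolved, 0.5)    # halfway between the two
```

At `trust=1.0` every candidate maps to the same ratios, which mirrors the "search collapses" failure mode. At `trust=0.0` the diagnostic contributes nothing and the search has to explore unaided.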

③ Architecture Mapper — weight-space breeding across heterogeneous
families. Attention × SSM crossover actually works.

Why It Matters
> Diagnose latent capabilities already encoded in open checkpoints,
> and recombine them — no gradients required.

Replies and critiques welcome 🙌