DeepReinforce releases Ornith, open-source coding model for autonomous agents
In brief
- DeepReinforce released Ornith-1.0 in four sizes, from 9B to a 397B mixture-of-experts model, all under the MIT license
- Ornith is built for agentic coding—autonomous task completion rather than human-guided autocomplete
- The flagship model scores 82.4 on SWE-bench Verified, ahead of Claude Opus 4.7 (80.8) and DeepSeek-V4-Pro (80.6)
- Ornith treats agent scaffolding as learnable, enabling models to develop custom approaches
Four Sizes, MIT License, No Restrictions
Ornith-1.0 comes in four sizes: 9 billion and 31 billion parameters, plus two mixture-of-experts variants at 35 billion and 397 billion parameters. All four are released under the MIT license with no regional restrictions, so developers can download, modify, and deploy them freely. The smallest, 9-billion-parameter model can run on a smartphone but lacks heavy reasoning capability. The 397-billion-parameter variant is much more capable but requires computing infrastructure well beyond consumer hardware.
How Agentic Coding Works
Most large language models are still designed around a human in the loop. They autocomplete: you write a line, they suggest the next one. Agentic AI works differently. In a coding context, that means an AI that reads files, runs tests, identifies what failed, fixes the code, and loops again until the task is done.
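The read, test, fix, repeat cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not Ornith's actual loop: the `Workspace` type, `agent_loop`, and the two toy callbacks are hypothetical names, with the callbacks standing in for a real test runner and a real model call.

```python
from typing import Callable, Dict, List, Tuple

Workspace = Dict[str, str]  # hypothetical: file path -> file contents


def agent_loop(
    workspace: Workspace,
    run_tests: Callable[[Workspace], List[str]],
    propose_fix: Callable[[Workspace, List[str]], Workspace],
    max_iters: int = 5,
) -> Tuple[Workspace, bool, int]:
    """Run tests, hand failures to the model, apply its fix, repeat."""
    for step in range(max_iters):
        failures = run_tests(workspace)
        if not failures:
            # Nothing failed: the task is done after `step` fix attempts.
            return workspace, True, step
        workspace = propose_fix(workspace, failures)
    # Out of budget: report whether the last fix happened to pass.
    return workspace, not run_tests(workspace), max_iters


# Toy stand-in for a test runner (assumption, not a real harness).
def toy_run_tests(ws: Workspace) -> List[str]:
    return [] if "a + b" in ws["add.py"] else ["test_add failed"]


# Toy stand-in for a model call that edits the code (assumption).
def toy_propose_fix(ws: Workspace, failures: List[str]) -> Workspace:
    fixed = dict(ws)
    fixed["add.py"] = ws["add.py"].replace("a - b", "a + b")
    return fixed


final_ws, solved, steps = agent_loop(
    {"add.py": "def add(a, b):\n    return a - b\n"},
    toy_run_tests,
    toy_propose_fix,
)
```

The iteration cap matters in practice: without it, an agent that never finds a passing fix would loop forever.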
The catch is structure. Most AI coding agents are paired with a human-designed harness—a fixed set of rules for how the agent structures its work. Ornith instead treats the scaffold as a learnable object that co-evolves with the policy, allowing it to develop its own approach. During reinforcement learning, each training step happens in two stages. The model first reads the task and proposes a refined strategy, then uses that strategy to generate a solution. The reward from the outcome flows back to both stages, optimizing the model for writing better strategies and better code.
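To make the two-stage idea concrete, here is a toy sketch in which one outcome reward updates both a strategy choice and a solution choice. It uses a plain REINFORCE update on small softmax policies as a stand-in; the article does not say which algorithm DeepReinforce uses, and the reward rule, function names, and hyperparameters here are all assumptions.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs: List[float], rng: random.Random) -> int:
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(logits: List[float], action: int, reward: float, lr: float) -> None:
    """In-place REINFORCE step; grad of log-softmax is one_hot(action) - probs."""
    probs = softmax(logits)
    for i in range(len(logits)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr * reward * grad


def train(steps: int = 2000, lr: float = 0.1, seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    strategy_logits = [0.0, 0.0]                 # stage 1: pick a strategy
    solution_logits = [[0.0, 0.0], [0.0, 0.0]]   # stage 2: pick a solution given the strategy
    for _ in range(steps):
        s = sample(softmax(strategy_logits), rng)
        a = sample(softmax(solution_logits[s]), rng)
        # Toy outcome (assumption): only strategy 1 with solution 1 "passes the tests".
        reward = 1.0 if (s == 1 and a == 1) else 0.0
        # The same outcome reward flows back to both stages.
        reinforce_update(strategy_logits, s, reward, lr)
        reinforce_update(solution_logits[s], a, reward, lr)
    return softmax(strategy_logits), softmax(solution_logits[1])


strategy_probs, solution_probs = train()
```

After training, both policies put most of their probability on the rewarded choice, which is the point of sharing the reward: the strategy stage is credited for outcomes it only influenced indirectly.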
Performance and Safety
The flagship 397 billion parameter model posts 82.4 on SWE-bench Verified. That beats Claude Opus 4.7's 80.8 and DeepSeek-V4-Pro's 80.6 on the same test. On Terminal Bench 2.1—a harder benchmark with 89 tasks run inside containerized terminal environments—Ornith's flagship model achieved 77.5 against Claude Opus 4.7's 70.3.
DeepReinforce implements three layers of defense against reward hacking: the environment and test suite are immutable and outside the model's reach, a deterministic monitor tracks behavior, and a frozen judge model evaluates outcomes. Together, these are meant to keep the agent from gaming the reward signal or exploiting loopholes in the training process.
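One way to picture the three layers is as gates a rollout must clear before it earns any reward. The sketch below is an illustration under stated assumptions, not DeepReinforce's implementation: a hash check only detects tampering with the test suite (the article describes the suite as out of reach altogether), the forbidden-command list is a hypothetical rule set, and `toy_judge` is a simple function standing in for a frozen judge model.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical rule set for the deterministic monitor.
FORBIDDEN_SUBSTRINGS = ("rm -rf tests", "> tests/", "chmod")


def fingerprint(files: Dict[str, str]) -> str:
    """Hash a {path: contents} snapshot so any edit to it is detectable."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(files[path].encode("utf-8") + b"\0")
    return digest.hexdigest()


def monitor_ok(commands: List[str]) -> bool:
    """Deterministic check: no command contains a forbidden pattern."""
    return not any(bad in cmd for cmd in commands for bad in FORBIDDEN_SUBSTRINGS)


def gated_reward(
    trusted_tests_hash: str,
    tests_after: Dict[str, str],
    commands: List[str],
    tests_passed: bool,
    diff: str,
    judge: Callable[[str], bool],
) -> float:
    if fingerprint(tests_after) != trusted_tests_hash:
        return 0.0  # layer 1: the test suite was altered
    if not monitor_ok(commands):
        return 0.0  # layer 2: the monitor flagged the agent's behavior
    if not tests_passed:
        return 0.0
    return 1.0 if judge(diff) else 0.0  # layer 3: the judge has the final say


def toy_judge(diff: str) -> bool:
    # Stand-in for a frozen judge model (assumption): reject an obvious shortcut.
    return "sys.exit(0)" not in diff


tests = {"tests/test_add.py": "assert add(1, 2) == 3\n"}
trusted_hash = fingerprint(tests)
good_diff = "+    return a + b"

honest = gated_reward(trusted_hash, tests, ["pytest -q"], True, good_diff, toy_judge)
tampered = gated_reward(
    trusted_hash, {"tests/test_add.py": "assert True\n"}, ["pytest -q"], True, good_diff, toy_judge
)
blocked = gated_reward(trusted_hash, tests, ["rm -rf tests", "pytest -q"], True, good_diff, toy_judge)
```

In this toy setup only the honest rollout is rewarded; rewriting the tests or issuing a flagged command zeroes the reward even when the tests report success.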
The open-source release puts a capable agentic coding model in the hands of developers who want autonomy, not autocomplete.


