DeepReinforce releases Ornith, open-source coding model for autonomous agents

By Khal · Jun 29, 2026 (1 hour ago) · 2 min read

Editorial illustration for: DeepReinforce releases Ornith, an open-source coding model built for autonomous agents — Published Jun 29, 2026, 9:51 p.m. (1 hour ago)

In brief

DeepReinforce released Ornith-1.0 in four sizes (9B–397B MoE parameters), all under MIT license
Ornith is built for agentic coding—autonomous task completion rather than human-guided autocomplete
Flagship model achieves 82.4 on SWE-bench Verified, beating Claude Opus (80.8) and DeepSeek-V4-Pro (80.6)
Ornith treats agent scaffolding as learnable, enabling models to develop custom approaches

Four Sizes, MIT License, No Restrictions

Ornith-1.0 is available in four sizes: 9 billion, 31 billion, 35 billion mixture of experts, and 397 billion mixture-of-experts parameters. All are available under MIT license with no regional restrictions, meaning developers can download, modify, and deploy them freely. The smaller 9-billion-parameter model can run on a smartphone but lacks heavy reasoning capability. The 397 billion parameter variant is much more capable yet requires serious computing infrastructure beyond consumer hardware.

How Agentic Coding Works

Most large language models are still designed with human feedback in mind. They autocomplete—you write a line, they suggest the next one. Agentic AI works differently. In a coding context, that means an AI that reads files, runs tests, identifies what failed, fixes the code, and loops again until it's done.

The catch is structure. Most AI coding agents are paired with a human-designed harness—a fixed set of rules for how the agent structures its work. Ornith instead treats the scaffold as a learnable object that co-evolves with the policy, allowing it to develop its own approach. During reinforcement learning, each training step happens in two stages. The model first reads the task and proposes a refined strategy, then uses that strategy to generate a solution. The reward from the outcome flows back to both stages, optimizing the model for writing better strategies and better code.

Performance and Safety

The flagship 397 billion parameter model posts 82.4 on SWE-bench Verified. That beats Claude Opus 4.7's 80.8 and DeepSeek-V4-Pro's 80.6 on the same test. On Terminal Bench 2.1—a harder benchmark with 89 tasks run inside containerized terminal environments—Ornith's flagship model achieved 77.5 against Claude Opus 4.7's 70.3.

DeepReinforce implements three layers of defense against reward hacking: the environment and test suite are immutable and outside the model's reach, a deterministic monitor tracks behavior, and a frozen judge model evaluates outcomes. This prevents the agent from gaming the reward signal or exploiting loopholes in the training process.

The open-source release puts a capable agentic coding model in the hands of developers who want autonomy, not autocomplete.

DeepReinforce releases Ornith, open-source coding model for autonomous agents

In brief

Four Sizes, MIT License, No Restrictions

How Agentic Coding Works

Performance and Safety

Related stories

Maersk raises 2025 profit guidance on European demand surge

China launches Tulong Feng AI as Anthropic restricts Mythos

Senate leaders push CLARITY Act passage before July 13 recess