DeepReinforce Releases Ornith-1.0 Open-Source Coding Models
DeepReinforce has launched Ornith-1.0, an open-source coding model family that dynamically co-evolves its own agentic scaffolds during reinforcement learning.
- Ornith-1.0 is an open-source coding model family released under the MIT license, built on top of pretrained Gemma 4 and Qwen 3.5 checkpoints.
- Instead of using fixed, human-designed harnesses, the models learn to propose, refine, and optimize their own orchestration scaffolds during reinforcement learning.
- The model lineup spans four sizes: 9B Dense, 31B Dense, 35B Mixture-of-Experts (MoE), and a flagship 397B MoE.
- The 397B MoE variant outperforms Claude Opus 4.7 on headline coding benchmarks, but trails Claude Opus 4.8 and the larger GLM-5.2-744B.
DeepReinforce has released Ornith-1.0, an open-source family of reasoning models designed specifically for agentic coding. Rather than relying on rigid, human-designed scaffolds to manage memory and tools, these models learn to write and optimize their own orchestration harnesses during reinforcement learning. The release aims to deliver highly adaptable, task-specific strategies without requiring developers to hand-engineer complex agent wrappers.
Why it matters
According to a report by MarkTechPost, the Ornith-1.0 family is released under the permissive MIT license on Hugging Face. Built on top of pretrained Gemma 4 and Qwen 3.5 checkpoints, the lineup includes four sizes: a 9B dense model, a 31B dense model, a 35B mixture-of-experts (MoE) model activating roughly 3B parameters per token, and a massive 397B MoE flagship. The models output reasoning traces in a dedicated block before delivering answers, and they expose OpenAI-compatible endpoints that integrate with standard developer frameworks. The 19GB 9B model can be served locally on a single 80GB GPU using vLLM, SGLang, or Transformers.
The core innovation is the self-scaffolding reinforcement learning mechanism. In a two-stage process, the model analyzes a task, proposes a modified scaffold, and then executes the task using that custom scaffold. A token-level Group Relative Policy Optimization (GRPO) objective then rewards both the scaffold design and the final solution. To mitigate the risk of reward hacking—where the model might write a scaffold to cheat by hardcoding test outputs or reading oracle solutions—DeepReinforce implemented a three-tier defense consisting of a fixed outer trust boundary, a deterministic monitor, and a frozen LLM judge.
What it means for you
In terms of raw capability, DeepReinforce reports that the 397B-MoE variant beats Anthropic’s Claude Opus 4.7 on major programming benchmarks. However, it falls short of Claude Opus 4.8 and the 744B GLM-5.2. As developers look to integrate these models into their development pipelines, evaluating how they stack up against the best AI coding tools will be essential for determining whether local, self-scaffolding models can truly replace commercial APIs.
Frequently asked questions
What is self-scaffolding in Ornith-1.0?
Self-scaffolding is a training approach where the AI model learns to design its own orchestration harness—including tool use, memory, and error handling—during reinforcement learning, rather than relying on a fixed, human-engineered wrapper.
Can Ornith-1.0 be run locally?
Yes. The smaller 9B dense model requires about 19GB of memory in bf16 format and can be served on a single 80GB GPU. DeepReinforce has released FP8 and GGUF builds for local deployment via vLLM, SGLang, and Transformers.
How does Ornith-1.0 prevent reward hacking?
The model uses three defense layers to stop it from cheating: an immutable outer trust boundary, a deterministic monitor, and a frozen LLM judge.
See how these open-source models compare to the industry standards in our hands-on review of the best AI coding tools.
Best AI Coding Tools (2026): 7 Tested & Ranked →Source: MarkTechPost. Published June 25, 2026.
Ali has hands-on tested 50+ AI tools and tracks model releases daily. Every verdict here comes from real, paid usage — never vendor demos or sponsored placements.
AI Tools Worth is independent and unsponsored. Some linked guides contain affiliate links — they never change our verdicts.