Workweave Launches Open-Source AI Model Router to Cut API Costs
Workweave’s new open-source router acts as a local proxy for Anthropic, OpenAI, and Gemini, promising 50ms routing and up to 70% cost savings.
- Workweave has open-sourced an AI model router designed to act as a local proxy for OpenAI, Anthropic, and Gemini models.
- The tool routes prompts in 50 milliseconds using an on-box embedder and a scoring system derived from Avengers-Pro, rather than prompt-based evaluation.
- According to the project’s GitHub repository, the router can reduce API costs by 40% to 70% by matching prompts to the most efficient model.
- Developers can integrate the tool directly with Claude Code, Codex, Cursor, or custom applications by changing their endpoint to localhost.
Workweave has released an open-source model router designed to slash API costs by dynamically directing prompts to the most efficient LLM. According to the project’s GitHub repository, the tool serves as a local, drop-in proxy for major providers like Anthropic, OpenAI, and Gemini. By routing each request in 50 milliseconds, the developer claims the tool can cut API expenses by 40% to 70% without sacrificing performance.
How the Routing Technology Works
Instead of relying on slow, expensive prompt-based routing, the Workweave router utilizes a lightweight, on-box embedder. It employs a cluster scoring mechanism derived from the Avengers-Pro framework to determine the optimal model for any given prompt on a per-request basis. The creators, who also build the Weave engineering intelligence platform, state that the tool currently ranks first on the RouterArena leaderboard (Acc-Cost Arena).
Integrations and Developer Setup
The proxy is designed to integrate seamlessly into existing developer workflows. Developers can direct tools like Claude Code, Codex, or Cursor to localhost to start routing requests automatically. For developers managing multiple LLMs, finding the right balance between cost and capability is a constant challenge. Utilizing specialized local proxies is rapidly becoming a standard practice among teams looking to optimize their AI coding tools and agentic workflows.
Why It Matters
This release highlights a growing shift toward local infrastructure in agentic AI systems. By shifting the routing logic to a 50ms local step, developers can avoid the latency and cost of cloud-based routing services while maintaining fine-grained control over which provider handles which task.
Frequently asked questions
How does the Workweave router decide which model to use?
The router uses a lightweight, local on-box embedder and a cluster scorer derived from Avengers-Pro to analyze and route each prompt in 50 milliseconds, avoiding slow, prompt-based evaluations.
Which AI models and providers does the router support?
According to the project documentation, it acts as a drop-in proxy for models from Anthropic, OpenAI, and Gemini.
How do you integrate the Workweave router with development tools?
You can integrate it by running the router locally and pointing tools like Claude Code, Codex, Cursor, or your own application to the localhost endpoint (typically port 8080).
To see how these models perform in real-world development environments, read our evaluation of the best AI coding tools currently on the market.
Best AI Coding Tools (2026): 7 Tested & Ranked →Source: Hacker News. Published June 26, 2026.
Ali has hands-on tested 50+ AI tools and tracks model releases daily. Every verdict here comes from real, paid usage — never vendor demos or sponsored placements.
AI Tools Worth is independent and unsponsored. Some linked guides contain affiliate links — they never change our verdicts.