North Mini Code: Open-Source Powerhouse for Agentic Pipelines

Cohere's North Mini Code introduces a competitive edge in agentic coding with its open-source, locally deployable model. It's a bold step towards cost-effective and transparent AI solutions.
Cohere's latest release, North Mini Code, offers a compelling open-source alternative to managed models like Claude Fable 5 and is designed for agentic coding pipelines. It runs on a single H100 and emits roughly three times the median number of output tokens to complete tasks, a factor that matters for high-volume production workloads. That verbosity comes with a cost, but the trade-off might just be worth it for some.
The Model's Capabilities
North Mini Code isn't just another model. It's a 30 billion parameter mixture-of-experts (MoE) designed specifically for software engineering tasks. Think architecture mapping, code review, and terminal work. With 256,000 tokens in its context window, it can manage substantial projects in one sweep. And it's available under an Apache 2.0 license on Hugging Face. Clone the repo. Run the test. Then form an opinion.
Targeting the full agentic coding stack, North Mini Code supports integrated tool use and interleaved thinking, which improves performance in multi-step agentic workflows. It's built to manage terminal environments efficiently, something Cohere validated on the Terminal-Bench v2 benchmark.
Building the Beast
With 128 experts and 8 active per token, North Mini Code is sparse yet effective. Remarkably, its compute needs at inference are akin to those of a 3 billion parameter model. Cohere's Nick Frosst demonstrated it running on a Mac Studio with around 20 GB of RAM, which makes it practical for local setups.
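The "3 billion parameter" figure can be sanity-checked with some sparsity arithmetic. The sketch below assumes the model divides cleanly into always-active shared parameters plus 128 equally sized experts. That split is an assumption made for illustration, not a breakdown Cohere has published, and `moe_split` is a hypothetical helper. It solves for the split that is consistent with 30 billion total and roughly 3 billion active parameters.

```python
def moe_split(total_b, active_b, num_experts, active_experts):
    """Solve for shared and expert parameter counts (in billions).

    Assumed model (illustrative only):
        shared + expert                         = total
        shared + expert * (active / num_experts) = active per token
    """
    frac = active_experts / num_experts          # share of expert weights used per token
    expert_b = (total_b - active_b) / (1.0 - frac)
    shared_b = total_b - expert_b
    return shared_b, expert_b


# Figures from the article: 30B total, ~3B active, 128 experts, 8 active per token.
shared, expert = moe_split(30.0, 3.0, 128, 8)
print(f"shared: {shared:.1f}B, experts: {expert:.1f}B, "
      f"active per token: {shared + expert * 8 / 128:.1f}B")
```

Under these assumptions, most of the weights sit in experts that stay idle for any given token, which is why per-token compute looks like that of a much smaller dense model even though all 30 billion parameters still have to fit in memory.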
Training involved two stages of supervised fine-tuning, followed by reinforcement learning across more than 70,000 tasks from 5,000 repositories. By using a multi-harness approach, Cohere achieved a 10% gain on the OpenCode evaluation while maintaining SWE-Agent performance. This underscores the model's adaptability across different agent scaffolds.
Market Impact and Considerations
North Mini Code enters a crowded field with competitors like Mistral Devstral Small 2 and GitHub Copilot. But here's the kicker: Cohere claims North Mini Code delivers 2.8x higher throughput and 30% less inter-token latency than Devstral Small 2 on the same hardware.
Yet there's a caveat. It generates 75 million tokens to complete tasks, against a median of 25 million, and that threefold verbosity can inflate costs and latency. Is verbosity the hidden cost enterprises can afford?
Frosst frames it as a move towards cost-effective, transparent AI. With managed models like Claude Fable 5 priced at $50 per million output tokens, teams need to weigh managed infrastructure against the cost savings and control of local deployment.
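To make that trade-off concrete, here is a back-of-the-envelope sketch. Only the $50 per million output tokens and the 75 million versus 25 million token counts come from the figures above. The local throughput (1,000 tokens per second) and the GPU price ($3 per hour) are hypothetical placeholders, as is the assumption that the managed model needs only the median token count. Swap in your own measurements before drawing conclusions.

```python
def managed_cost(output_tokens, price_per_million):
    """Cost in dollars of a managed API billed per million output tokens."""
    return output_tokens / 1_000_000 * price_per_million


def local_cost(output_tokens, tokens_per_second, gpu_dollars_per_hour):
    """Cost in dollars of generating the tokens on a rented or amortized GPU."""
    hours = output_tokens / tokens_per_second / 3600
    return hours * gpu_dollars_per_hour


# Managed model, assumed (hypothetically) to need only the 25M-token median.
managed = managed_cost(25_000_000, price_per_million=50.0)

# Local model at its reported 75M tokens; throughput and GPU price are placeholders.
local = local_cost(75_000_000, tokens_per_second=1_000, gpu_dollars_per_hour=3.0)

print(f"managed: ${managed:,.2f}  local: ${local:,.2f}")
```

The sketch ignores input tokens, idle GPU time, and engineering overhead, so treat it as a way to frame the question, not as an answer.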
For enterprises, North Mini Code's release brings clarity to pipeline decisions. Models fine-tuned for agentic workflows are now a baseline. And throughput testing against real workloads becomes essential. Test it on your own workloads first. Always.
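What might such a throughput test look like? A minimal sketch follows, assuming your serving stack exposes generated tokens as a Python iterator. The `measure_stream` helper and the `fake_stream` stand-in are hypothetical and not part of any Cohere SDK. It records a timestamp as each token arrives and reports tokens per second and mean inter-token latency, the two metrics Cohere cites in its Devstral comparison.

```python
import time
from statistics import mean


def measure_stream(token_stream, clock=time.perf_counter):
    """Consume a token iterator and report throughput and inter-token latency."""
    start = clock()
    stamps = []
    for _ in token_stream:
        stamps.append(clock())          # arrival time of each token
    if not stamps:
        return {"tokens": 0, "tokens_per_second": 0.0,
                "mean_inter_token_latency": 0.0}
    elapsed = stamps[-1] - start
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    return {
        "tokens": len(stamps),
        "tokens_per_second": len(stamps) / elapsed if elapsed > 0 else float("inf"),
        "mean_inter_token_latency": mean(gaps) if gaps else 0.0,
    }


def fake_stream(n, delay):
    """Hypothetical stand-in for a real model's streaming output."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"


stats = measure_stream(fake_stream(5, 0.001))
print(stats)
```

Replace `fake_stream` with the real token iterator from whichever client you use, and feed it prompts drawn from your own repositories so the numbers reflect your workload and not a vendor benchmark.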
Key Terms Explained
Claude: Anthropic's family of AI assistants, including Claude Haiku, Sonnet, and Opus.
Compute: The processing power needed to train and run AI models.
Context window: The maximum amount of text a language model can process at once, measured in tokens.
Evaluation: The process of measuring how well an AI model performs on its intended task.