
NousCoder-14B: Open-Source Coding Model Arrives Just as Everyone's Losing Their Minds Over Claude Code

17 May 2026

Nous Research has released NousCoder-14B, an open-source coding model trained in just four days on 48 Nvidia B200 GPUs, achieving 67.87% accuracy on the LiveCodeBench v6 benchmark — a 7-point improvement over its base model. The release stands out for its radical transparency, with Nous publishing not only the model weights but also the full training environment and reinforcement learning framework, enabling others to reproduce the work. However, the researchers flag a significant concern: the training dataset approached the limits of available competitive programming problems, pointing to data scarcity as a key obstacle for future AI coding progress and highlighting synthetic data generation as a critical area for future research.

Nous Research, the open-source AI outfit bankrolled by crypto VC firm Paradigm, has dropped a new coding model that it claims matches or beats several larger proprietary systems. It was trained in four days flat on 48 of Nvidia's B200 GPUs. Make of that what you will.

The model is called NousCoder-14B, and it lands at a moment when developers on social media have been falling over themselves to post Claude Code testimonials. Anthropic's agentic coding tool has dominated the discourse since New Year's Day, with engineers posting screenshots of it apparently rebuilding a year's worth of work from a three-paragraph prompt. So the timing is either very good or very calculated — probably both.

On LiveCodeBench v6, a standardised benchmark using competitive programming problems from August 2024 to May 2025, NousCoder-14B scores 67.87%. That's a 7.08 percentage point improvement over its base model, Alibaba's Qwen3-14B. Decent progress. Not world-ending.

What actually makes this release interesting is the openness. Nous Research didn't just publish model weights — they released the full reinforcement learning environment, benchmark suite, and training harness, all built on their Atropos framework. Anyone with enough compute can reproduce or extend the work. That's rarer than it should be.

The model was trained by Joe Li, a researcher in residence at Nous and a former competitive programmer. His technical report includes a charmingly personal observation: the improvement NousCoder-14B made during training, from a Codeforces rating of roughly 1,600–1,750 to 2,100–2,200, mirrors a leap that took Li himself nearly two years of practice as a teenager. The model did it in four days.

"Watching that final training run unfold was quite a surreal experience," Li wrote. One imagines it was.

He was careful to add the obvious caveat, though: he solved around 1,000 problems over those two years. The model needed 24,000. Humans remain dramatically more sample-efficient, for now.

The training pipeline itself is fairly standard reinforcement learning with a few wrinkles. The model generates code, the code runs against test cases, it either passes or it doesn't. Simple binary feedback. Nous used Modal for sandboxed parallel code execution across those 24,000 problems, each with hundreds of test cases, with time and memory limits of 15 seconds and 4GB respectively.
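To make the pass/fail signal concrete, here is a minimal sketch of a binary reward function. It is an illustration under stated assumptions, not Nous Research's actual harness: the name `binary_reward` and the stdin/stdout test format are hypothetical, it runs candidate code in a plain local subprocess rather than in a Modal sandbox, and it enforces only the time limit, not the 4GB memory cap.

```python
import subprocess
import sys

TIME_LIMIT_SECONDS = 15  # per-test time limit quoted in the article


def binary_reward(solution_source, test_cases, time_limit=TIME_LIMIT_SECONDS):
    """Return 1.0 if the program passes every test case, otherwise 0.0.

    Hypothetical helper: `test_cases` is a list of (stdin_text, expected_stdout)
    pairs. No sandboxing or memory limit is applied here.
    """
    for stdin_text, expected_stdout in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", solution_source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=time_limit,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # exceeded the time limit
        if result.returncode != 0:
            return 0.0  # crashed or exited with an error
        if result.stdout.strip() != expected_stdout.strip():
            return 0.0  # wrong answer
    return 1.0  # every test passed


if __name__ == "__main__":
    tests = [("2\n", "4\n"), ("5\n", "10\n")]
    print(binary_reward("print(int(input()) * 2)", tests))
```

A single failing test zeroes the reward, which is what makes the signal binary: there is no partial credit for passing most of the cases.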

The optimisation algorithm of choice was DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), whose dynamic sampling step discards training examples where the model always succeeds or always fails, since neither provides a useful learning signal. Context windows were extended in stages, from 32,000 to 40,000 tokens during training, with evaluation pushed to roughly 80,000 tokens. Inference and verification were pipelined to keep the GPUs busy.
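The dynamic sampling idea is simple enough to sketch in a few lines. The code below is an illustration, not the Atropos implementation: the function names and the dictionary layout are hypothetical, and the mean-centred advantage is an assumption used only to show why uniform groups carry no signal.

```python
def centered_advantages(rewards):
    """Subtract the group mean from each reward (assumed group-relative baseline)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def filter_informative_groups(groups):
    """Keep only problems whose sampled rewards are not all identical.

    `groups` maps a problem id to the binary rewards of several sampled
    solutions. All-pass and all-fail groups are dropped.
    """
    kept = {}
    for problem_id, rewards in groups.items():
        if len(set(rewards)) > 1:  # at least one pass and one fail
            kept[problem_id] = rewards
    return kept


if __name__ == "__main__":
    groups = {
        "too_easy": [1.0, 1.0, 1.0, 1.0],  # always succeeds
        "too_hard": [0.0, 0.0, 0.0, 0.0],  # always fails
        "mixed": [1.0, 0.0, 0.0, 1.0],     # informative
    }
    print(filter_informative_groups(groups))
    print(centered_advantages(groups["too_easy"]))
```

Under that assumption, a group where every sample passes (or every sample fails) yields advantages of zero across the board, so it contributes nothing to the gradient and can be skipped.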

Buried in the technical report is arguably the most significant finding: the 24,000 training problems represent a sizeable chunk of all the properly formatted competitive programming problems available online. Li put it plainly: the field is approaching the limits of high-quality data in this domain.

This is not a Nous Research problem. It's an industry problem. Compute keeps scaling. Clean, verifiable training data does not. Li's proposed solution: train models to generate their own problems, enabling a kind of self-play. "Once synthetic problem generation is solved, self-play becomes a very interesting direction," he wrote. It's an elegant idea. It's also unsolved.

Nous Research itself is a mildly eccentric outfit — anime-style branding, crypto-adjacent funding, a community that inspires either loyalty or eye-rolls depending on your disposition. "Stop benchmarkmaxxing," complained one observer on X, somewhat fairly. Others pointed out that Nvidia's Nemotron models score better on the same benchmark. Legitimate questions were also raised about whether NousCoder-14B is genuinely useful for iterative, multi-turn software development or just polished at single-shot problem-solving. The difference matters enormously in practice.

Future work flagged in the report includes multi-turn reinforcement learning, which would teach the model to use intermediate feedback such as compilation errors rather than just final pass/fail signals, and addressing the persistent problem of models producing longer responses when they're wrong and running out of context window during training.

The model is available on Hugging Face under an Apache 2.0 licence. The full Atropos training stack is published alongside it.

The honest summary: a well-documented, genuinely open-source coding model with solid benchmark numbers, trained fast on expensive hardware, by a small team that at least has the decency to publish their methodology. In a field drowning in vague capability claims and closed systems, that's worth something, even if it's not quite going to dethrone Claude Code any time soon.
