28 Releases. 31 Days.
December was... busy. 🚢
Roo Code: December 2025
Most of the industry slows down in December. We accidentally (on purpose, because we are your mildly unhinged group of agent makers) did the opposite.
Between December 1st (v3.35.0) and December 31st (v3.38.2), we shipped 28 releases. Why? Because we found a way to make your agents significantly smarter without waiting for the next model release (yes, including GPT-5.2-Codex).
We went after the unglamorous, critical plumbing: moving tool use onto native formats, tightening context architecture, and making debugging actionable. The goal was simple: remove friction so closing the loop feels straightforward.
We kept seeing the same opportunity: if we fix the plumbing, we unlock a new level of performance for every model you use.
Here is how we upgraded your engine.
1) Agent Skills: Superpowers on Demand
(Landed in v3.38.0; aligned to spec in v3.38.2)
The problem: if you stuff every tool and rule into the system prompt "just in case," you burn tokens and distract the model.
The fix: we shipped Agent Skills. Think of these as modular superpowers. Instead of carrying the kitchen sink at all times, the agent can load specific capabilities (tools, prompts, resources) only when the task actually requires them.
So you can: build agents with deep expertise in specific domains without crushing their context window.
Tradeoff: skills add setup and coordination. The payoff is a cleaner context window and more predictable behavior.
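For a concrete picture: under the spec, a skill is just a folder with a SKILL.md whose frontmatter tells the agent when it is worth loading. A minimal sketch (the skill itself is invented):

```markdown
---
name: database-migrations
description: Use when the task involves writing or reviewing SQL schema migrations.
---

# Database Migrations

When changing a schema:
1. Inspect the current schema before proposing a migration.
2. Write reversible up/down migrations.
3. Never drop a column without an explicit confirmation step.
```

The body of the file only enters the context window when the description matches the task at hand; the rest of the time, the agent carries nothing but that one-line summary.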
2) Native Tool Calling: A Massive Performance Unlock
(Shipped continuously across December releases)
The problem: you think the model is "hallucinating." A lot of the time it is just overstimulated. Forcing an LLM to drive tools by emitting brittle, hand-parsed text formats is like trying to solve a LeetCode Hard during a fire drill (this is fine). It burns cognitive energy on the formatting, so it has less brainpower left to actually solve the bug.
The fix: we pushed tool use onto native tool calling across providers (Roo Code Router, Anthropic, OpenAI, OpenRouter, etc.). We deprecated XML tool protocol selection for new tasks. We stopped forcing models to act like 1990s text parsers.
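To make the contrast concrete, here is roughly what the native path looks like against an OpenAI-style API. This is a sketch, not Roo Code's internals; the read_file tool and model name are illustrative:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// The tool is declared as a structured JSON schema the provider
// understands natively, not as XML instructions in the system prompt.
const response = await client.chat.completions.create({
  model: "gpt-4o-mini", // illustrative; any tool-calling model works
  messages: [{ role: "user", content: "Read src/index.ts and summarize it." }],
  tools: [
    {
      type: "function",
      function: {
        name: "read_file",
        description: "Read a file from the workspace",
        parameters: {
          type: "object",
          properties: { path: { type: "string" } },
          required: ["path"],
        },
      },
    },
  ],
});

// The provider returns a typed tool call: arguments arrive as JSON you
// can parse and validate, not free text you have to regex apart.
const call = response.choices[0].message.tool_calls?.[0];
if (call?.type === "function") {
  const args = JSON.parse(call.function.arguments);
  console.log(`Model wants ${call.function.name} with`, args);
  // ...execute the tool (after user approval) and send the result back
  // as a role: "tool" message referencing call.id.
}
```

The model never has to think about angle brackets; the provider enforces the schema, and malformed calls largely stop being a failure mode.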
The upside: the difference is not subtle. In one internal eval, Claude 4.5 Haiku went from roughly 27% to roughly 95% task success after the shift.
So you can: get top-tier results from the "fast and cheap" models. And you still keep full control with approvals. Roo Code does not execute without you saying so.
3) Fixing "Agent Amnesia" (Context Management)
The problem: long tasks tend to rot. As the context window fills up, the agent gets confused. Suddenly you are re-explaining a file Roo Code read fifteen minutes ago. It's exhausting.
The fix: we rewrote the MessageManager layer. History coordination and truncation are now deliberate infrastructure, not an afterthought.
So you can: spend tokens intentionally on results, not on reminding the agent what day it is.
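The real MessageManager is internal to Roo Code, but the general shape of budget-aware truncation is easy to sketch. Everything below is invented for illustration: keep the system prompt and the newest turns, and let the oldest middle turns fall off first.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical sketch: trim history to a token budget, preserving the
// system prompt and as many recent turns as fit.
function truncateHistory(
  messages: Message[],
  maxTokens: number,
  countTokens: (m: Message) => number
): Message[] {
  const [system, ...rest] = messages; // assume the first message is the system prompt
  const kept: Message[] = [];
  let budget = maxTokens - countTokens(system);

  // Walk backwards from the newest turn; stop once the budget is spent,
  // so the oldest middle turns are the ones that get dropped.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = countTokens(rest[i]);
    if (cost > budget) break;
    kept.unshift(rest[i]);
    budget -= cost;
  }
  return [system, ...kept];
}
```

A production version also has to keep tool calls paired with their results, and may summarize the dropped middle rather than discard it, which is exactly the kind of bookkeeping that belongs in infrastructure instead of being left to chance.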
4) Debugging: Showing the Receipts
The problem: the shift to native tool calling was a massive upgrade. It also exposed weird edge cases. APIs failed in creative ways we had not seen before, and "Something went wrong" does not cut it when you are debugging protocol mismatches.
The fix: we took the medicine. We stopped hiding the underlying error details and shipped an Error Details Modal and Downloadable Diagnostics. We also improved reliability for tool execution edge cases.
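The pattern is simple enough to sketch. The names below are hypothetical, not Roo Code's actual error surface, but they show the idea: keep the raw failure instead of swallowing it.

```typescript
// Hypothetical diagnostic record: enough structure to render an error
// details view and to serialize for a downloadable bug report.
interface Diagnostic {
  timestamp: string;
  provider: string;
  status?: number;
  message: string;
  raw?: unknown; // full provider payload, preserved verbatim
}

function showErrorDetails(d: Diagnostic): void {
  // In the extension this would open a modal; here we just log it.
  console.error(`[${d.provider}] ${d.status ?? "?"}: ${d.message}`, d.raw);
}

// Wrap a provider call so failures are captured, surfaced, and rethrown
// instead of collapsing into "Something went wrong."
async function callWithDiagnostics<T>(
  provider: string,
  fn: () => Promise<T>
): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    const e = err as { status?: number; message?: string };
    showErrorDetails({
      timestamp: new Date().toISOString(),
      provider,
      status: e.status,
      message: e.message ?? String(err),
      raw: err,
    });
    throw err; // the caller still sees the failure; nothing is hidden
  }
}
```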
So you can: see why it crashed instead of guessing. Huge shoutout to the community for the patience and the detailed bug reports while we stabilized the plumbing.
Ready for 2026?
December was not just a shipping spree. It was an engine swap. We stripped friction, fixed plumbing, and handed brainpower back to the agent.
If you haven't updated since November, you are running on an older, slower engine.

