Agents are getting pretty good at navigating the web, editing code, and handling complex workflows. But there’s a glaring problem: once you deploy them, they stop learning. They’ll make the same dumb mistake a hundred times because they have no way to look back and say, “Oh, that didn’t work last time.”
Google Research just dropped ReasoningBank, a memory framework that fixes this. The paper got accepted at ICLR, and the code is on GitHub. The core idea is refreshingly simple: instead of saving every action an agent takes (as many existing approaches do), it distills high-level reasoning patterns from both wins and failures. Then it uses those patterns to guide future decisions.
The Problem with Most Agent Memory Systems
Most memory systems today fall into two camps. One camp saves exhaustive action logs — every click, every API call, every keystroke. That’s what Synapse does. The other camp, like Agent Workflow Memory, only documents successful workflows. Both miss the point.
Logging every action gives you a firehose of data but no strategic insight. You end up with a list of what happened, not why it happened or what you should do differently next time. And ignoring failures? That’s like only studying your wins in a sport and never watching film of your losses. You never learn what not to do.
ReasoningBank takes a different approach. It stores structured memory items that contain a title, a description, and the actual reasoning steps or decision rationales. Think of it as a playbook, not a transcript.
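To make the "playbook, not a transcript" idea concrete, here is a minimal sketch of what one memory item could look like in Python. The three fields mirror the title, description, and reasoning content described above; the class name, field names, and the `render_for_prompt` helper are my own assumptions, not the project's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """One entry in the bank. Names here are illustrative assumptions."""
    title: str        # short handle for the strategy
    description: str  # one-line summary of when it applies
    content: str      # the distilled reasoning steps or rationale


def render_for_prompt(items: list[MemoryItem]) -> str:
    """Format memory items as plain text to place in the agent's context."""
    blocks = []
    for i, item in enumerate(items, start=1):
        blocks.append(f"Memory {i}: {item.title}\n{item.description}\n{item.content}")
    return "\n\n".join(blocks)


# Hypothetical item, based on the kind of lesson the post describes.
example = MemoryItem(
    title="Verify page before loading more",
    description="Guards against infinite scroll traps on paginated listings.",
    content="Check the current page identifier first, then attempt to load more results.",
)
print(render_for_prompt([example]))
```

Notice what is missing: no click coordinates, no raw DOM, no step-by-step log. The item is small enough to drop into a prompt and general enough to apply to a different site.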
How It Actually Works
The workflow runs in a continuous loop. Before the agent acts, it pulls relevant memories from the bank into its context. After acting, it uses an LLM-as-a-judge to self-assess the trajectory and extract success insights or failure reflections, then distills those into new memory items and adds them back to the bank.
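The loop is easier to see in code. What follows is a rough sketch under heavy assumptions: the word-overlap `retrieve` is a toy stand-in for whatever retrieval the real system uses, and `fake_act`, `fake_judge`, and `fake_distill` are hypothetical stubs replacing the LLM calls so the example runs on its own. None of these names come from the ReasoningBank code.

```python
from typing import Callable

# Assumed layout: each memory is a dict with "title", "description", "content".
Memory = dict[str, str]


def retrieve(bank: list[Memory], task: str, k: int = 2) -> list[Memory]:
    """Toy retrieval: rank memories by word overlap with the task text."""
    task_words = set(task.lower().split())

    def overlap(m: Memory) -> int:
        text = f"{m['title']} {m['description']}".lower()
        return len(task_words & set(text.split()))

    ranked = sorted(bank, key=overlap, reverse=True)
    return [m for m in ranked[:k] if overlap(m) > 0]


def run_episode(
    bank: list[Memory],
    task: str,
    act: Callable[[str, list[Memory]], str],
    judge: Callable[[str, str], bool],
    distill: Callable[[str, str, bool], list[Memory]],
) -> bool:
    """One pass of the loop: retrieve, act, judge, distill, append."""
    hints = retrieve(bank, task)
    trajectory = act(task, hints)
    success = judge(task, trajectory)
    bank.extend(distill(task, trajectory, success))  # simple append
    return success


# Hypothetical stubs standing in for the agent and the LLM-as-a-judge.
def fake_act(task: str, hints: list[Memory]) -> str:
    return "checked page id, then loaded more" if hints else "clicked Load More repeatedly"


def fake_judge(task: str, trajectory: str) -> bool:
    return "checked page id" in trajectory


def fake_distill(task: str, trajectory: str, success: bool) -> list[Memory]:
    if success:
        return [{"title": "Page check before load more works",
                 "description": "Insight from a successful run.",
                 "content": trajectory}]
    return [{"title": "Avoid infinite scroll when trying to load more results",
             "description": "Lesson from a failed run on a paginated listing.",
             "content": "Verify the current page identifier before loading more."}]


bank: list[Memory] = []
task = "load more results on the product listing"
first = run_episode(bank, task, fake_act, fake_judge, fake_distill)
second = run_episode(bank, task, fake_act, fake_judge, fake_distill)
print(first, second, len(bank))  # prints: False True 2
```

The first episode fails and leaves behind a failure reflection; the second retrieves it and succeeds. That is the whole pitch in miniature, even if the real retrieval and judging are far more capable than these stubs.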
Here’s the kicker: the self-judgment doesn’t need to be perfect. The paper found ReasoningBank is surprisingly robust against noise in the evaluation. So you don’t need a perfect critic — just a decent one that catches the obvious stuff.
And crucially, it learns from failures. Instead of just memorizing a procedural rule like “click the ‘Load More’ button” from a successful run, it might take a broader lesson from a failed one: “always verify the current page identifier first to avoid infinite scroll traps before attempting to load more results.” That’s a strategic guardrail, not a procedural rule.
Does It Actually Work?
On web browsing and software engineering benchmarks, ReasoningBank outperformed baseline approaches on both effectiveness (higher success rates) and efficiency (fewer steps). That’s the sweet spot — better results with less wasted effort.
I’ve seen enough agent frameworks claim to solve the memory problem that I’m naturally skeptical. But the fact that they’re distilling reasoning patterns rather than storing raw trajectories is a genuine shift. Most work in this area feels like someone trying to build a library by just piling books on the floor. ReasoningBank at least tries to organize them into a usable system.
What I’d Like to See Next
The paper admits they’re using a simple append strategy for now — new memories just get tacked onto the bank. More sophisticated consolidation is left for future work. That’s fine for a research paper, but in practice, you’ll eventually need deduplication, pruning, and maybe even forgetting mechanisms. A memory bank that just grows forever isn’t sustainable.
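For a sense of what a first pass at consolidation might look like, here is a small sketch that deduplicates near-identical memory titles and caps the bank size by dropping the oldest entries. To be clear, this is my own guess at a starting point, not something from the paper: the `consolidate` function, the title-only comparison, and the 0.85 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def is_near_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Compare two titles by character-level similarity (an assumed heuristic)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def consolidate(titles: list[str], max_items: int = 100) -> list[str]:
    """Drop near-duplicate titles, then keep only the newest max_items."""
    kept: list[str] = []
    for title in titles:  # titles are assumed to be in insertion order
        if not any(is_near_duplicate(title, seen) for seen in kept):
            kept.append(title)
    # Forget the oldest entries once the bank exceeds its cap.
    return kept[max(0, len(kept) - max_items):]


titles = [
    "Verify page id before loading more",
    "verify page ID before loading more",
    "Check login state first",
]
print(consolidate(titles))
```

Comparing titles alone is crude. A real system would probably compare the reasoning content too, and might merge overlapping items instead of discarding one of them.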
Still, this is one of the more practical agent memory papers I’ve read in a while. It acknowledges that agents need to learn from failure, which is something most systems conveniently ignore. And it actually ships with code, so you can try it yourself instead of just reading about it.
The Bottom Line
ReasoningBank won’t solve all your agent problems. But if you’re building long-running agents that need to improve over time, this is worth a look. It’s a solid step toward agents that actually learn from experience instead of just repeating the same patterns forever.