Why I Built ALMA
The Problem
I was building AI agents for automated testing -- a QA tester agent for frontend validation, a backend developer agent for API verification. They worked great... until they didn't.
The same mistakes kept happening:
- The QA tester would use sleep(5000) for waits, causing flaky tests
- The backend developer would forget that the API uses JWT with 24-hour expiry
- Both agents would repeat failed strategies session after session
Every conversation started fresh. No memory. No learning. Just an expensive LLM making the same mistakes I'd already corrected.
The Search for Solutions
I tried Mem0. It stores memories, but:
- No way to scope what an agent can learn
- No anti-pattern tracking ("don't do this")
- No multi-agent knowledge sharing
- The QA tester could "learn" database queries it would never use
I looked at LangChain Memory. It's for conversation context, not long-term learning. Different problem.
Nothing fit.
I needed:
- Scoped learning - The QA tester learns testing, not backend logic
- Anti-patterns - Remember what NOT to do
- Multi-agent sharing - Senior agents share knowledge with juniors
- Workflow context - Resume complex tasks after failures
- MCP integration - Work natively with Claude Code
Building ALMA
So I built it. Agent Learning Memory Architecture.
The core insight: AI agents don't need to modify their weights to "learn." They need smart prompts built from relevant past experiences.
+---------------------------------------------------------------------+
| BEFORE TASK: Retrieve relevant memories |
| +-- "Last time you tested forms, incremental validation worked" |
| +-- "User prefers verbose output" |
| +-- "Don't use sleep() - causes flaky tests" |
+---------------------------------------------------------------------+
| DURING TASK: Agent executes with injected knowledge |
+---------------------------------------------------------------------+
| AFTER TASK: Learn from outcome |
| +-- Success? -> New heuristic. Failure? -> Anti-pattern. |
+---------------------------------------------------------------------+
No fine-tuning. No model changes. Just smarter prompts.
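
To make the retrieve, inject, learn loop concrete, here is a minimal sketch. It is not ALMA's implementation: `Memory`, `MemoryStore`, `build_prompt`, and the word-overlap scoring are all hypothetical stand-ins chosen to keep the example self-contained. A real system would likely use embeddings or another retrieval method.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    kind: str  # "heuristic" or "anti_pattern"
    text: str


@dataclass
class MemoryStore:
    """Toy in-memory store (hypothetical, not ALMA's storage layer)."""
    items: list = field(default_factory=list)

    def retrieve(self, task, limit=3):
        # Naive relevance: count the words shared by the task and each memory.
        words = set(task.lower().split())
        scored = [(len(words & set(m.text.lower().split())), m) for m in self.items]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [m for _, m in relevant[:limit]]

    def learn(self, lesson, success):
        # Success becomes a heuristic, failure becomes an anti-pattern.
        kind = "heuristic" if success else "anti_pattern"
        self.items.append(Memory(kind, lesson))


def build_prompt(task, memories):
    # Inject retrieved memories into the prompt as plain-text guidance.
    lines = [f"Task: {task}"]
    for m in memories:
        prefix = "Do" if m.kind == "heuristic" else "Avoid"
        lines.append(f"- {prefix}: {m.text}")
    return "\n".join(lines)


store = MemoryStore()
store.learn("incremental validation when testing forms", success=True)
store.learn("sleep() in tests", success=False)
prompt = build_prompt("testing login forms", store.retrieve("testing login forms"))
```

The model itself never changes here. Only the text of `prompt` differs between sessions, which is the whole idea.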
What Makes ALMA Different
Scoped Learning
The QA tester agent can only learn what it needs:
agents:
  qa_tester:
    can_learn:
      - testing_strategies
      - selector_patterns
    cannot_learn:
      - backend_logic
      - database_queries
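
One way a scope like this could be enforced is a simple allow/deny check before a memory is stored. The sketch below is an assumption about how such a gate might work, not ALMA's internals. `Scope`, `SCOPES`, and `accept_memory` are hypothetical names, and treating unlisted domains as rejected is my own choice for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    can_learn: frozenset
    cannot_learn: frozenset = frozenset()

    def allows(self, domain):
        # The deny list wins, and anything not explicitly allowed is rejected.
        return domain in self.can_learn and domain not in self.cannot_learn


# Mirrors the qa_tester config above (hypothetical in-code representation).
SCOPES = {
    "qa_tester": Scope(
        can_learn=frozenset({"testing_strategies", "selector_patterns"}),
        cannot_learn=frozenset({"backend_logic", "database_queries"}),
    )
}


def accept_memory(agent, domain):
    # Unknown agents have no scope, so nothing is stored for them.
    scope = SCOPES.get(agent)
    return scope is not None and scope.allows(domain)
```

With this gate, `accept_memory("qa_tester", "selector_patterns")` passes while `accept_memory("qa_tester", "database_queries")` is refused.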
Anti-Pattern Tracking
When something fails, record WHY and WHAT TO DO INSTEAD:
alma.add_anti_pattern(
    agent="qa_tester",
    pattern="Using sleep() for async waits",
    why_bad="Causes flaky tests, wastes time",
    better_alternative="Use explicit waits with conditions"
)
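
A record like this only helps if it reaches the agent's next prompt. The following sketch shows one possible way to turn stored anti-patterns into guidance text. The field names mirror the call above, but the `AntiPattern` class, `as_guidance`, and `guidance_for` are hypothetical and not part of ALMA's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AntiPattern:
    agent: str
    pattern: str
    why_bad: str
    better_alternative: str

    def as_guidance(self):
        # Keep the reason and the alternative next to the warning itself.
        return (
            f"Avoid: {self.pattern}. "
            f"Why: {self.why_bad}. "
            f"Instead: {self.better_alternative}."
        )


def guidance_for(agent, records):
    # Only surface the anti-patterns recorded for this agent.
    return [r.as_guidance() for r in records if r.agent == agent]


records = [
    AntiPattern(
        agent="qa_tester",
        pattern="Using sleep() for async waits",
        why_bad="Causes flaky tests, wastes time",
        better_alternative="Use explicit waits with conditions",
    )
]
qa_guidance = guidance_for("qa_tester", records)
```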
Multi-Agent Sharing
Senior agents teach juniors:
agents:
  senior_architect:
    share_with: [junior_dev, qa_agent]
  junior_dev:
    inherit_from: [senior_architect]
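
Here is a rough sketch of how a config like this could be resolved at retrieval time. It assumes sharing needs consent on both sides (the source lists the agent in `share_with` and the agent lists the source in `inherit_from`). That rule is my assumption for the example and may not match ALMA's actual behavior. `visible_memories` and the sample memory strings are made up.

```python
# Hypothetical in-code form of the config above.
CONFIG = {
    "senior_architect": {"share_with": ["junior_dev", "qa_agent"]},
    "junior_dev": {"inherit_from": ["senior_architect"]},
    "qa_agent": {},
}

# Made-up sample memories, for illustration only.
MEMORIES = {
    "senior_architect": ["Prefer idempotent migrations"],
    "junior_dev": ["Run linters before committing"],
}


def visible_memories(agent, config, memories):
    """Own memories plus those from sources where both sides opted in."""
    result = list(memories.get(agent, []))
    for source in config.get(agent, {}).get("inherit_from", []):
        if agent in config.get(source, {}).get("share_with", []):
            result.extend(memories.get(source, []))
    return result


junior_view = visible_memories("junior_dev", CONFIG, MEMORIES)
qa_view = visible_memories("qa_agent", CONFIG, MEMORIES)
```

Under this rule `junior_dev` sees the architect's memories, while `qa_agent` sees none until it declares `inherit_from` itself.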
Workflow Context (v0.6.0)
Complex tasks can checkpoint and resume:
# Save state mid-workflow
alma.checkpoint(workflow_id="deploy-v2", state=current_state)
# Resume after failure
alma.resume(workflow_id="deploy-v2")
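
For intuition, checkpoint and resume can be as simple as writing the workflow state to disk under its ID. The sketch below uses JSON files and only the standard library. `CheckpointStore` is a hypothetical class for illustration and says nothing about how ALMA persists workflow state.

```python
import json
import tempfile
from pathlib import Path


class CheckpointStore:
    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, workflow_id):
        return self.directory / f"{workflow_id}.json"

    def checkpoint(self, workflow_id, state):
        # Write to a temp file, then rename over the old checkpoint, so a
        # crash mid-write does not corrupt the previous one.
        tmp = self._path(workflow_id).with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self._path(workflow_id))

    def resume(self, workflow_id):
        # Return the saved state, or None if nothing was checkpointed.
        path = self._path(workflow_id)
        return json.loads(path.read_text()) if path.exists() else None


# Example usage with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    store = CheckpointStore(tmp_dir)
    store.checkpoint("deploy-v2", {"step": 3, "done": ["build", "test"]})
    restored = store.resume("deploy-v2")
    missing = store.resume("unknown-workflow")
```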
The Result
The agents now:
- Remember what worked across sessions
- Avoid strategies that failed before
- Share knowledge with each other
- Pick up complex tasks where they left off
They're not smarter. They're better informed.
Open Source
ALMA is MIT licensed. Use it, modify it, contribute to it.
- GitHub: github.com/RBKunnela/ALMA-memory
- PyPI: pip install alma-memory
- npm: @rbkunnela/alma-memory
If your AI agents keep making the same mistakes, they don't have a memory problem. They have an ALMA problem.