One memory layer. Every AI. Never start from zero.

Built-in memory is a notepad. ALMA is a learning system that makes your agent smarter every run.
```python
# Before: AI makes the same mistakes every session
agent.run("Deploy auth service")  # Rolling update
agent.run("Deploy auth service")  # Rolling update AGAIN

# After: ALMA remembers what works
from alma import ALMA

alma = ALMA.from_config(".alma/config.yaml")

# Get memories before the task
memories = alma.retrieve(
    task="Deploy auth service",
    agent="backend-dev",
)

# Agent now knows: "Blue-green works (8/10)"
# Agent now knows: "Avoid rolling updates"
result = agent.run_with_context(memories)
```
Yes. Claude Code, ChatGPT, OpenClaw, and Gemini all have built-in memory now. So why would you need ALMA?
Because their memory is a notepad. ALMA is a learning system.
| | Built-in Memory (Claude, ChatGPT, OpenClaw) | ALMA |
|---|---|---|
| What it stores | Facts and preferences -- "user likes dark mode" | Outcomes -- what strategies worked, failed, and why |
| Does it learn? | No. It remembers what you told it. | Yes. After 3+ similar outcomes, auto-creates reusable strategies. |
| Does it warn you? | No. | Yes. Anti-patterns track what NOT to do, with why + alternatives. |
| Cross-platform? | No. Claude doesn't know what ChatGPT learned. | Yes. One memory layer shared across every AI tool. |
| Multi-agent? | No. Each session is isolated. | Yes. Junior agents inherit from senior agents. |
| Scoring | Basic relevance or "most recent" | 4-factor: similarity + recency + success rate + confidence |
| Lifecycle | Grows until you delete things | Automatic: decay, compression, consolidation, archival |
| Your data | Stored on their servers | Your database. SQLite, PostgreSQL, Qdrant -- you choose. |
| Benchmark | Not benchmarked | R@5 = 0.964 on LongMemEval (500 questions) |
| Trust / Verification | No. Everything returned as-is. | Veritas (built-in). Trust scoring, verified retrieval, conflict detection. |
ALMA doesn't replace Claude Code's memory or ChatGPT's memory -- it sits underneath as a deeper layer. Use built-in memory for quick preferences. Use ALMA for outcomes, strategies, anti-patterns, and knowledge that should carry across tools, agents, and sessions.
Benchmarked against LongMemEval (ICLR 2025) -- the standard benchmark for AI agent memory. 500 questions, ~53 conversation sessions each.
| System | LongMemEval | API Keys | Memory Types | Feedback Loop | Trust / Verification |
|---|---|---|---|---|---|
| ALMA v1.0 | R@5 = 0.964 | None | 5 | Yes (v1.0) | Veritas (built-in) |
| Mem0 | ~49% acc.* | GPT-4o | 2 | No | No |
| Zep | 71.2% acc.* | GPT-4o | 1 | No | No |
| Letta | Not published | GPT-4o | 2 | No | No |
| Beads | Not published | None | N/A (tasks) | No | No |
| RuVector | Not published | None | N/A (vectors) | Self-learning | No |
* Mem0 and Zep report accuracy (LLM-judged correctness), not recall. ALMA reports Recall@5 (retrieval-only, no LLM judge). Metrics are not directly comparable -- recall is a stricter, more reproducible measure.
```bash
pip install alma-memory[local] sentence-transformers

curl -fsSL -o /tmp/longmemeval.json \
  https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json

python -m benchmarks.longmemeval.runner --data /tmp/longmemeval.json
```
The Feedback Learning Benchmark (FLB) proves that ALMA's retrieval improves with usage. As agents provide feedback on which memories helped, future retrieval becomes more precise -- without any model retraining.
- Usage tracking: track which retrieved memories the agent actually referenced in its output.
- Explicit feedback: positive/negative signals on memory quality -- "this helped" or "this was wrong".
- Automatic adjustment: retrieval scores shift based on accumulated feedback. No manual tuning needed.
ALMA v1.0 absorbs concepts from two MIT-licensed projects: Beads and RuVector. Rather than wrapping them as dependencies, ALMA integrates their concepts natively.
Three phases. No model modifications. Your agent gets smarter every run.

1. Retrieve: FAISS vector search finds relevant memories. Multi-factor scoring ranks by similarity + recency + success rate + confidence.
2. Learn: after the task, record what happened -- success or failure, strategy used, how long it took. Every run becomes training data.
3. Extract: after 3+ similar outcomes, ALMA auto-creates reusable heuristics. After 2+ failures, it creates anti-patterns. Zero manual work.
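A minimal sketch of the loop, using only the `retrieve` and `learn` calls shown later on this page; the task and strategy values are illustrative:

```python
from alma import ALMA

alma = ALMA.from_config(".alma/config.yaml")

# Phase 1: retrieve relevant memories before the task
memories = alma.retrieve(task="Deploy auth service", agent="backend-dev")

# ... run the task with the memories in context ...

# Phase 2: record what happened (illustrative values)
alma.learn(
    agent="backend-dev",
    task="Deploy auth service",
    outcome="success",
    strategy_used="Blue-green deployment",
)

# Phase 3 runs inside ALMA: after 3+ similar outcomes the pattern
# becomes a reusable heuristic; after 2+ failures, an anti-pattern.
```

The page doesn't publish the ranking formula, so the weighted sum below is only an assumption about how the four factors could combine; the weights are invented:

```python
def rank_score(similarity: float, recency: float,
               success_rate: float, confidence: float) -> float:
    # Hypothetical weights -- ALMA's actual formula is not documented here.
    return (0.4 * similarity
            + 0.2 * recency
            + 0.2 * success_rate
            + 0.2 * confidence)
```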
Not another vector database. A learning system built for AI agents.
Other systems dump everything into one vector index. ALMA classifies memories into 5 types, each with different retrieval behavior and lifecycle.
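The five types aren't listed in one place on this page, but the rest of it names outcomes and heuristics (learning loop), anti-patterns, domain knowledge, and user preferences (ingestion results). A hypothetical enum just to visualize the classification; this is not ALMA's actual class:

```python
from enum import Enum

class MemoryType(Enum):
    # Names inferred from this page's learning loop and ingestion
    # results -- illustrative, not ALMA's actual API.
    OUTCOME = "outcome"                    # what happened on past runs
    HEURISTIC = "heuristic"                # promoted after 3+ similar outcomes
    ANTI_PATTERN = "anti_pattern"          # created after 2+ failures
    DOMAIN_KNOWLEDGE = "domain_knowledge"  # facts about the project
    USER_PREFERENCE = "user_preference"    # how the user wants things done
```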
Junior agents inherit from senior agents. Teams share knowledge across roles. One agent's lesson becomes the whole team's advantage.
```yaml
agents:
  senior_dev:
    share_with: [junior_dev, qa_agent]
  junior_dev:
    inherit_from: [senior_dev]
  qa_agent:
    inherit_from: [senior_dev]
```
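With this config, retrieval for `junior_dev` can surface what `senior_dev` learned; a sketch using the `retrieve` call documented below (the inherited-result comment is illustrative):

```python
# junior_dev inherits senior_dev's memories per the YAML above
memories = alma.retrieve(
    task="Deploy auth service",
    agent="junior_dev",
)
# May include outcomes and heuristics recorded by senior_dev
```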
The 4-layer MemoryStack loads only what you need. Identity (~100 tokens) + Essential Story (~800 tokens) at wake-up. On-demand and deep search activate when needed. 95% of your context window stays free.
Most memory systems retrieve and forget. ALMA v1.0 tracks which memories your agent actually uses vs ignores, then adjusts future retrieval scores automatically. Retrieval gets better every run -- no retraining, no manual tuning.
```python
# Track which retrieved memories the agent actually used
alma.record_usage(
    memory_ids=["mem_1", "mem_3"],
    task="deploy auth service",
)

# Explicit feedback: this memory helped
alma.record_feedback(
    memory_id="mem_1",
    signal="positive",
    reason="correct deployment strategy",
)

# Next retrieval auto-boosts mem_1
# and auto-demotes unused memories
```
ALMA is a library, not a service. 7 storage backends from SQLite ($0) to Azure Cosmos (enterprise). Your database, your rules.
Native MCP server for Claude Code. Retrieve, learn, manage memories -- all through tool calls. One JSON config and you're connected.
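The page doesn't show that JSON, so the snippet below is only a guess at its shape: the `mcpServers` structure follows the standard Claude Code MCP convention, but `alma.mcp` is a hypothetical module name, not ALMA's documented entry point.

```json
{
  "mcpServers": {
    "alma": {
      "command": "python",
      "args": ["-m", "alma.mcp"]
    }
  }
}
```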
Save state mid-workflow, resume after failures. Perfect for complex multi-step tasks that span sessions.
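The checkpoint API isn't shown on this page, so `save_checkpoint` and `resume_checkpoint` below are hypothetical names illustrating the save/resume flow, not the documented interface:

```python
# Hypothetical API -- names are illustrative, not ALMA's documented interface.
checkpoint_id = alma.save_checkpoint(
    agent="developer",
    task="Migrate database",
    state={"step": 3, "tables_done": ["users", "orders"]},
)

# In a later session (or after a crash), resume where the workflow stopped.
state = alma.resume_checkpoint(checkpoint_id)
```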
Already have conversations, project files, or chat exports? ALMA doesn't just dump them into a vector database like RAG. It reads, classifies, and structures them into the 5 memory types.
RAG retrieves text chunks by similarity. ALMA retrieves classified, scored, typed memories that improve over time.
```python
from alma.ingestion import ingest_directory, ingest_conversations

# Ingest project files
result = ingest_directory(
    "/path/to/project",
    agent="dev",
    project_id="myapp",
)
# result.domain_knowledge: 47 facts
# result.user_preferences: 12 preferences
# result.anti_patterns: 3 problems
# result.outcomes: 8 milestones

# Ingest chat exports (6 formats)
result = ingest_conversations(
    "/path/to/chats",
    agent="dev",
    project_id="myapp",
)
```
Memory without trust is dangerous. Your agent retrieves a "fact" -- but is it still accurate? Has it been contradicted? Who stored it, and do you trust them? Veritas answers all three questions before your agent acts.
Per-agent trust profiles scored 0.0 to 1.0. Five behavioral dimensions track how reliable each agent actually is -- not just what it claims.
Two-stage retrieval: fuzzy recall finds candidates, then verification confirms them. Every memory gets a status before your agent sees it.
Cross-verification catches contradicting memories before agents act on bad data. When Memory A says "use rolling updates" but Memory B says "rolling updates caused downtime" -- Veritas flags the conflict.
The VerifiedRetriever wraps ALMA's retrieval engine. Same query, but now every memory comes with a verification status and confidence score.
```python
from alma.retrieval.verification import (
    VerifiedRetriever,
    VerificationConfig,
)

# Wrap your existing retrieval engine
retriever = VerifiedRetriever(
    retrieval_engine=alma.retrieval_engine,
    config=VerificationConfig(enabled=True),
)

# Two-stage retrieval: fuzzy recall + verification
results = retriever.retrieve_verified(
    query="How to deploy auth service?",
    agent="backend-dev",
    project_id="my-project",
    top_k=5,
)

# Use only verified memories
for mem in results.verified:
    print(f"[VERIFIED] {mem.memory}")

# Flag contradictions for review
for mem in results.contradicted:
    print(f"[CONFLICT] {mem.verification.reason}")
    print(f"  Source: {mem.verification.contradicting_source}")
```
Every memory system assumes retrieved data is correct. But memories go stale. Agents store wrong conclusions. Different agents contradict each other. Without trust scoring and verification, your agent builds on a foundation it cannot validate.
Install ALMA and start giving your agents memory.
```bash
pip install alma-memory[local]
# Includes SQLite + FAISS + local embeddings
```
```python
from alma import ALMA

alma = ALMA.from_config(".alma/config.yaml")

# Retrieve what the agent learned
memories = alma.retrieve(
    task="Fix the login bug",
    agent="developer",
    top_k=5,
)

# Inject into your prompt
prompt = f"""## Context from past runs
{memories.to_prompt()}

## Task
Fix the login bug"""

# After the task, learn from the outcome
alma.learn(
    agent="developer",
    task="Fix login bug",
    outcome="success",
    strategy_used="Cleared session cache",
)

# That's it. Every run gets smarter.
```
```bash
pip install alma-memory[postgres]  # PostgreSQL + pgvector
pip install alma-memory[qdrant]    # Qdrant
pip install alma-memory[pinecone]  # Pinecone
pip install alma-memory[chroma]    # ChromaDB
pip install alma-memory[azure]     # Azure Cosmos DB
pip install alma-memory[all]       # Everything
```
Three paths. Pick what fits. ALMA auto-creates tables on first run.

1. Local SQLite: no database to install. Files stored locally. Perfect for development and personal use.
2. Self-hosted PostgreSQL: full SQL with vector search. Complete schema in the README -- copy-paste and go.
3. Hosted Postgres: free PostgreSQL + pgvector. Create an account, run the SQL from the README, configure YAML. Done.
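For the SQLite path, a minimal sketch of what `.alma/config.yaml` might contain; the key names are assumptions, since this page doesn't show the schema:

```yaml
# Illustrative only -- these key names are assumptions,
# not ALMA's documented config schema.
storage:
  backend: sqlite          # or postgres, qdrant, pinecone, chroma, azure
  path: .alma/alma.db
embeddings:
  provider: sentence-transformers   # local embeddings, no API keys
```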
ALMA's trust scoring and verified retrieval work great on a single instance. But when you run dozens of agents across multiple deployments, you need a shared source of trust.
If you answer yes to any of these, Veritas Cloud is being built for you.
"Our agents sometimes contradict each other and we only find out when a customer complains."
Veritas Cloud catches conflicts in real-time, before agents act on contradicting data.
"We can't prove to our clients that our AI agents are making trustworthy decisions."
Provenance chain API gives you a full audit trail. Show clients exactly how every decision was made.
"We run 50+ agent workflows and have no idea how many conflicts happen per week."
Trust dashboard shows trust violations, resolution rates, and memory accuracy across your entire fleet.
"Each agent deployment is isolated. Trust built in one workflow doesn't carry over to another."
Shared trust graph — trust scores, provenance data, and conflict history unified across all deployments.
"Our enterprise clients are starting to ask about AI compliance and audit trails."
SOC2-ready audit exports, per-tenant trust isolation, and SLA guarantees. Built for enterprise compliance.
We're building Veritas Cloud with design partners. If multi-agent trust is a pain point for your team, we want to hear from you.
Request Early Access

No commitment. Tell us about your agent setup and we'll figure out if Veritas Cloud can help.
Every conversation makes the next one better. One pip install. Five minutes. $0.00 to start. No API keys.