v0.10.0 - Feedback Learning
#1 on LongMemEval R@5 = 0.964
Veritas Trust Layer

Your AI Forgets
Everything. Fix It.

One memory layer. Every AI. Never start from zero. Built-in memory is a notepad. ALMA is a learning system that makes your agent smarter every run.

$0.00 to start
No API keys needed
7 Storage Backends
2,121 Tests Passing
agent_memory.py
# Before: AI makes same mistakes every session
agent.run("Deploy auth service")  # Rolling update
agent.run("Deploy auth service")  # Rolling update AGAIN

# After: ALMA remembers what works
from alma import ALMA
alma = ALMA.from_config(".alma/config.yaml")

# Get memories before task
memories = alma.retrieve(
    task="Deploy auth service",
    agent="backend-dev"
)

# Agent now knows: "Blue-green works (8/10)"
# Agent now knows: "Avoid rolling updates"
result = agent.run_with_context(memories)

"But My AI Already Has Memory..."

Yes. Claude Code, ChatGPT, OpenClaw, and Gemini all have built-in memory now. So why would you need ALMA?
Because their memory is a notepad. ALMA is a learning system.

| | Built-in Memory (Claude, ChatGPT, OpenClaw) | ALMA |
|---|---|---|
| What it stores | Facts and preferences -- "user likes dark mode" | Outcomes -- what strategies worked, failed, and why |
| Does it learn? | No. It remembers what you told it. | Yes. After 3+ similar outcomes, auto-creates reusable strategies. |
| Does it warn you? | No. | Yes. Anti-patterns track what NOT to do, with why + alternatives. |
| Cross-platform? | No. Claude doesn't know what ChatGPT learned. | Yes. One memory layer shared across every AI tool. |
| Multi-agent? | No. Each session is isolated. | Yes. Junior agents inherit from senior agents. |
| Scoring | Basic relevance or "most recent" | 4-factor: similarity + recency + success rate + confidence |
| Lifecycle | Grows until you delete things | Automatic: decay, compression, consolidation, archival |
| Your data | Stored on their servers | Your database. SQLite, PostgreSQL, Qdrant -- you choose. |
| Benchmark | Not benchmarked | R@5 = 0.964 on LongMemEval (500 questions) |
| Trust / Verification | No. Everything returned as-is. | Veritas (built-in). Trust scoring, verified retrieval, conflict detection. |

ALMA works WITH built-in memory, not against it

ALMA doesn't replace Claude Code's memory or ChatGPT's memory -- it sits underneath as a deeper layer. Use built-in memory for quick preferences. Use ALMA for:

Strategy tracking -- which approaches worked for which problems
Failure prevention -- anti-patterns that stop mistakes from repeating
Team knowledge -- sharing lessons across agents and platforms
Measurable retrieval -- benchmarked at R@5=0.964, not "trust me"

Proven: #1 on LongMemEval + Feedback Learning

Benchmarked against LongMemEval (ICLR 2025) -- the standard benchmark for AI agent memory. 500 questions, ~53 conversation sessions each.

ALMA Benchmark Results - #1 on LongMemEval
| System | LongMemEval | API Keys | Memory Types | Feedback Loop | Trust / Verification |
|---|---|---|---|---|---|
| ALMA v1.0 | R@5 = 0.964 | None | 5 | Yes (v1.0) | Veritas (built-in) |
| Mem0 | ~49% acc.* | GPT-4o | 2 | No | No |
| Zep | 71.2% acc.* | GPT-4o | 1 | No | No |
| Letta | Not published | GPT-4o | 2 | No | No |
| Beads | Not published | None | N/A (tasks) | No | No |
| RuVector | Not published | None | N/A (vectors) | Self-learning | No |

* Mem0 and Zep report accuracy (LLM-judged correctness), not recall. ALMA reports Recall@5 (retrieval-only, no LLM judge). Metrics are not directly comparable -- recall is a stricter, more reproducible measure.
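To make the metric concrete: Recall@5 asks, for each question, whether the gold memory appears among the top 5 retrieved items. A minimal sketch of the computation (toy data, not the actual benchmark harness):

```python
def recall_at_k(retrieved: list[list[str]], gold: list[str], k: int = 5) -> float:
    """Fraction of questions whose gold memory id appears in the top-k retrieved ids."""
    hits = sum(1 for ranked, g in zip(retrieved, gold) if g in ranked[:k])
    return hits / len(gold)

# Toy example: 3 questions, gold memory found in the top 5 for 2 of them
retrieved = [
    ["m1", "m7", "m3", "m9", "m2"],  # gold m3 -> hit
    ["m4", "m5", "m6", "m8", "m0"],  # gold m2 -> miss
    ["m2", "m1", "m5", "m6", "m7"],  # gold m2 -> hit
]
gold = ["m3", "m2", "m2"]
print(recall_at_k(retrieved, gold))  # 2/3
```

No LLM judge is involved, which is why recall is the more reproducible number.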

LongMemEval R@5 by system: ALMA v1.0 0.964 · Hindsight 0.914 · Zep 0.638 · Mem0 0.490

ALMA R@5 by question type: Knowledge Update 1.000 · Multi-Session 0.992 · Preferences 0.967 · Temporal Reasoning 0.947 · Assistant Memory 0.946 · User Memory 0.914
Reproduce it yourself in 3 commands (~30 minutes, any CPU, no GPU):
pip install alma-memory[local] sentence-transformers
curl -fsSL -o /tmp/longmemeval.json \
  https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
python -m benchmarks.longmemeval.runner --data /tmp/longmemeval.json
Full methodology: BENCHMARK-REPORT.md
New in v1.0

Retrieval That Gets Better Over Time

The Feedback Learning Benchmark (FLB) proves that ALMA's retrieval improves with usage. As agents provide feedback on which memories helped, future retrieval becomes more precise -- without any model retraining.

record_usage()

Track which retrieved memories the agent actually referenced in its output

record_feedback()

Explicit positive/negative signal on memory quality -- "this helped" or "this was wrong"

Auto-adjust

Retrieval scores shift automatically based on accumulated feedback. No manual tuning needed.

Standing on the shoulders of open source

ALMA v1.0 absorbs concepts from two MIT-licensed projects:

Beads Task dependency tracking and structured workflow memory -- absorbed into ALMA's workflow checkpoint system
RuVector Self-learning vector retrieval with feedback signals -- the inspiration behind ALMA's Retrieval Feedback Loop

Both Beads and RuVector are MIT-licensed. ALMA integrates their concepts natively rather than wrapping them as dependencies.

How ALMA Works

Three phases. No model modifications. Your agent gets smarter every run.

ALMA Retrieval Pipeline
Retrieve

Ask ALMA for context

FAISS vector search finds relevant memories. Multi-factor scoring ranks by similarity + recency + success rate + confidence.

alma.retrieve(task=..., agent=...)
# Returns ranked memories
Learn

Record outcomes

After the task, record what happened -- success or failure, strategy used, how long it took. Every run becomes training data.

alma.learn(outcome="success",
  strategy_used="blue-green")
Improve

Auto-create strategies

After 3+ similar outcomes, ALMA auto-creates reusable heuristics. After 2+ failures, it creates anti-patterns. Zero manual work.

Heuristic created
Anti-pattern tracked
ALMA Learning Loop
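The promotion thresholds above (3+ similar successes become a heuristic, 2+ failures become an anti-pattern) can be sketched as a toy consolidation rule. This is illustrative only; ALMA's real consolidation also considers similarity, recency, and confidence:

```python
from collections import Counter

def consolidate(outcomes: list[dict]) -> dict:
    """Toy promotion rule: 3+ successes with the same strategy become a
    heuristic; 2+ failures with the same strategy become an anti-pattern."""
    successes = Counter(o["strategy"] for o in outcomes if o["ok"])
    failures = Counter(o["strategy"] for o in outcomes if not o["ok"])
    return {
        "heuristics": [s for s, n in successes.items() if n >= 3],
        "anti_patterns": [s for s, n in failures.items() if n >= 2],
    }

runs = [
    {"strategy": "blue-green", "ok": True},
    {"strategy": "blue-green", "ok": True},
    {"strategy": "blue-green", "ok": True},
    {"strategy": "rolling", "ok": False},
    {"strategy": "rolling", "ok": False},
]
print(consolidate(runs))
# {'heuristics': ['blue-green'], 'anti_patterns': ['rolling']}
```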

What Makes ALMA Different

Not another vector database. A learning system built for AI agents.

Five memory types, not just embeddings

Other systems dump everything into one vector index. ALMA classifies memories into 5 types, each with different retrieval behavior and lifecycle.

Heuristic Strategies that work -- "For forms with >5 fields, validate incrementally"
Outcome Task results -- "Login test passed using JWT -- 340ms"
AntiPattern What NOT to do -- "Don't use sleep() for async waits -- causes flaky tests"
DomainKnowledge Facts -- "Auth uses OAuth 2.0, tokens expire in 24h"
UserPreference Your constraints -- "Prefer verbose output, Python 3.12, dark theme"
Five Memory Types
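As a mental model, the five types can be sketched as an enum with per-type lifecycle policy. Names and decay windows here are illustrative assumptions, not ALMA's actual class names or defaults:

```python
from enum import Enum

class MemoryType(Enum):
    HEURISTIC = "strategy that works"
    OUTCOME = "task result"
    ANTI_PATTERN = "what not to do"
    DOMAIN_KNOWLEDGE = "fact about the system"
    USER_PREFERENCE = "user constraint"

# Each type can carry its own lifecycle, e.g. how long before it decays
# (example numbers only)
DECAY_DAYS = {
    MemoryType.OUTCOME: 30,        # raw task results go stale quickly
    MemoryType.DOMAIN_KNOWLEDGE: 90,
    MemoryType.HEURISTIC: 180,     # proven strategies persist
    MemoryType.ANTI_PATTERN: 365,  # warnings should stick around
    MemoryType.USER_PREFERENCE: 365,
}
print(len(MemoryType))  # 5
```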
Multi-Agent Memory Sharing

Multi-agent knowledge sharing

Junior agents inherit from senior agents. Teams share knowledge across roles. One agent's lesson becomes the whole team's advantage.

agents:
  senior_dev:
    share_with: [junior_dev, qa_agent]
  junior_dev:
    inherit_from: [senior_dev]
  qa_agent:
    inherit_from: [senior_dev]
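Under a config like the one above, retrieval for junior_dev also surfaces memories learned by senior_dev. A toy resolver for which agents' memories are visible (hypothetical helper, not ALMA's API):

```python
def visible_agents(config: dict, agent: str) -> set:
    """An agent sees its own memories plus those of agents it inherits
    from, followed transitively."""
    seen, stack = set(), [agent]
    while stack:
        a = stack.pop()
        if a in seen:
            continue
        seen.add(a)
        stack.extend(config.get(a, {}).get("inherit_from", []))
    return seen

agents = {
    "senior_dev": {"share_with": ["junior_dev", "qa_agent"]},
    "junior_dev": {"inherit_from": ["senior_dev"]},
    "qa_agent": {"inherit_from": ["senior_dev"]},
}
print(visible_agents(agents, "junior_dev"))
# {'junior_dev', 'senior_dev'} (set order may vary)
```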

Token-efficient context loading

The 4-layer MemoryStack loads only what you need. Identity (~100 tokens) + Essential Story (~800 tokens) at wake-up. On-demand and deep search activate when needed. 95% of your context window stays free.

~100 identity tokens · ~800 story tokens · on-demand task context · 95% of the window free
4-Layer MemoryStack
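The wake-up budget can be sketched as layered loading under a token cap. The token counts come from the text above; the loader itself is a hypothetical illustration:

```python
def assemble_context(layers: list, budget: int) -> list:
    """Load layers in priority order until the token budget is exhausted."""
    loaded, used = [], 0
    for name, tokens in layers:
        if used + tokens > budget:
            break
        loaded.append(name)
        used += tokens
    return loaded

# Wake-up: identity + essential story fit easily; heavier layers stay on demand
layers = [("identity", 100), ("essential_story", 800),
          ("task_context", 4000), ("deep_search", 20000)]
print(assemble_context(layers, budget=1000))  # ['identity', 'essential_story']
```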
Retrieval Feedback Loop

Self-improving retrieval

v1.0

Most memory systems retrieve and forget. ALMA v1.0 tracks which memories your agent actually uses vs ignores, then adjusts future retrieval scores automatically. Retrieval gets better every run -- no retraining, no manual tuning.

Usage tracking -- which memories agents actually reference in their output
Feedback scoring -- positive/negative signals adjust retrieval weight over time
Zero config -- works automatically with any storage backend
# Track what the agent actually used
alma.record_usage(
    memory_ids=["mem_1", "mem_3"],
    task="deploy auth service"
)

# Explicit feedback: this memory helped
alma.record_feedback(
    memory_id="mem_1",
    signal="positive",
    reason="correct deployment strategy"
)

# Next retrieval auto-boosts mem_1,
# auto-demotes unused memories
Feedback Scoring Pipeline
The feedback scoring pipeline: memories that get used rise in rank. Memories that get ignored decay. Your retrieval improves automatically.
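One simple way to realize that rank adjustment is a bounded boost/demote rule. The weighting scheme below is an assumption for illustration, not ALMA's published formula:

```python
def adjusted_score(base: float, positives: int, negatives: int,
                   uses: int, retrievals: int, alpha: float = 0.1) -> float:
    """Boost memories with positive feedback and high usage rates;
    demote ones that are retrieved but ignored. Clamped to [0, 1]."""
    feedback = alpha * (positives - negatives)
    usage = alpha * (uses / retrievals - 0.5) if retrievals else 0.0
    return max(0.0, min(1.0, base + feedback + usage))

# mem_1: helped twice, referenced every time it was retrieved
print(adjusted_score(0.70, positives=2, negatives=0, uses=4, retrievals=4))
# ~0.95 (boosted)

# mem_2: retrieved often, never referenced
print(adjusted_score(0.70, positives=0, negatives=0, uses=0, retrievals=4))
# ~0.65 (demoted)
```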

Your data, your infrastructure

ALMA is a library, not a service. 7 storage backends from SQLite ($0) to Azure Cosmos (enterprise). Your database, your rules.

SQLite + FAISS · PostgreSQL · Qdrant · Pinecone · Chroma · Azure · File

22 MCP Tools

Native MCP server for Claude Code. Retrieve, learn, manage memories -- all through tool calls. One JSON config and you're connected.

python -m alma.mcp
# 22 tools ready to use
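The "one JSON config" step typically looks like the standard MCP server entry below. The server name and file location are assumptions; check your MCP client's documentation for where this config lives:

```json
{
  "mcpServers": {
    "alma": {
      "command": "python",
      "args": ["-m", "alma.mcp"]
    }
  }
}
```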

Workflow Checkpoints

Save state mid-workflow, resume after failures. Perfect for complex multi-step tasks that span sessions.

alma.checkpoint(workflow_id, state)
alma.resume(workflow_id)
5 Memory Types · 22 MCP Tools · 4 Graph Backends · 6 Domain Schemas

Bootstrap From Existing Knowledge

Already have conversations, project files, or chat exports? ALMA doesn't just dump them into a vector database like RAG. It reads, classifies, and structures them into the 5 memory types.

This is not RAG

RAG retrieves text chunks by similarity. ALMA retrieves classified, scored, typed memories that improve over time.

Decisions you made --> DomainKnowledge (retrievable facts)
Preferences you stated --> UserPreference (constraints agents respect)
Things that worked --> Outcomes (success records with strategies)
Problems you hit --> AntiPatterns (mistakes agents won't repeat)
from alma.ingestion import ingest_directory
from alma.ingestion import ingest_conversations

# Ingest project files
result = ingest_directory(
    "/path/to/project",
    agent="dev",
    project_id="myapp"
)
# result.domain_knowledge: 47 facts
# result.user_preferences: 12 preferences
# result.anti_patterns: 3 problems
# result.outcomes: 8 milestones

# Ingest chat exports (6 formats)
result = ingest_conversations(
    "/path/to/chats",
    agent="dev",
    project_id="myapp"
)
Claude Code JSONL ChatGPT JSON Claude.ai JSON Codex JSONL Slack JSON Plain Text
Built into ALMA

Veritas Trust Layer -- Trust Your Agent's Memories

Memory without trust is dangerous. Your agent retrieves a "fact" -- but is it still accurate? Has it been contradicted? Who stored it, and do you trust them? Veritas answers all three questions before your agent acts.

Trust Scoring

Per-agent trust profiles scored 0.0 to 1.0. Five behavioral dimensions track how reliable each agent actually is -- not just what it claims.

verification-before-claim
loud-failure
honest-uncertainty
paper-trail
diligent-execution
30-day half-life -- trust decays without activity
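A 30-day half-life means an inactive agent's trust halves every 30 days. A quick sketch of the math (the exponential form is inferred from the half-life description, not quoted from ALMA's source):

```python
def decayed_trust(trust: float, days_inactive: float,
                  half_life: float = 30.0) -> float:
    """Exponential decay: trust halves every `half_life` days without activity."""
    return trust * 0.5 ** (days_inactive / half_life)

print(decayed_trust(0.8, 0))   # 0.8 -- active today
print(decayed_trust(0.8, 30))  # ~0.4 -- one half-life
print(decayed_trust(0.8, 60))  # ~0.2 -- two half-lives
```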

Verified Retrieval

Two-stage retrieval: fuzzy recall finds candidates, then verification confirms them. Every memory gets a status before your agent sees it.

VERIFIED Safe to use
UNCERTAIN Use with caution
CONTRADICTED Conflict detected
UNVERIFIABLE No method available

Conflict Detection

Cross-verification catches contradicting memories before agents act on bad data. When Memory A says "use rolling updates" but Memory B says "rolling updates caused downtime" -- Veritas flags the conflict.

Contradiction found
mem_42 says "use rolling updates"
mem_87 says "rolling updates failed"
Agent sees both sides
Decides based on evidence, not stale data
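The cross-check idea can be sketched at a toy, keyword level. Veritas's real detection is semantic and far more robust; this only shows the shape of the check:

```python
def conflicts(a: str, b: str) -> bool:
    """Naive contradiction check: the two memories mention a shared topic
    and the second one carries a failure signal."""
    shared = set(a.lower().split()) & set(b.lower().split())
    negates = any(w in b.lower() for w in ("failed", "avoid", "downtime"))
    return bool(shared) and negates

mem_42 = "use rolling updates"
mem_87 = "rolling updates failed in staging"
print(conflicts(mem_42, mem_87))  # True -> flag for the agent to review
```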

Verified retrieval in 5 lines

The VerifiedRetriever wraps ALMA's retrieval engine. Same query, but now every memory comes with a verification status and confidence score.

Stage 1 -- Fuzzy Recall: Semantic search finds candidates with expanded set
Stage 2 -- Verification: Ground truth, cross-verify, or confidence fallback
Result: Categorized by status -- verified, uncertain, contradicted
verified_retrieval.py
from alma.retrieval.verification import (
    VerifiedRetriever,
    VerificationConfig,
)

# Wrap your existing retrieval engine
retriever = VerifiedRetriever(
    retrieval_engine=alma.retrieval_engine,
    config=VerificationConfig(enabled=True)
)

# Two-stage retrieval: fuzzy recall + verification
results = retriever.retrieve_verified(
    query="How to deploy auth service?",
    agent="backend-dev",
    project_id="my-project",
    top_k=5
)

# Use only verified memories
for mem in results.verified:
    print(f"[VERIFIED] {mem.memory}")

# Flag contradictions for review
for mem in results.contradicted:
    print(f"[CONFLICT] {mem.verification.reason}")
    print(f"  Source: {mem.verification.contradicting_source}")

Why trust matters for AI memory

Every memory system assumes retrieved data is correct. But memories go stale. Agents store wrong conclusions. Different agents contradict each other. Without trust scoring and verification, your agent builds on a foundation it cannot validate.

No extra config -- Veritas is built into ALMA's retrieval engine. Works with all 7 storage backends.
LLM optional -- Confidence-based verification works without any LLM. Add one for ground truth and cross-verification.
Multi-agent aware -- Trust profiles track each agent independently. A reckless agent's memories rank lower.

Get Started in 60 Seconds

Install ALMA and start giving your agents memory.

Install
pip install alma-memory[local] # Includes SQLite + FAISS + local embeddings
from alma import ALMA

alma = ALMA.from_config(".alma/config.yaml")

# Retrieve what the agent learned
memories = alma.retrieve(
    task="Fix the login bug",
    agent="developer",
    top_k=5
)

# Inject into your prompt
prompt = f"""## Context from past runs
{memories.to_prompt()}

## Task
Fix the login bug"""

# After the task, learn from the outcome
alma.learn(
    agent="developer",
    task="Fix login bug",
    outcome="success",
    strategy_used="Cleared session cache"
)

# That's it. Every run gets smarter.
Other installation options (PostgreSQL, Qdrant, Pinecone, Chroma, Azure, all)
pip install alma-memory[postgres]  # PostgreSQL + pgvector
pip install alma-memory[qdrant]    # Qdrant
pip install alma-memory[pinecone]  # Pinecone
pip install alma-memory[chroma]    # ChromaDB
pip install alma-memory[azure]     # Azure Cosmos DB
pip install alma-memory[all]       # Everything

At a Glance

0.964 LongMemEval R@5 (#1 open-source)
2,121 Tests Passing
7 Storage Backends
$0 Cost (local)
0 API Keys Needed
5 Memory Types
22 MCP Tools
6 Chat Formats
4 Graph Backends
5 Trust Dimensions
4 Verification Statuses
<5 min Time to First Memory

Database Setup

Three paths. Pick what fits. ALMA auto-creates tables on first run.

SQLite + FAISS

Zero config

No database to install. Files stored locally. Perfect for development and personal use.

storage: sqlite
storage_dir: .alma
# That's it. Done.
Cost: $0.00 forever

PostgreSQL + pgvector

Production

Full SQL with vector search. Complete schema in the README -- copy-paste and go.

storage: postgresql
connection_string:
  ${DATABASE_URL}
Cost: varies by provider

Supabase Free Tier

Cloud hosted

Free PostgreSQL + pgvector. Create account, run SQL from README, configure YAML. Done.

# 1. supabase.com/dashboard
# 2. Run SQL from README
# 3. Copy connection string
Cost: $0.00 (free tier)
Coming Soon

Veritas Cloud

ALMA's trust scoring and verified retrieval work great on a single instance. But when you run dozens of agents across multiple deployments, you need a shared source of trust.

ALMA + Veritas (Free)

Open source, MIT license, yours forever
  • Trust scoring — per-agent trust profiles, 5 behavioral dimensions, trust decay
  • Verified retrieval — two-stage verification, 4 statuses, conflict detection
  • Anti-pattern memory — agents remember what failed and why
  • Retrieval feedback loop — memories that agents actually use get scored higher
  • 7 storage backends, 4 graph backends, 22 MCP tools
pip install alma-memory
COMING SOON

Veritas Cloud (Pro)

Managed trust service for multi-agent teams
  • Real-time conflict prevention — stop Agent B before it acts on data Agent A already invalidated
  • Shared trust graph — one source of truth across all your agent deployments
  • Trust dashboard — conflicts/week, trust trends, resolution rates at a glance
  • Provenance chain API — full audit trail for every agent decision, compliance-ready
  • Monthly value report — "Veritas prevented X conflicts, saved an estimated $Y"
Join the early access list below

Does this sound like your team?

If you answer yes to any of these, Veritas Cloud is being built for you.

1

"Our agents sometimes contradict each other and we only find out when a customer complains."

Veritas Cloud catches conflicts in real-time, before agents act on contradicting data.

2

"We can't prove to our clients that our AI agents are making trustworthy decisions."

Provenance chain API gives you a full audit trail. Show clients exactly how every decision was made.

3

"We run 50+ agent workflows and have no idea how many conflicts happen per week."

Trust dashboard shows trust violations, resolution rates, and memory accuracy across your entire fleet.

4

"Each agent deployment is isolated. Trust built in one workflow doesn't carry over to another."

Shared trust graph — trust scores, provenance data, and conflict history unified across all deployments.

5

"Our enterprise clients are starting to ask about AI compliance and audit trails."

SOC2-ready audit exports, per-tenant trust isolation, and SLA guarantees. Built for enterprise compliance.

Get Early Access

We're building Veritas Cloud with design partners. If multi-agent trust is a pain point for your team, we want to hear from you.

Request Early Access

No commitment. Tell us about your agent setup and we'll figure out if Veritas Cloud can help.

Stop Starting From Zero

Every conversation makes the next one better. One pip install. Five minutes. $0.00 to start. No API keys.

Support continued development: Buy me a coffee