# Memory for RAG

Use MemLib as the retrieval layer in retrieval-augmented generation.
## What This Solves
Standard RAG retrieves documents from a vector database. MemLib adds a memory layer on top: it doesn't just find similar text, it extracts facts, deduplicates them, resolves contradictions, and scores results by recency and importance.
This makes it ideal for:

- User-specific RAG: retrieve only what's relevant to this user
- Conversation-aware retrieval: `prepare()` analyzes the conversation to generate targeted queries
- Living knowledge: facts update when the user says something new, and old contradictions are resolved (see the sketch after this list)
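To get a feel for the living-knowledge behavior, here is a minimal sketch. It assumes a `mem` client configured as in Pattern 1 below; the exact wording of the resolved fact is up to MemLib's extraction pipeline, so the output shown is illustrative only:

```ts
// Earlier in the user's history:
await mem.store({ content: "I live in Paris", source: "conversation" });

// Later, the user says something that contradicts the stored fact:
await mem.store({ content: "I just moved to Berlin", source: "conversation" });

// recall() returns the resolved fact, not both versions.
const results = await mem.recall({ query: "where does the user live", limit: 1 });
console.log(results[0]?.content); // e.g. a Berlin fact; exact wording varies
```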
## Pattern 1: Direct Recall
The simplest RAG pattern: use `recall()` to find relevant memories and inject them into your prompt.
```ts
import { MemLib } from "memlib";

const mem = new MemLib({
  apiKey: process.env.MEMLIB_API_KEY!,
  namespace: "support-bot",
  entity: `user-${userId}`,
});

async function generateResponse(userQuery: string) {
  // 1. Retrieve relevant memories
  const memories = await mem.recall({
    query: userQuery,
    limit: 5,
  });

  // 2. Format as context
  const context = memories
    .map((m) => `- ${m.content} (relevance: ${(m.score * 100).toFixed(0)}%)`)
    .join("\n");

  // 3. Build prompt with retrieved context
  const systemPrompt = `You are a helpful support assistant.

Known facts about this user:
${context || "No prior context available."}

Answer the user's question using the above context when relevant.`;

  // 4. Pass to your LLM (any provider)
  return callYourLLM(systemPrompt, userQuery);
}
```

When to use: You want full control over how memories are formatted and injected.
## Pattern 2: Synthesized Context with `prepare()`
`prepare()` runs the RAG pipeline for you: it analyzes the conversation, generates multiple search queries, retrieves memories, and synthesizes a single context paragraph.
```ts
async function generateResponse(
  messages: Array<{ role: "user" | "assistant"; content: string }>,
) {
  // 1. Get synthesized context (2 LLM calls)
  const { context, memoriesUsed } = await mem.prepare({
    messages: messages.map((m) => ({
      role: m.role,
      content: m.content,
    })),
  });

  // 2. Inject directly into system prompt
  const systemPrompt = `You are a helpful assistant.
${context ? `About this user:\n${context}` : ""}`;

  // 3. Generate response
  return callYourLLM(systemPrompt, messages);
}
```

When to use: You want the most relevant context with minimal code, and you're okay with 2 extra LLM calls.
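If you want to surface which memories informed a response (for a UI footer or debug logs), `memoriesUsed` from the same call can drive attribution. A sketch, with the caveat that we're assuming its entries carry the same `id` and `content` fields seen on `recall()` results:

```ts
const { context, memoriesUsed } = await mem.prepare({ messages });

// Log which memories went into the synthesized context.
// Assumption: memoriesUsed entries are shaped like recall() results.
for (const m of memoriesUsed) {
  console.log(`used memory ${m.id}: ${m.content}`);
}
```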
## Pattern 3: Category-Filtered Retrieval
When you know the topic area, filter by category for higher precision:
```ts
// User asks about food → retrieve health/preference memories
const healthMemories = await mem.recall({
  query: userQuery,
  category: "health",
  limit: 3,
});

const preferenceMemories = await mem.recall({
  query: userQuery,
  category: "preference",
  limit: 3,
});

// Combine and deduplicate by memory id
const allMemories = [...healthMemories, ...preferenceMemories];
const unique = allMemories.filter(
  (m, i, arr) => arr.findIndex((x) => x.id === m.id) === i,
);
```

When to use: Your app has domain-specific logic about which types of memories to retrieve.
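Since the two recalls are independent, you can also run them in parallel and fold the dedup step into a helper. A sketch under that idea; `recallByCategories` is a name we made up, not a MemLib API:

```ts
// Hypothetical helper: fan one query out across several categories.
async function recallByCategories(
  query: string,
  categories: string[],
  limit = 3,
) {
  const perCategory = await Promise.all(
    categories.map((category) => mem.recall({ query, category, limit })),
  );
  // Keep only the first occurrence of each memory id.
  const seen = new Set<string>();
  return perCategory.flat().filter((m) => {
    if (seen.has(m.id)) return false;
    seen.add(m.id);
    return true;
  });
}

const foodMemories = await recallByCategories(userQuery, ["health", "preference"]);
```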
## Ingestion: Store as You Go
To build the knowledge base, store conversational content after each interaction:
```ts
// After every user message
await mem.store({
  content: userMessage,
  source: "conversation",
});
```

The smart store pipeline handles the rest:
- Extracts atomic facts from natural language
- Skips duplicates (similarity > 0.95)
- Resolves contradictions with existing memories
You don't need to preprocess or chunk the text — MemLib does it.
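In a chat handler, this usually means storing the turn right after you respond. A minimal sketch: `callYourLLM` is the same placeholder as in Pattern 1, and firing `store()` without awaiting it is our choice here to keep the reply path fast, not a MemLib requirement:

```ts
async function handleTurn(userMessage: string) {
  const memories = await mem.recall({ query: userMessage, limit: 5 });
  const context = memories.map((m) => `- ${m.content}`).join("\n");
  const reply = await callYourLLM(
    `You are a helpful assistant.\n\nKnown facts:\n${context}`,
    userMessage,
  );

  // Fire-and-forget: extraction and dedup can finish after we reply.
  void mem.store({ content: userMessage, source: "conversation" });

  return reply;
}
```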
## Ingestion: Batch Import
For bulk data (profiles, documents, CSV exports), use batch store:
```ts
// Import pre-structured facts
await mem.batchStore({
  memories: [
    { content: "Premium subscriber since 2023" },
    { content: "Preferred language: English" },
    { content: "Account region: EU" },
  ],
});
```

Batch store uses raw storage (no LLM calls). Use it when your data is already structured.
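For larger imports, you may want to chunk the payload rather than send one giant request. A sketch assuming a plain array of fact strings; the chunk size of 100 is an arbitrary choice, not a documented MemLib limit:

```ts
async function importFacts(facts: string[], chunkSize = 100) {
  for (let i = 0; i < facts.length; i += chunkSize) {
    await mem.batchStore({
      memories: facts
        .slice(i, i + chunkSize)
        .map((content) => ({ content })),
    });
  }
}

await importFacts([
  "Premium subscriber since 2023",
  "Preferred language: English",
  "Account region: EU",
]);
```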
## `recall()` vs `prepare()` for RAG
| | `recall()` | `prepare()` |
|---|---|---|
| LLM calls | 0 | 2 |
| Output | Array of memory objects | Single context paragraph |
| Query strategy | Single query | Multi-query (2–4 auto-generated) |
| Best for | Full control, UI display, low latency | System prompt injection, best relevance |
| Cost | Embedding only | Embedding + 2 LLM calls |
## Tips
- Store early, recall later. Memories get better over time: deduplication and conflict resolution ensure quality improves as more data flows in.
- Use `minImportance` for critical operations. Set `minImportance: 0.8` when retrieving safety-critical info (allergies, permissions).
- Check `similarity` on individual results. If the highest result has `similarity < 0.6`, the retrieval might not be relevant; handle this gracefully (see the sketch after this list).
- Don't over-retrieve. 5–10 memories is usually enough; more context can dilute the signal for your LLM.
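Putting the middle two tips together, here is a sketch of a guarded retrieval for safety-critical lookups. It assumes results come back sorted best-first and treats 0.6 as the relevance cutoff suggested above; both thresholds are tuning knobs, not fixed MemLib values:

```ts
async function recallCritical(query: string) {
  // Only high-importance memories for safety-critical lookups.
  const memories = await mem.recall({
    query,
    limit: 5,
    minImportance: 0.8,
  });

  // If even the best match is weak, return nothing rather than
  // injecting marginal context into the prompt.
  if (memories.length === 0 || memories[0].similarity < 0.6) {
    return [];
  }
  return memories;
}
```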