Advanced Features
Query Routing
When enabled, the assistant classifies each query before processing it and routes it to a specialized tool subset. This reduces noise for the model and improves response quality.
Categories:
| Route | Description | Tools Available |
|---|---|---|
| simple_qa | Greetings, explanations, general knowledge | None (direct LLM answer) |
| code | File ops, code editing, git, shell commands | fs_read, fs_write, file_edit, bash, git, search, etc. |
| research | Web search, URL fetching | web_search, web_crawler |
| knowledge | Document reading, indexing, RAG queries | pdf/csv/docx/json readers, RAG tools, fs_read |
| full | Multi-category or ambiguous tasks | All tools (fallback) |
How it works:
- A lightweight LLM call classifies the query into one of the categories above
- The agent node binds only the tools for that category
- If a query spans multiple categories, it routes to full (all tools)
- The classifier prompt is customizable via ROUTING_PROMPT in prompts.yaml (all prompts live there, separate from config.yaml)
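The binding step above can be sketched as a lookup from route label to tool subset. This is a minimal illustration, not the project's implementation: `ROUTE_TOOLS`, `all_tools`, and `tools_for_route` are hypothetical names, and the knowledge-route entries `pdf_reader` and `rag_search` are placeholder tool names assumed for the example. The remaining tool names are taken from the table.

```python
# Hypothetical route-to-tools table, modeled on the categories listed above.
ROUTE_TOOLS: dict[str, list[str]] = {
    "simple_qa": [],  # direct LLM answer, no tools bound
    "code": ["fs_read", "fs_write", "file_edit", "bash", "git", "search"],
    "research": ["web_search", "web_crawler"],
    "knowledge": ["pdf_reader", "rag_search", "fs_read"],  # assumed names
}


def all_tools() -> list[str]:
    """Union of every route's tools in first-seen order (the 'full' route)."""
    seen: dict[str, None] = {}
    for tools in ROUTE_TOOLS.values():
        for name in tools:
            seen.setdefault(name, None)
    return list(seen)


def tools_for_route(route: str) -> list[str]:
    """Return the tool subset for a classified route.

    Any label that is not a known category (including 'full') falls back
    to every tool.
    """
    if route in ROUTE_TOOLS:
        return list(ROUTE_TOOLS[route])
    return all_tools()
```

Falling back to the full tool set for unrecognized labels mirrors the table's description of full as the fallback route.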
Configuration:
# config.yaml — toggle
ENABLE_ROUTING: true
# prompts.yaml — the classifier prompt
ROUTING_PROMPT: |
  # Custom classifier prompt (optional, has a sensible default)
  ...
Orchestrator-Workers
When enabled alongside routing, tasks classified as full (spanning multiple categories) are automatically decomposed into focused subtasks executed by specialized workers.
How it works:
- Orchestrator: An LLM call decomposes the complex query into ordered subtasks, each assigned a category (code, research, knowledge, etc.)
- Workers: Each subtask is executed by a worker agent with only the tools for its category. Workers run sequentially — each receives context from previously completed subtasks.
- Aggregator: If there were multiple subtasks, a final LLM call synthesizes all worker results into a single coherent response.
Example flow for "Read this PDF and write a summary to a file":
Orchestrator decomposes into:
[Step 1/2: Read and summarize the PDF document] → knowledge worker
[Step 2/2: Write the summary to summary.md] → code worker
[Synthesizing results...] → aggregator
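The sequential worker loop and the conditional aggregation step can be sketched as follows. This is an illustration under assumptions: `Subtask`, `Worker`, and `run_subtasks` are hypothetical names, and the workers and aggregator are plain callables standing in for LLM-backed agents.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subtask:
    description: str
    category: str  # e.g. "knowledge", "code", "research"


# A worker takes (subtask description, context from earlier steps) and returns text.
Worker = Callable[[str, str], str]


def run_subtasks(
    subtasks: list[Subtask],
    workers: dict[str, Worker],
    aggregate: Callable[[list[str]], str],
) -> str:
    """Run subtasks in order, feeding each worker the results so far."""
    results: list[str] = []
    for task in subtasks:
        context = "\n".join(results)  # output of previously completed subtasks
        worker = workers[task.category]
        results.append(worker(task.description, context))
    # A single subtask needs no synthesis step.
    if len(results) == 1:
        return results[0]
    return aggregate(results)
```

For the PDF example, the knowledge worker would run first and its summary would arrive as context for the code worker that writes the file.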
Configuration:
# config.yaml — toggles
ENABLE_ROUTING: true # Required
ENABLE_ORCHESTRATION: true # Activates orchestrator for 'full' route
# prompts.yaml — customize the prompts (optional; sensible defaults bundled)
# ORCHESTRATOR_PROMPT: | # decomposition prompt
# AGGREGATOR_PROMPT: | # synthesis prompt
When orchestration is disabled, full routes use all tools in a single agent loop (the previous behavior). No regression.
Web Search Configuration
This tool uses the Brave Search API. Obtain an API key from the Brave Search Developer Portal.
Web Crawler Configuration
Enable web page content extraction with automatic RAG integration:
When enabled, the web_crawler tool:
- Extracts content from web pages as markdown
- Automatically ingests large pages (>8K tokens) into RAG (if enabled)
- Uses the same chunking configuration as PDF/DOCX readers
Browser dependency. Crawling uses headless Chromium via Playwright, whose browser binary is a separate ~260MB download that pip/uv tool install does not pull in. The tool installs it automatically on the first crawl after a fresh install or upgrade. If that auto-install fails (e.g. offline), run it manually in the same environment: python -m playwright install chromium (for an installed CLI: ~/.local/share/uv/tools/mnemoai/bin/python -m playwright install chromium).
External MCP Servers
mnemoai always runs its own built-in MCP server (file ops, bash, git, web, RAG,
vision, planning). You can add more MCP servers by creating
~/.mnemoai/mcp/mcp.json with the standard mcpServers schema (an
mcp.json.example is seeded there on first run). Their tools are merged with the
built-in ones and made available to the agent.
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "your_brave_api_key" }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"],
      "disabled": true
    }
  }
}
Per-server fields: command (required), args (optional list), env
(optional; merged over the process environment), and disabled (optional;
true skips the server). A template ships at
~/.mnemoai/mcp/mcp.json.example (seeded on first run from the bundled
src/mnemoai/utils/mcp.json.example).
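The per-server fields can be illustrated with a small parser. This is a sketch under assumptions, not mnemoai's loader: `load_servers` is a hypothetical helper, and raising `ValueError` on a missing command is an illustrative choice.

```python
import json
import os
from typing import Optional


def load_servers(
    config_text: str, base_env: Optional[dict[str, str]] = None
) -> dict[str, dict]:
    """Parse an mcpServers document into launch specs, skipping disabled servers."""
    base = dict(os.environ if base_env is None else base_env)
    servers = json.loads(config_text).get("mcpServers", {})
    specs: dict[str, dict] = {}
    for name, entry in servers.items():
        if entry.get("disabled", False):
            continue  # disabled: true skips the server
        if "command" not in entry:
            raise ValueError(f"server {name!r} is missing required field 'command'")
        specs[name] = {
            "command": entry["command"],
            "args": list(entry.get("args", [])),
            # Per-server env is merged over the process environment.
            "env": {**base, **entry.get("env", {})},
        }
    return specs
```

Applied to the example file above, only brave-search would produce a launch spec, because filesystem is marked disabled.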
Behavior:
- Additive: the built-in server is always on; external servers run alongside it. Tools from all servers are merged into one list.
- Resilient: if an external server fails to start (bad command, missing binary, crash), it's logged in red and skipped; the app still runs with the built-in server and any others that connected.
- No shadowing: if an external tool's name collides with a built-in one, the external tool is exposed as servername__tool so core tools are never overridden (the server is still called with the original tool name).
- Works with routing & orchestration: external tools are appended to every non-empty query route, and when orchestration is enabled the task decomposer is told which external tools exist and steers subtasks that need them to the full category (which binds every tool). So external tools stay reachable whether routing/orchestration is on or off.
- Run /mcp in the chat to see configured servers, status, and tool counts.
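The no-shadowing rule can be sketched as a name-merging step. `merge_tools` is a hypothetical helper written for illustration; it only models the naming behavior described above.

```python
def merge_tools(
    builtin: list[str], external: dict[str, list[str]]
) -> dict[str, tuple[str, str]]:
    """Map each exposed tool name to (server, original tool name).

    Built-in tools keep their names. An external tool whose name collides
    with a built-in one is exposed as servername__tool, but the server is
    still called with the original name.
    """
    merged = {name: ("builtin", name) for name in builtin}
    for server, tools in external.items():
        for tool in tools:
            exposed = f"{server}__{tool}" if tool in builtin else tool
            merged[exposed] = (server, tool)
    return merged
```

Collisions between two external servers are not described in the text, so this sketch does not treat them specially.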
RAG (Retrieval-Augmented Generation)
The RAG system automatically indexes documents and retrieves them with hybrid search (semantic embeddings + BM25 keyword scoring).
How it works:
- Read a PDF/DOCX file → Automatically chunked and indexed
- Ask questions → Assistant searches indexed documents first using hybrid search
- Session-scoped → Cleared on /clear or exit
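The chunking step can be illustrated with a fixed-size sliding window. Note the swap: the configuration notes describe recursive chunking with 10% overlap, while this sketch shows only the overlap arithmetic on a flat token list. `chunk_tokens` and its parameters are hypothetical names.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int, overlap_ratio: float = 0.10
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that overlap by overlap_ratio."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    overlap = int(chunk_size * overlap_ratio)
    step = max(1, chunk_size - overlap)
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks
```

With a chunk size of 10, each window starts 9 tokens after the previous one, so neighboring chunks share one token.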
RAG Tools:
- list_documents(): Show indexed documents
- search_in_documents(query, top_k): Hybrid semantic + BM25 search
- clear_documents(): Clear RAG index
Configuration:
- RAG.CHUNK_TOKENS: Chunk size (recommended: 512-2048)
- RAG.VECTOR_STORE.TYPE: Choose between faiss and chromadb
- RAG.SEARCH.SEMANTIC_WEIGHT / RAG.SEARCH.KEYWORD_WEIGHT: Configurable hybrid weights
- Recursive chunking with 10% overlap
- Hybrid search: BM25 (Okapi BM25 with TF-IDF, term saturation, length normalization) + semantic similarity
- Independent candidate retrieval from both BM25 and embeddings, merged and re-ranked
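The merge-and-re-rank step can be sketched as a weighted sum over two independently retrieved candidate sets. Several details here are assumptions: `hybrid_rank` is a hypothetical function, min-max normalization is one possible way to put BM25 and embedding scores on a common scale, and the 0.7/0.3 defaults are borrowed from the split quoted for episodic memory rather than from the RAG configuration.

```python
def hybrid_rank(
    semantic: dict[str, float],
    keyword: dict[str, float],
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Merge two independently retrieved candidate sets and re-rank them."""

    def normalize(scores: dict[str, float]) -> dict[str, float]:
        # Min-max scaling so BM25 and similarity scores share a 0..1 range.
        if not scores:
            return {}
        low, high = min(scores.values()), max(scores.values())
        if high == low:
            return {doc: 1.0 for doc in scores}
        return {doc: (s - low) / (high - low) for doc, s in scores.items()}

    sem, kw = normalize(semantic), normalize(keyword)
    combined = {
        doc: semantic_weight * sem.get(doc, 0.0) + keyword_weight * kw.get(doc, 0.0)
        for doc in sem.keys() | kw.keys()
    }
    # Highest combined score first; ties are broken by document id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))[:top_k]
```

A document found by only one retriever still competes, scoring zero on the side that missed it.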
Vector Store Options:
- ChromaDB: Persistent vector database with metadata support (default)
- FAISS: Fast in-memory search with disk persistence
The system uses a VectorStoreController for easy switching between stores. All functionality (indexing, searching, clearing) works identically regardless of the chosen store.
User Profile Learning
After 5+ interactions, the assistant builds a profile:
- Cognitive style: Analytical, creative, pragmatic, systematic
- Domain expertise: Python, AWS, DevOps, ML, etc.
- Learning style: Visual, hands-on, theoretical
- Communication patterns: Tone, complexity, question styles
- Code preferences: Testing, documentation, type hints
The profile is automatically injected into the system prompt for personalization.
🧠 Persistent Memory (MEMORY.md)
A small, agent-curated markdown file the assistant maintains itself to remember durable facts across sessions — user/environment details, conventions, lessons learned, tool quirks, and completed work. It lives at ~/.mnemoai/{profile}/MEMORY.md (profile-scoped, shared across models, unlike episodic memory and the playbook).
How it works:
- Always injected: The entire file is loaded into the system prompt at the start of every session — a "frozen snapshot". Writes made during a session take effect on the next session, not the current one.
- Agent-curated: The assistant edits its own memory via the memory MCP tool (add/replace/remove actions over an entry list separated by a Markdown --- rule), deciding what is worth remembering.
- Bounded: A hard character cap (MEMORY.MAX_CHARS, default 2200) forces the agent to consolidate, merging or removing stale entries instead of growing unbounded.
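The add/replace/remove actions and the character cap can be sketched with an in-memory stand-in for the file. `MemoryFile` and its methods are hypothetical, and rejecting an over-cap write with an error is an assumed way of forcing consolidation; the real tool may behave differently.

```python
SEPARATOR = "\n---\n"  # entries are separated by a Markdown rule


class MemoryFile:
    """In-memory stand-in for MEMORY.md with a hard character cap."""

    def __init__(self, text: str = "", max_chars: int = 2200) -> None:
        self.entries = [e.strip() for e in text.split(SEPARATOR) if e.strip()]
        self.max_chars = max_chars

    def render(self) -> str:
        return SEPARATOR.join(self.entries)

    def _commit(self, entries: list[str]) -> None:
        # Reject writes that would exceed the cap so the caller must consolidate.
        if len(SEPARATOR.join(entries)) > self.max_chars:
            raise ValueError("memory full: merge or remove entries first")
        self.entries = entries

    def add(self, entry: str) -> None:
        self._commit(self.entries + [entry.strip()])

    def replace(self, index: int, entry: str) -> None:
        updated = list(self.entries)
        updated[index] = entry.strip()
        self._commit(updated)

    def remove(self, index: int) -> None:
        self._commit(self.entries[:index] + self.entries[index + 1:])
```

Because a failed write leaves the entries untouched, the caller can retry after merging or removing something.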
How it differs from Episodic Memory: persistent memory is a curated set of facts that is always in context, whereas episodic memory is a store of past task completions retrieved by similarity per query. The two complement each other (and the ACE Playbook, which stores tool strategies).
Command: Run /memory to view the current memory, or /memory clear to wipe it (with a y/N confirm).
Configuration:
ENABLE_MEMORY: true # Master toggle for the memory tool + injection
REQUIRE_MEMORY_CONFIRMATION: false # Auto-saves like Hermes; set true to require y/N before each memory write
MEMORY:
  MAX_CHARS: 2200 # Hard cap: forces consolidation when exceeded
REQUIRE_MEMORY_CONFIRMATION defaults to false (the agent auto-saves). Set it to true to gate each memory write behind a y/N prompt, reusing the same client-side confirmation gate as bash/file writes.
Storage Location: ~/.mnemoai/{profile}/MEMORY.md
Episodic Memory
The episodic memory system learns from successful task completions and retrieves similar solutions for future queries.
How it works:
- Automatic Storage: After each successful interaction, stores:
  - Initial user query
  - Full conversation context
  - Tools used with arguments
  - Final solution
  - Timestamp
- Hybrid Search: Retrieves similar episodes using:
  - 70% semantic similarity (task intent)
  - 30% BM25 keyword scoring (tool names, action verbs)
- Context Injection: Before processing queries, injects compact context:
[Episodic Memory - Similar Past Tasks]
1. "read DOCX about ML" → fs_read → success (similarity: 0.85)
2. "analyze PDF report" → fs_read, web_search → success (similarity: 0.78)
- Automatic Cleanup: Maintains bounded memory:
  - Max 1000 episodes
  - Removes entries older than 90 days
  - Runs on startup
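The cleanup rules can be sketched as a pruning pass over stored episodes. `prune_episodes` and the `timestamp` key are hypothetical, and evicting the oldest entries first when the cap is exceeded is an assumption; the text states only the two limits.

```python
from datetime import datetime, timedelta


def prune_episodes(
    episodes: list[dict],
    now: datetime,
    max_episodes: int = 1000,
    max_age_days: int = 90,
) -> list[dict]:
    """Drop episodes older than max_age_days, then keep the newest max_episodes."""
    cutoff = now - timedelta(days=max_age_days)
    fresh = [e for e in episodes if e["timestamp"] >= cutoff]
    fresh.sort(key=lambda e: e["timestamp"], reverse=True)  # newest first
    return fresh[:max_episodes]
```

Passing `now` explicitly keeps the function deterministic and easy to test.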
Success Detection:
- User feedback: "thanks", "perfect", "great", "worked"
- No error markers in response
- All tools executed successfully
- Filters out greetings and simple acknowledgments (<300 chars, no tools)
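One way these signals might combine is sketched below. The text lists the signals but not how they are weighed, so the logic is an assumption: short tool-free exchanges are rejected first, positive feedback is then enough on its own, and otherwise the response must be free of error markers with every tool call succeeding. `is_successful_episode`, the `ERROR_MARKERS` list, and applying the 300-character threshold to the response are all hypothetical.

```python
POSITIVE_FEEDBACK = ("thanks", "perfect", "great", "worked")
ERROR_MARKERS = ("error", "traceback", "failed")  # assumed markers


def is_successful_episode(
    response: str, tool_ok: list[bool], feedback: str = ""
) -> bool:
    """Heuristic success check; tool_ok holds one flag per tool call."""
    # Short, tool-free exchanges (greetings, acknowledgments) are never stored.
    if not tool_ok and len(response) < 300:
        return False
    # Explicit positive feedback from the user counts as success.
    if any(word in feedback.lower() for word in POSITIVE_FEEDBACK):
        return True
    has_error = any(marker in response.lower() for marker in ERROR_MARKERS)
    return not has_error and all(tool_ok)
```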
Storage Location:
- FAISS: ~/.mnemoai/{profile}/models/{model}/episodic_memory/episodic.index
- ChromaDB: ~/.mnemoai/{profile}/models/{model}/episodic_memory/
Configuration:
ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb # or faiss
RAG:
  EMBED_MODEL_ID: # Required for both stores
    NAME: mxbai-embed-large
    TYPE: ollama
ACE Playbook (Agentic Context Engineering)
The ACE Playbook learns strategies from both successes AND failures, implementing the Agentic Context Engineering framework for continuous improvement.
How it works:
- Reflector: After each interaction, analyzes tool executions:
  - Detects failure patterns (file not found, string not found, permission denied, etc.)
  - Identifies successful strategies for specific tools (file_edit, execute_bash)
  - Extracts specific, actionable insights (not generic summaries)
  - Tracks metrics (success/failure rates, failure types) in metrics.json
- Playbook Store: Maintains structured strategy entries:
{
  "context": "editing python files",
  "strategy": "Read the file first to get exact string including whitespace before using str_replace",
  "source": "Failed file_edit on 2026-02-01: string_not_found",
  "outcome": "failure",
  "tools": ["file_edit"],
  "confidence": 0.9
}
- Context Injection: Injects relevant strategies into the system prompt at startup:
[Playbook - Learned Strategies]
Avoid these patterns:
✗ [editing files]: Read the file first to get exact string before str_replace
Effective strategies:
✓ [searching files]: Use glob_search instead of find for better performance
- Lazy Refinement: Only deduplicates when hitting token limits, using semantic similarity if embeddings are configured.
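The injection format can be sketched as a renderer over stored entries. `render_playbook` is a hypothetical helper, and ranking by the confidence field is an assumption made to keep the example short; the comparison table in this section describes retrieval as context and tool matching.

```python
def render_playbook(entries: list[dict], max_inject: int = 10) -> str:
    """Format the highest-confidence entries as a prompt block."""
    top = sorted(entries, key=lambda e: e["confidence"], reverse=True)[:max_inject]
    avoid = [e for e in top if e["outcome"] == "failure"]
    effective = [e for e in top if e["outcome"] == "success"]
    lines = ["[Playbook - Learned Strategies]"]
    if avoid:
        lines.append("Avoid these patterns:")
        lines += [f"✗ [{e['context']}]: {e['strategy']}" for e in avoid]
    if effective:
        lines.append("Effective strategies:")
        lines += [f"✓ [{e['context']}]: {e['strategy']}" for e in effective]
    return "\n".join(lines)
```

Capping the list before grouping keeps the block within the MAX_INJECT budget regardless of how many entries the store holds.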
What gets stored:
- Failures: Specific patterns like string_not_found, file_not_found, permission_denied, command_failed, etc.
- Successes: Only for tools with reusable patterns (file_edit, execute_bash with specific commands)
- Not stored: Generic successes without actionable strategies
Key Differences from Episodic Memory:
| Feature | Episodic Memory | ACE Playbook |
|---|---|---|
| Stores | Full task completions | Granular strategies |
| Learns from | Successes only | Successes AND failures |
| Format | Conversation context | Structured rules |
| Retrieval | Hybrid similarity (semantic + BM25) | Context + tool matching |
Configuration:
ENABLE_PLAYBOOK: true
PLAYBOOK:
  MAX_ENTRIES: 500 # Maximum entries before refinement
  SIMILARITY_THRESHOLD: 0.85 # Threshold for merging similar strategies
  MAX_INJECT: 10 # Maximum entries to inject per query
Storage Location:
- Strategies: ~/.mnemoai/{profile}/models/{model}/playbook/playbook.json
- Metrics: ~/.mnemoai/{profile}/models/{model}/playbook/metrics.json