
Architecture — Detailed File Map

This is the full per-file reference for the codebase, split out of CLAUDE.md to keep that file lean. See CLAUDE.md for the high-level architecture, key patterns, configuration, and conventions.

Paths below are relative to the package root src/mnemoai/ (e.g. client/agent/agent.py is src/mnemoai/client/agent/agent.py).

Detailed File Map

Entry Point

File Purpose
main.py CLI entry point (cli(); also python -m mnemoai). Parses --no-verbose, creates LangGraphClient(verbose=…), starts it, creates ChatInterface, runs chat loop.

client/ — Agent, Routing, Orchestration, UI

File Purpose
client/__init__.py Exports LangGraphClient
client/client.py Central orchestration class. Manages full lifecycle: MCP connection, LLM init, agent creation, query processing, episodic memory injection, playbook context injection, conversation save/load, RAG session, chunk cache, context clearing. Key methods: start(), query(), clear_context(), save_conversation(), load_conversation(), reflect_and_learn(), _inject_episodic_context(), _inject_playbook_context(), _inject_memory_context() (injects the curated MEMORY.md into the system prompt at session start). Holds plan_mode_active (toggled by /plan), passed to the agent as a plan_mode_provider callback; query() prepends a read-only banner each turn while it's on.
client/agent/agent.py LangGraph StateGraph agent. Builds graph with nodes: classifier, orchestrator, agent (call_model), tools (execute_tools). Handles streaming output with CodeFormatter, thinking/reasoning extraction, retry on empty content, and prints a gray [⚙ tool(args)] marker per tool call (_format_tool_call()). Key class: LangGraphAgent with invoke(), _build_graph(), _call_model(), _execute_tools(), _stream_response(), _classify(), _orchestrate(), _run_worker_loop(), _aggregate_results(). External (mcp.json) tools are tracked in self.external_tools, appended to every non-empty route, and surfaced to the decomposer via _external_tools_prompt_block() (routes such subtasks to full). _confirm_tool() hard-gates destructive tools client-side (execute_bash, fs_write/file_edit) and also the memory tool when REQUIRE_MEMORY_CONFIRMATION is true (default false). Enforced plan mode: _is_blocked_by_plan_mode() (driven by a plan_mode_provider callback wired to client.plan_mode_active) hard-blocks _PLAN_BLOCKED_TOOLS (execute_bash, fs_write, file_edit, git_safe, git_commit_safe, start_background_task) at both chokepoints, above _confirm_tool.
client/mcp_tool_wrapper.py MCP↔LangChain bridge. Runs background asyncio event loop in daemon thread for persistent MCP server connection. MCPClientWrapper manages one server's lifecycle; MCPToolWrapper(BaseTool) wraps individual MCP tools with Pydantic arg schemas (calls the server by mcp_tool.name, so namespaced display names still route correctly). MultiMCPClient owns the built-in wrapper plus one per external server, connects/exits them together, and merges their tools — namespacing a colliding external tool as servername__tool (built-in names win). Sync wrappers use run_coroutine_threadsafe.
client/mcp_config.py External MCP server loader. load_external_servers() reads ~/.mnemoai/mcp/mcp.json (Claude Code/kiro mcpServers schema; legacy flat ~/.mnemoai/mcp.json fallback) into ExternalServer(name, StdioServerParameters) tuples via _parse_entry(). Tolerant: a missing/malformed file or a bad/disabled entry is skipped (red-logged), never crashing startup. env is merged over the process environment.
client/agent/router.py Query classifier. QueryRouter.classify() sends query + ROUTING_PROMPT to LLM, returns one of: simple_qa, code, research, knowledge, full. ROUTE_TOOLS dict maps each route to allowed tool names.
client/agent/orchestrator.py Task decomposition. get_orchestrator_prompt() and get_aggregator_prompt() load prompts from prompts.yaml (config.prompt(...)). parse_subtasks(content, fallback_query, valid_categories) robustly parses JSON subtask list from LLM output with multiple fallback strategies.
client/agent/reasoning_utils.py Shared reasoning-model helpers. disable_reasoning(model) / restore_reasoning(model, saved) temporarily turn off thinking for auxiliary LLM calls (router classification, task decomposition) so output lands in response.content instead of the reasoning field. extract_visible_text(content) strips <think> tags / Bedrock thinking blocks. Used by agent.py and router.py.
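
The tool-merging rule described for MultiMCPClient (a colliding external tool is exposed as servername__tool, and built-in names win) can be illustrated with a small sketch. The function name merge_tools and the plain-dict data shapes are assumptions made for illustration; the real class works with MCP tool objects rather than bare names.

```python
from typing import Dict, List, Tuple


def merge_tools(
    builtin: List[str], external: Dict[str, List[str]]
) -> Dict[str, Tuple[str, str]]:
    """Map each display name to (server, original tool name).

    Hypothetical helper: built-in tools keep their names; an external tool
    whose name is already taken is exposed as "<server>__<tool>".
    """
    merged: Dict[str, Tuple[str, str]] = {
        name: ("builtin", name) for name in builtin
    }
    for server, names in external.items():
        for name in names:
            # Only namespace on collision, so most tools keep short names.
            display = name if name not in merged else f"{server}__{name}"
            merged[display] = (server, name)
    return merged


tools = merge_tools(
    ["fs_read", "web_search"],
    {"github": ["search", "fs_read"]},
)
```

Keeping the original name next to the display name mirrors the note that the wrapper calls the server by mcp_tool.name, so a namespaced display name still routes to the right tool.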

client/managers/ — Conversation & Profile Management

File Purpose
client/managers/agent_conversation_manager.py Conversation compaction (token counting + LLM summarization). Auto-compacts when over MAX_CONVERSATION_TOKENS; compact() is the manual /compact path. Summarizes older messages into the system prompt while keeping recent turns verbatim — the kept window is bounded by BOTH message count (KEEP_RECENT_MESSAGES / MANUAL_COMPACT_KEEP_RECENT) and a token budget (KEEP_RECENT_TOKEN_BUDGET, default 25% of max), so an oversized recent message is summarized rather than kept. Preserves tool calls/results in the summary, which uses a structured prompt (summarizer system framing + 9-section <analysis>-then-summary template; /compact <focus> injected as compact instructions; <analysis> stripped; continuation instruction added). The split is tool-pair-safe (_safe_tool_boundary) so the kept window never starts on an orphaned tool result (which the OpenAI Responses API rejects). Key methods: count_tokens(), generate_summary(), _build_summary_prompt(), _strip_analysis(), manage_messages(), compact(), _compact(), _split_keep_recent(), _safe_tool_boundary().
client/managers/user_profile_manager.py Learns user preferences via Exponential Moving Average (EMA). Tracks: verbosity, directness, technical level, abstraction preference, top domains, tool success per intent. Generates compact profile summary for system prompt injection. Persists as JSON at ~/.mnemoai/{profile}/. Key class: UserProfileManager with analyze_conversation(), classify_intent(), get_profile_summary().
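
As a rough illustration of the Exponential Moving Average update mentioned for UserProfileManager, the sketch below blends a new observation into a stored preference score. The function name ema_update, the 0..1 score range, and the smoothing factor alpha=0.2 are assumptions; the actual values used by the manager are not stated in this file map.

```python
def ema_update(current: float, observation: float, alpha: float = 0.2) -> float:
    """Return the new score after blending one observation into the average.

    alpha close to 1 reacts quickly to new behavior; alpha close to 0
    changes slowly and remembers older conversations longer.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return alpha * observation + (1.0 - alpha) * current


# Hypothetical "verbosity" preference starting at a neutral 0.5.
verbosity = 0.5
for observed in (1.0, 1.0, 0.0):
    verbosity = ema_update(verbosity, observed)
```

Because only the current score is stored, the profile stays compact no matter how many conversations have been analyzed.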

client/memory/ — Episodic Memory & Learning

File Purpose
client/memory/episodic_memory.py High-level episodic memory manager. Stores successful task patterns with tool usage. Supports query expansion (synonyms). Delegates to ChromaDB or FAISS store. Key functions: is_task_successful() (heuristic checks success/correction/error markers), extract_tools_from_messages(). Key class: EpisodicMemoryManager with store(), retrieve(), clear().
client/memory/memory_store.py Curated persistent memory store. MemoryStore with read/add/replace/remove over a MEMORY.md whose entries are separated by a Markdown --- rule (legacy § files still parse + migrate); char-capped (consolidate-on-overflow). Shared by the MCP memory tool and the /memory command.
client/memory/chroma_store.py ChromaDB-backed episodic store with hybrid search (semantic from ChromaDB + BM25 re-ranking). Key class: ChromaEpisodicStore with add(), search(), cleanup().
client/memory/faiss_store.py FAISS-backed episodic store (alternative to ChromaDB). Same hybrid search pattern. Key class: FAISSEpisodicStore with add(), search(), cleanup(), clear().
client/memory/reflector.py ACE Reflector. Analyzes tool execution trajectories after each interaction. Detects failures (string_not_found, file_not_found, permission_denied, syntax_error, timeout, api_error). Extracts reusable strategies as PlaybookEntry objects. Tracks metrics (total/successful/failed calls, failure types, daily stats). Key class: Reflector with analyze_tool_execution(), reflect_on_trajectory().
client/memory/playbook_store.py ACE Playbook. Append-only store for learned strategies with lazy semantic deduplication. Retrieves relevant entries by task context for system prompt injection. Key class: PlaybookStore with append(), append_batch(), get_relevant_entries(), format_for_prompt(), _refine() (triggers when over max_entries). Persists at ~/.mnemoai/{profile}/models/{model}/playbook/playbook.json (model-scoped).
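
The failure categories listed for the Reflector (string_not_found, file_not_found, permission_denied, syntax_error, timeout, api_error) suggest a classification step over tool output. The sketch below shows one way such a step could look; the regular expressions and the name classify_failure are hypothetical and are not taken from reflector.py.

```python
import re
from typing import Optional

# Assumed patterns, checked in order; the first match decides the category.
FAILURE_PATTERNS = [
    ("string_not_found", re.compile(r"string not found", re.I)),
    ("file_not_found", re.compile(r"no such file|file not found", re.I)),
    ("permission_denied", re.compile(r"permission denied", re.I)),
    ("syntax_error", re.compile(r"syntax ?error", re.I)),
    ("timeout", re.compile(r"timed out|timeout", re.I)),
    ("api_error", re.compile(r"rate limit|api error", re.I)),
]


def classify_failure(tool_output: str) -> Optional[str]:
    """Return a failure category for the output, or None if nothing matches."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(tool_output):
            return label
    return None
```

A category label like this is the kind of signal that could feed both the failure-type metrics and the strategies extracted as PlaybookEntry objects.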

client/ui/ — User Interface

File Purpose
client/ui/__init__.py Package marker
client/ui/chat_interface.py Interactive CLI using prompt_toolkit. Multiline input (Ctrl+J), slash-command autocomplete, commands: /clear, /load, /save, /exit, /quit, /compact [focus] (manual context compaction), /config (re-run the configurator via run_reconfigure(); overwrites config.yaml), /model (override one model section — LLM/vision/embeddings — via run_model_override()), /params (tune a model's inference params via run_params_override()), /mcp (list configured MCP servers + tool counts via _print_mcp_status()), /memory (view the curated MEMORY.md; /memory clear wipes it with a y/N confirm, via _handle_memory_command()), /plan (toggle enforced read-only plan mode — flips client.plan_mode_active, which the agent reads to hard-block mutating/exec tools). /config, /model, /params call _restart_in_place(), which re-execs the process via os.execv so all settings — incl. MCP tool toggles decided at subprocess boot — take effect. Handles episodic memory storage (immediate and delayed modes), ACE reflection triggers, double Ctrl+C exit. Key class: ChatInterface with run_chat_loop(), get_multiline_input().
client/ui/spinner.py Threaded "Thinking..." spinner animation shown during LLM processing. Key class: Spinner with start(), stop().
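
A threaded spinner with start() and stop(), as described for client/ui/spinner.py, is commonly built around a threading.Event. The sketch below is one minimal way to do it; the frame characters, the interval, and the stream parameter are assumptions and may differ from the real Spinner.

```python
import itertools
import sys
import threading


class Spinner:
    """Minimal sketch of a background "Thinking..." animation."""

    def __init__(self, message="Thinking...", interval=0.1, stream=None):
        self.message = message
        self.interval = interval
        self.stream = stream if stream is not None else sys.stdout
        self._stop_event = threading.Event()
        self._thread = None

    def _spin(self):
        for frame in itertools.cycle("|/-\\"):
            if self._stop_event.is_set():
                break
            self.stream.write(f"\r{frame} {self.message}")
            self.stream.flush()
            # wait() doubles as an interruptible sleep.
            self._stop_event.wait(self.interval)
        # Erase the spinner line so the response starts on a clean line.
        self.stream.write("\r" + " " * (len(self.message) + 2) + "\r")
        self.stream.flush()

    def start(self):
        if self._thread is not None and self._thread.is_alive():
            return  # already running
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

Using Event.wait() instead of time.sleep() lets stop() return promptly, since the worker wakes as soon as the event is set.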

server/ — MCP Server & Tools

File Purpose
server/__init__.py Package marker
server/server.py MCP server entry point (run as subprocess). Creates FastMCP("MCP Server"), calls register_tools(mcp), runs via stdio transport.
server/error_handler.py @tool_error_handler decorator (shared by all tools). Catches typed exceptions (FileNotFoundError, PermissionError, etc.) and returns structured JSON with error_type, message, next_steps.
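
The pattern behind @tool_error_handler (catch typed exceptions, return structured JSON with error_type, message, and next_steps) can be sketched as follows. This is an illustrative re-implementation, not the code in server/error_handler.py: the hint texts, the use of the exception class name as error_type, and the example tool read_text are all assumptions.

```python
import functools
import json

# Hypothetical hints; the real next_steps wording is not shown in this map.
NEXT_STEPS = {
    FileNotFoundError: ["Check that the path exists", "List the parent directory"],
    PermissionError: ["Check file permissions", "Use a path you are allowed to read"],
}
DEFAULT_STEPS = ["Inspect the message and retry with corrected arguments"]


def tool_error_handler(func):
    """Wrap a tool so failures come back as a JSON string instead of raising."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            steps = next(
                (hints for exc_type, hints in NEXT_STEPS.items()
                 if isinstance(exc, exc_type)),
                DEFAULT_STEPS,
            )
            return json.dumps({
                "error_type": type(exc).__name__,
                "message": str(exc),
                "next_steps": steps,
            })

    return wrapper


@tool_error_handler
def read_text(path: str) -> str:
    """Example tool: return the contents of a text file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Returning a structured payload rather than a traceback gives the calling model something concrete to act on for its next tool call.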

server/tools/ — Tool Implementations

File Purpose
server/tools/__init__.py Creates global ToolManager singleton. Exports register_tools, validate_file_path, count_tokens, vision_model, vision_model_controller.
server/tools/tools_manager.py Central tool management. Initializes vision model, provides token counting, file path validation. register_tools(mcp) conditionally registers all tool categories based on config toggles.
server/tools/execute_bash.py execute_bash(command, timeout=30) — safe shell execution with timeout. Blocks dangerous commands (rm, mkfs, dd, shutdown, chmod 777). Returns stdout/stderr/exit_status.
server/tools/file_edit.py file_edit(file_path, old_string, new_string, replace_all=False) — precise string replacement. Validates file exists, checks for unique matches.
server/tools/file_search.py glob_search(pattern, path, max_results, sort_by_mtime) — file name search. grep_search(pattern, path, file_pattern, case_insensitive, output_mode, context_lines, max_results) — content search via ripgrep.
server/tools/fs_read.py fs_read(path, mode, start_line, end_line, pattern, context_lines, depth) — multi-mode reader. Modes: Line, Search, Directory, CSV, JSON, JSONL, PDF, DOCX. Delegates to readers/.
server/tools/fs_write.py fs_write(path, command, ...) — two-step write with confirmation (dry_run=True for preview, then dry_run=False + confirmed=True). Commands: create, str_replace, insert, append.
server/tools/git_safety.py git_safe(command, allow_dangerous, reason), git_status_safe(), git_commit_safe(message, add_all, add_files, amend, allow_empty) — git with safety checks. Blocks force push to main/master, warns about hard resets.
server/tools/describe_image.py describe_image(image_path, question) — sends base64 image to vision model for description.
server/tools/plan_mode.py Multi-step planning workflow. Tools: enter_plan_mode(), add_plan_step(), add_plan_file(), add_plan_risk(), present_plan(), approve_plan(), exit_plan_mode(), get_plan_status(). Persists at ~/.mnemoai/plans/current_plan.json.
server/tools/background_tasks.py Background task execution in threads. Tools: start_background_task(), get_task_status(), get_task_output(), list_background_tasks(), cancel_background_task(), wait_for_task(), clear_completed_tasks(). Output at ~/.mnemoai/tasks/.
server/tools/todo_manager.py Task tracking. Enforces one in_progress task at a time. Tools: todo_write(todos), todo_read(), todo_clear(). Persists at ~/.mnemoai/{profile}/todos/current_todos.json.
server/tools/web_crawler.py web_crawler(url) — extracts page content as markdown via crawl4ai. Optionally ingests large pages into RAG store.
server/tools/web_search.py web_search(query, search_lang, num_results) — internet search via Brave Search API. Returns structured results.
server/tools/rag_tool.py MCP-exposed RAG tools: list_documents(), search_in_documents(query, top_k), clear_documents().
server/tools/memory_tool.py register_memory_tools(mcp) exposing the memory(action, text, old_text) tool; delegates to MemoryStore. Gated by ENABLE_MEMORY.
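
To make the execute_bash description concrete (a timeout, a blocklist of dangerous commands, and a stdout/stderr/exit_status result), here is a minimal sketch. The names is_blocked and run_bash are hypothetical, and the token-based check is deliberately naive: it does not inspect pipes, subshells, or chained commands, which a real guard would need to handle.

```python
import shlex
import subprocess

# Blocklist taken from the examples named in the description above.
BLOCKED_COMMANDS = {"rm", "mkfs", "dd", "shutdown"}


def is_blocked(command: str) -> bool:
    """Return True if the first token is blocklisted or it is 'chmod ... 777'."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: refuse rather than guess
    if not tokens:
        return False
    if tokens[0] in BLOCKED_COMMANDS:
        return True
    return tokens[0] == "chmod" and "777" in tokens


def run_bash(command: str, timeout: int = 30) -> dict:
    """Run a shell command and report stdout, stderr, and exit status."""
    if is_blocked(command):
        return {"stdout": "", "stderr": f"blocked command: {command}",
                "exit_status": None}
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": f"timed out after {timeout}s",
                "exit_status": None}
    return {"stdout": proc.stdout, "stderr": proc.stderr,
            "exit_status": proc.returncode}
```

A server-side check like this complements, and does not replace, the client-side confirmation gate described for _confirm_tool().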

server/tools/rag/ — RAG Engine

File Purpose
server/tools/rag/__init__.py Exports get_rag_session, reset_session_rag, SessionRAG, FaissStore, create_store, register_rag_tools.
server/tools/rag/session.py Core RAG engine. Session-scoped vector store with hybrid search (semantic + BM25). SessionRAG class with ingest(doc_id, content, chunk_size_tokens) and query(query_text, top_k). Cross-process session sharing via file-based session_id. Functions: get_rag_session(), set_rag_session(), reset_session_rag().
server/tools/rag/vector_store_controller.py Abstraction over FAISS/ChromaDB backends. Factory pattern via VectorStoreController with add(), search(), clear(), detect_existing_store() (static).
server/tools/rag/faiss_store.py FAISS IndexFlatIP (cosine similarity on L2-normalized vectors). Thread-safe with threading.Lock(). File persistence (faiss index + pickle metadata). Key class: FaissStore.
server/tools/rag/chroma_store.py ChromaDB-backed store alternative. Persistent client with automatic collection management. Key class: ChromaStore.
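
The note that FaissStore uses IndexFlatIP for cosine similarity relies on a simple identity: once vectors are L2-normalized, their inner product equals their cosine similarity. The pure-Python sketch below demonstrates the idea without FAISS; the helper names and the dict-based index are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(
    index: Dict[str, List[float]], query: Sequence[float], top_k: int
) -> List[Tuple[str, float]]:
    """Brute-force inner-product search over already-normalized vectors."""
    q = l2_normalize(query)
    scored = [(doc_id, inner_product(vec, q)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


index = {
    "a": l2_normalize([1.0, 0.0]),
    "b": l2_normalize([1.0, 1.0]),
    "c": l2_normalize([0.0, 2.0]),
}
results = search(index, [2.0, 0.0], top_k=2)
```

Normalizing at ingest time means the query path only needs one normalization and a flat inner-product scan, which is the work a flat inner-product index performs.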

server/tools/readers/ — File Format Readers

File Purpose
server/tools/readers/__init__.py Exports all readers
server/tools/readers/chunking_helper.py Universal chunking + LLM summarization for large files. Recursive splitting with 10% overlap. SQLite chunk cache. Concurrent summarization with asyncio semaphore. Key functions: process_large_content(), reset_session_chunk_cache().
server/tools/readers/line_reader.py read_lines(path, start_line, end_line) — line-based file reading with token limit.
server/tools/readers/directory_reader.py read_directory(path, depth) — recursive directory listing.
server/tools/readers/csv_reader.py read_csv(path) — CSV with auto-delimiter detection, encoding fallbacks, token truncation.
server/tools/readers/json_reader.py read_json(path, start_line, end_line) — JSON/JSONL reading with line ranges and token limits.
server/tools/readers/pdf_reader.py read_pdf(file_path) — PyPDF2 reader. Large PDFs auto-ingest into RAG if enabled, else chunk+summarize.
server/tools/readers/docx_reader.py read_docx(file_path) — python-docx reader. Same RAG/chunking fallback as PDF.
server/tools/readers/search_reader.py search_file(path, pattern, context_lines) — regex search within a single file with context.
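
The 10% overlap mentioned for chunking_helper.py can be illustrated with a simplified sliding window over a token list. Note that this is fixed-size windowing, not the recursive separator-based splitting the helper is described as using; the function name chunk_with_overlap and the list-of-tokens input are assumptions.

```python
from typing import List, Sequence


def chunk_with_overlap(
    tokens: Sequence, chunk_size: int, overlap_ratio: float = 0.10
) -> List[list]:
    """Split tokens into windows that share overlap_ratio of their length."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # always >= 1 given the checks above
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


chunks = chunk_with_overlap(list(range(25)), chunk_size=10)
```

The shared tokens at each boundary give a summarizer or retriever some context from the previous chunk, at the cost of processing about 10% more text.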

models/ — LLM Provider Abstraction

File Purpose
models/__init__.py Empty package marker
models/controllers/base_model_controller.py Minimal shared base type for the controllers. Per-provider inference-param handling lives in models/provider_params.py (consumed via build_kwargs), not here.
models/provider_params.py Single source of truth for which config keys each provider consumes, per section (LLM_SUPPORTED_PARAMS / VISION_SUPPORTED_PARAMS / EMBED_SUPPORTED_PARAMS), mirroring the controller init methods. supported_params(section) returns the per-provider key registry. The configurator uses it to prune unsupported keys on a /model provider switch; keep in sync when a controller starts/stops reading a key. Also exposes extra_params(model_id) — the generic EXTRA_PARAMS passthrough every provider accepts (a raw dict merged verbatim into the model's request body / model_kwargs); it's always in supported_keys (never pruned) but excluded from tunable_params (not a /params scalar).
models/controllers/llm_controller.py Primary LLM controller. LangChainLLMController(BaseModelController) reads MODEL_ID config, initializes the correct LangChain chat model. Methods: initialize_model(), get_model(), get_model_type(). Supports: bedrock (ChatBedrockConverse), mantle (Bedrock Mantle, via mantle_factory), ollama (ChatOllamaWrapper), openai (ChatOpenAI), anthropic (direct Anthropic API via ChatAnthropic — distinct from Mantle's anthropic protocol), sagemaker (ChatSageMaker), litellm (ChatLiteLLM). Handles extended thinking (Bedrock & direct Anthropic Claude), Ollama reasoning, OpenAI reasoning_effort. Note: temperature only sent when explicitly configured (newer Claude models reject it); Anthropic STOP maps to stop_sequences and requires max_tokens (defaults to 4096). Optional ENDPOINT_URL overrides the Bedrock endpoint (Anthropic: custom base URL).
models/mantle_factory.py Bedrock Mantle model factory. build_mantle_model(model_id, ...) returns the right LangChain model for API_PROTOCOL: chat_completions (ChatOpenAI, /v1), responses (ChatOpenAI use_responses_api=True, /openai/v1), anthropic (ChatAnthropic, /anthropic). Auth: uses a Bedrock API key when present (MODEL_ID.API_KEY or the BEDROCK_API_KEY env var), else mints a short-lived bearer token via aws_bedrock_token_generator.provide_token(). Used by both the LLM and vision controllers.
models/controllers/embeddings_controller.py Multi-provider embeddings with LRU caching. Supports Ollama, Bedrock, OpenAI, SageMaker, LiteLLM. Falls back to SHA256-based deterministic embeddings on failure. Key class: EmbeddingsController with embed(texts) → numpy array.
models/controllers/vision_model_controller.py Vision model controller. VisionModelController(BaseModelController) with describe_image(), format_request() (multimodal HumanMessage with base64), and _content_to_text() (normalizes string/list-of-blocks responses). Supports Bedrock, Mantle (all 3 protocols), Ollama, OpenAI, Anthropic (direct Claude API via ChatAnthropic), SageMaker (reuses ChatSageMaker, openai_chat format), LiteLLM (reuses ChatLiteLLM). All paths consume the same OpenAI image_url content from format_request().
models/chat_models/__init__.py Empty
models/chat_models/chat_ollama_wrapper.py ChatOllamaWrapper(ChatOllama) — extends ChatOllama to add presence_penalty and frequency_penalty support in the options dict.
models/chat_models/sagemaker_chat.py ChatSageMaker(BaseChatModel) — full LangChain BaseChatModel for SageMaker endpoints. Supports OpenAI chat format and HuggingFace text_generation format. Implements _generate() and _stream() (SSE parsing). Handles reasoning/thinking tags. bind_tools() support.
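
The SHA256-based deterministic fallback mentioned for EmbeddingsController can be approximated as below. This is only a sketch of the general idea (hash the text, map bytes to floats, normalize); the function name hash_embedding, the default dimension, and the byte-to-float mapping are assumptions, and the real controller returns a numpy array rather than a list.

```python
import hashlib
import math
from typing import List


def hash_embedding(text: str, dim: int = 8) -> List[float]:
    """Derive a unit-length pseudo-embedding from SHA-256 digests of the text."""
    values: List[float] = []
    counter = 0
    while len(values) < dim:
        # A counter prefix yields fresh digests when dim exceeds 32 bytes.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        for byte in digest:
            values.append(byte / 127.5 - 1.0)  # map 0..255 onto [-1, 1]
            if len(values) == dim:
                break
        counter += 1
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


first = hash_embedding("hello")
second = hash_embedding("hello")
other = hash_embedding("world")
```

Vectors produced this way are stable across runs but carry no semantic meaning, so they keep the pipeline running without providing useful similarity between different texts.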

utils/ — Shared Utilities

File Purpose
utils/__init__.py Package marker
utils/paths.py Central path helper — single source of truth for all runtime locations. app_home() (defaults to ~/.mnemoai, honors $MNEMOAI_HOME), config_dir() (→ config/) + config_path() (→ config/config.yaml) + legacy_config_path() (flat fallback), mcp_dir() (→ mcp/) + mcp_config_path() (→ mcp/mcp.json) + legacy_mcp_config_path() (flat fallback), seed_example_files() (idempotently copies bundled config.yaml*.example/mcp.json.example into config/+mcp/, never overwriting), plans_dir(), tasks_dir(), profile_dir(profile=None), model_dir(model_name, profile=None), memory_file_path(profile=None) (→ {profile}/MEMORY.md), sanitize_model_name(name). Every call site that touches the home dir routes through here. Lazy-imports utils.config to avoid a cycle.
utils/config.py Singleton config manager. Loads config via _resolve_config_path() ($MNEMOAI_CONFIG → <app_home>/config/config.yaml → legacy flat <app_home>/config.yaml → package utils/config.yaml fallback; prints a copy-a-template hint if none found). Loads prompts separately via _resolve_prompts_path() ($MNEMOAI_PROMPTS → <app_home>/config/prompts.yaml → package utils/prompts.yaml). Exposes .get("SECTION.KEY", default) for config, .prompt("KEY", default) for prompts (SYSTEM/ROUTING/ORCHESTRATOR/AGGREGATOR/SUMMARY_SYSTEM/SUMMARY_TASK), .system_prompt property (reads SYSTEM_PROMPT from prompts.yaml), and reload() (re-reads both files into the existing singleton). Prompt keys still in config.yaml are ignored with a one-time migration warning. Sets env vars from ENV section.
utils/configurator.py First-run interactive setup. When no config resolves, cli() (in main.py) runs run_first_run_setup() on a TTY: picks a provider (Ollama/Bedrock/Mantle have dedicated templates; OpenAI/SageMaker/LiteLLM reuse the base template, transformed: TYPE set + unsupported keys pruned + provider connection keys prompted) and prompts for chat model + connection + optional max output tokens (MAX_TOKENS; none/blank drops it) + mandatory max context window (MAX_CONVERSATION_TOKENS, default 65536), vision model (mirrors chat host/region, own optional max output tokens), profile, Brave key, and each feature toggle. Patches them in via line-targeted edits (_set_in_section/_set_top_level/_set_bool), reading current defaults with _get_in_section/_get_top_level, so the rich prompt blocks/comments survive. Writes <app_home>/config/config.yaml, then the caller calls config.reload(). config_exists() gates the trigger. run_params_override() (the /params command) tunes a configured model's inference parameters (temperature, top_p, penalties, reasoning, stop, stream, …), limited to the keys the model's provider accepts, per provider_params.tunable_params(). run_model_override() (the /model command) edits just one model section in place (chat, vision, or embeddings; embeddings offered only when configured) using depth-agnostic helpers (_get_field/_set_field/_remove_field) that reach the nested RAG.EMBED_MODEL_ID. Both /config and /model prompt connection/auth via the SAME _prompt_provider_connection() helper (section-aware via the provider_params registry: HOST/PORT for ollama, REGION for AWS, Mantle protocol, SageMaker INPUT_FORMAT for chat/vision only, LiteLLM API_BASE/API_KEY; OpenAI is env-based), so the two flows always ask the same mandatory params. Switching a section's provider prunes every key the new provider doesn't consume (connection, auth, and inference alike; e.g. REGION/API_PROTOCOL after mantle→ollama, HOST/PORT/TOP_K/penalties after ollama→openai), using the supported-key registry in models/provider_params.py (the single source of truth, derived from the controller init methods). Additionally, on ANY model change /model calls _clear_inference_params() to drop model-specific generation params (temperature, top_p, penalties, reasoning, stop, stream; that is, everything in tunable_params except the separately-prompted MAX_TOKENS), so a value tuned for one model isn't carried into another that may reject it (e.g. newer Claude/GPT reject temperature); the new model's defaults apply until the user re-tunes via /params. STOP is kept in the example template (documentation) but never written into a generated config.
utils/logger.py Logger setup. Configurable via LOG_LEVEL env var (default WARNING). Suppresses noisy Brave Search logs. Exports logger singleton.
utils/console.py User-facing console output, distinct from logger diagnostics: print_error(msg) (red, ✗-prefixed) and print_success(msg) (green).
utils/bm25.py Lightweight BM25 (Okapi BM25) implementation. BM25 class with fit(corpus) and score(query). tokenize() function (regex word tokenizer). Used by episodic memory and RAG hybrid search.
utils/formatting/__init__.py Exports make_urls_clickable and everything from response_parser.
utils/formatting/code_formatter.py Real-time streaming syntax highlighting. Handles triple-backtick code blocks (Pygments language detection) and inline code. CodeFormatter with process_chunk() and flush().
utils/formatting/response_parser.py Extracts structured content from AI responses. Functions: extract_answer() (from <answer> tags), extract_thinking() (from <think>/<thinking> tags), format_response().
utils/formatting/url_formatter.py Makes URLs clickable in terminal (ANSI escapes for iTerm/VSCode, fallback to color). Handles plain URLs and markdown links. Functions: make_urls_clickable(), highlight_urls(), format_url().
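
The config lookup order described for utils/config.py (environment override, then the config/ directory under the app home, then the legacy flat file, then the packaged fallback) can be sketched as a first-match search over candidate paths. The function below is illustrative only: its name, its parameters, and the choice to fall through when $MNEMOAI_CONFIG points at a missing file are assumptions, not the behavior of _resolve_config_path().

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    app_home: Path, package_dir: Path, env: Optional[Mapping[str, str]] = None
) -> Optional[Path]:
    """Return the first existing config file in priority order, or None."""
    env = os.environ if env is None else env
    candidates = []
    override = env.get("MNEMOAI_CONFIG")
    if override:
        candidates.append(Path(override))  # explicit override wins
    candidates.extend([
        Path(app_home) / "config" / "config.yaml",    # current layout
        Path(app_home) / "config.yaml",               # legacy flat layout
        Path(package_dir) / "utils" / "config.yaml",  # packaged fallback
    ])
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A call such as resolve_config_path(Path.home() / ".mnemoai", Path("src/mnemoai")) would return None when no file exists at any of the locations, which corresponds to the case where the first-run configurator is triggered.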