Architecture¶
ποΈ Architecture¶
High-Level Overview¶
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β main.py β
β (Application Entry) β
βββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββ
β
βββββββββββββββββ΄ββββββββββββββββ
β β
βΌ βΌ
βββββββββββββββββββ ββββββββββββββββββββ
β LangGraphClient βββββββββββββΊβ MCP Server β
β (client.py) β β (server.py) β
ββββββββββ¬βββββββββ ββββββββββ¬ββββββββββ
β β
ββββββ΄ββββββ βΌ
β β ββββββββββββ
βΌ βΌ β Tools β
ββββββββββ ββββββββββββ ββββββ¬ββββββ
β UI β β Managers β β
ββββββββββ ββββββββββββ ββββββ΄βββββ
β β β β
ββββββ¬ββββββ βΌ βΌ
βΌ ββββββββββββ βββββββ
ββββββββββββ β Readers β β RAG β
βLangGraph β ββββββββββββ βββββββ
β Agent β
ββββββββββββ
Component Breakdown¶
1. Client Layer (client/)¶
The client manages the conversation flow and user interaction.
client.py: Core LangGraph client- Initializes MCP connection
- Manages conversation state
- Handles model configuration
- Coordinates managers (profile, conversation)
agent.py: LangGraph agent implementation- State graph with agent and tools nodes
- Streaming support with reasoning display
- Code syntax highlighting
router.py: Query classifier and routing- Classifies queries into categories (simple_qa, code, research, knowledge, full)
- Routes each category to a specialized tool subset
- Configurable classifier prompt via
ROUTING_PROMPTin config orchestrator.py: Task decomposition and worker orchestration- Decomposes complex tasks into ordered subtasks with category assignments
- Configurable orchestrator and aggregator prompts via config
reasoning_utils.py: Shared reasoning/thinking helpers- Temporarily disables reasoning for auxiliary LLM calls (routing, task decomposition) so output lands in the response content
- Extracts visible text from
<think>tags and Bedrock thinking blocks mcp_tool_wrapper.py: MCP to LangChain adapter- Wraps MCP tools as LangChain BaseTool
- Handles async/sync conversion
ui/: User interface componentschat_interface.py: Interactive chat loop with command handlingspinner.py: Loading animationsmanagers/: Business logicagent_conversation_manager.py: Conversation state and token trackinguser_profile_manager.py: Automatic user profiling and learning
2. Server Layer (server/)¶
MCP server that provides tools to the LLM.
server.py: FastMCP server initializationerror_handler.py:@tool_error_handlerdecorator (shared by all tools)tools/: Tool implementationstools_manager.py: Centralized tool registration and utilitiesfs_read.py: File reading (text, CSV, JSON, PDF, DOCX)fs_write.py: File writing (dry-run preview); writes are hard-gated client-side byREQUIRE_WRITE_CONFIRMATIONfile_edit.py: Precise string replacement with validation and uniqueness checkingexecute_bash.py: Shell command execution with intelligent error handlingfile_search.py: Fast file/content search (glob patterns + ripgrep)todo_manager.py: Todo list management for multi-step tasksweb_search.py: Brave Search integrationweb_crawler.py: Web page content extraction with RAG integrationdescribe_image.py: Vision model image analysisrag_tool.py: RAG tools registrationrag/: RAG systemsession.py: Session-scoped RAG management with hybrid searchvector_store_controller.py: Vector store abstraction layerfaiss_store.py: FAISS vector store implementationchroma_store.py: ChromaDB vector store implementation
readers/: Specialized file readersline_reader.py,directory_reader.py,search_reader.pycsv_reader.py,json_reader.pypdf_reader.py,docx_reader.pychunking_helper.py: Document chunking for RAG
3. Models Layer (models/)¶
Model controllers and custom implementations.
provider_params.py: Single source of truth for the config keys each provider consumes (per modality); controllers build their client kwargs from it viabuild_kwargs, and/modelprunes unsupported keys from itmantle_factory.py: Bedrock Mantle factory (chat_completions / responses / anthropic protocols), shared by the LLM and vision controllerscontrollers/(provider-dispatching model initialization):base_model_controller.py: Minimal shared base type for the controllersllm_controller.py: LLM model initialization (Bedrock, Mantle, Ollama, OpenAI, Anthropic, SageMaker AI, LiteLLM)vision_model_controller.py: Vision model initializationembeddings_controller.py: Embedding model initialization for RAGchat_models/(concrete LangChainChatModelsubclasses):chat_ollama_wrapper.py: Extends ChatOllama withpresence_penaltyandfrequency_penaltysupportsagemaker_chat.py: Full LangChainBaseChatModelfor SageMaker endpoints (streaming, tool calling, reasoning)
4. Utils Layer (utils/)¶
Shared utilities and configuration.
config.py: Configuration loaderconfigurator.py: First-run interactive setup (when no config resolves) and the/config(full reconfigure) and/model(override one model section) chat commandspaths.py: Central path helper β single source of truth for the app home (~/.mnemoai, override with$MNEMOAI_HOME) and all runtime subdirectories (config, plans, tasks, per-profile, per-model)config.yaml.example: Configuration template (copy toconfig.yamland add your settings;.bedrockand.bedrock.mantlevariants also provided)bm25.py: Lightweight BM25 implementation for hybrid (semantic + keyword) searchlogger.py: Logging utilities (stderr output)formatting/: Text formattingcode_formatter.py: Code syntax highlightingurl_formatter.py: URL highlightingresponse_parser.py: Response processing
Data Flow¶
- User Input β
ChatInterfaceβLangGraphClient - Client β Invokes LangGraph agent with MCP tools
- Classifier β Routes query to a category (simpleqa, code, research, knowledge, full) (_if routing enabled)
- Orchestrator β For
fulltasks: decomposes into subtasks, spawns workers, aggregates results (if orchestration enabled) - LangGraph β Executes agent node with route-specific tools, decides to use tools
- MCP Server β Executes tool (e.g., fs_read, web_search, RAG)
- Tool Result β Returned to agent via tools node
- LangGraph β Continues agent loop until response complete
- Response β Displayed to user via
ChatInterface
Session Management¶
Each chat session has a unique ID used for:
- RAG document indexing (session-scoped)
- Chunk caching for file summarization
Session data is stored in ~/.mnemoai/{profile_name}/:
~/.mnemoai/
βββ {profile_name}/
βββ conversations/ # Saved conversations
βββ profiles/ # User profiles
βββ todos/ # Todo list data
βββ rag_session_id.txt # Current RAG session
βββ rag_store_*.faiss # FAISS vector index (or ChromaDB directory)
βββ chunk_cache_*.db # SQLite chunk cache
βββ models/ # Per-model memory (isolated by chat model)
βββ {sanitized_model}/ # e.g. global.anthropic.claude-fable-5
βββ episodic_memory/ # Episodic memory store (FAISS or ChromaDB)
βββ playbook/ # ACE playbook strategies and metrics
Model-scoped memory: episodic memory and the playbook live under
models/{model}/so trying a different chat model doesn't contaminate the memory/strategies learned with another. Conversations, todos, RAG, and the user profile remain shared across models.
Context Compaction¶
To keep long conversations within the model's context window, the assistant compacts history by summarizing it:
- Automatic β after a turn pushes the conversation past
MAX_CONVERSATION_TOKENS, older messages are summarized into the system prompt while the most recentLLM.KEEP_RECENT_MESSAGESturns are kept verbatim. - Manual β run
/compactany time (optionally/compact <focus instructions>to steer what the summary emphasizes). Manual compaction keeps a smaller recent window (LLM.MANUAL_COMPACT_KEEP_RECENT).
The kept-verbatim window is bounded by both a message count and a token budget (LLM.KEEP_RECENT_TOKEN_BUDGET, default 25% of MAX_CONVERSATION_TOKENS). Walking newestβoldest, a message that would exceed the budget is summarized instead of kept β so a single oversized recent message (e.g. a pasted document that alone fills the context window) cannot survive compaction verbatim.
The summary preserves topics, decisions, and tool calls/results (which tools ran, their inputs, and outcomes), so the agent retains actionable context after compacting.
For the full per-file reference, see Architecture Reference.