Skip to content

Architecture

πŸ—οΈ Architecture

High-Level Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         main.py                             β”‚
β”‚                    (Application Entry)                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚                               β”‚
              β–Ό                               β–Ό
      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
      β”‚ LangGraphClient │◄──────────►│  MCP Server      β”‚
      β”‚  (client.py)    β”‚            β”‚  (server.py)     β”‚
      β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚                              β”‚
          β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”                        β–Ό
          β”‚          β”‚                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β–Ό          β–Ό                   β”‚  Tools   β”‚
      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
      β”‚  UI    β”‚ β”‚ Managers β”‚                 β”‚
      β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
          β”‚          β”‚                   β”‚         β”‚
          β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜                   β–Ό         β–Ό
               β–Ό                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”‚ Readers  β”‚ β”‚ RAG β”‚
          β”‚LangGraph β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”˜
          β”‚  Agent   β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Component Breakdown

1. Client Layer (client/)

The client manages the conversation flow and user interaction.

  • client.py: Core LangGraph client
  • Initializes MCP connection
  • Manages conversation state
  • Handles model configuration
  • Coordinates managers (profile, conversation)
  • agent.py: LangGraph agent implementation
  • State graph with agent and tools nodes
  • Streaming support with reasoning display
  • Code syntax highlighting
  • router.py: Query classifier and routing
  • Classifies queries into categories (simple_qa, code, research, knowledge, full)
  • Routes each category to a specialized tool subset
  • Configurable classifier prompt via ROUTING_PROMPT in config
  • orchestrator.py: Task decomposition and worker orchestration
  • Decomposes complex tasks into ordered subtasks with category assignments
  • Configurable orchestrator and aggregator prompts via config
  • reasoning_utils.py: Shared reasoning/thinking helpers
  • Temporarily disables reasoning for auxiliary LLM calls (routing, task decomposition) so output lands in the response content
  • Extracts visible text from <think> tags and Bedrock thinking blocks
  • mcp_tool_wrapper.py: MCP to LangChain adapter
  • Wraps MCP tools as LangChain BaseTool
  • Handles async/sync conversion
  • ui/: User interface components
  • chat_interface.py: Interactive chat loop with command handling
  • spinner.py: Loading animations
  • managers/: Business logic
  • agent_conversation_manager.py: Conversation state and token tracking
  • user_profile_manager.py: Automatic user profiling and learning

2. Server Layer (server/)

MCP server that provides tools to the LLM.

  • server.py: FastMCP server initialization
  • error_handler.py: @tool_error_handler decorator (shared by all tools)
  • tools/: Tool implementations
  • tools_manager.py: Centralized tool registration and utilities
  • fs_read.py: File reading (text, CSV, JSON, PDF, DOCX)
  • fs_write.py: File writing (dry-run preview); writes are hard-gated client-side by REQUIRE_WRITE_CONFIRMATION
  • file_edit.py: Precise string replacement with validation and uniqueness checking
  • execute_bash.py: Shell command execution with intelligent error handling
  • file_search.py: Fast file/content search (glob patterns + ripgrep)
  • todo_manager.py: Todo list management for multi-step tasks
  • web_search.py: Brave Search integration
  • web_crawler.py: Web page content extraction with RAG integration
  • describe_image.py: Vision model image analysis
  • rag_tool.py: RAG tools registration
  • rag/: RAG system
    • session.py: Session-scoped RAG management with hybrid search
    • vector_store_controller.py: Vector store abstraction layer
    • faiss_store.py: FAISS vector store implementation
    • chroma_store.py: ChromaDB vector store implementation
  • readers/: Specialized file readers
    • line_reader.py, directory_reader.py, search_reader.py
    • csv_reader.py, json_reader.py
    • pdf_reader.py, docx_reader.py
    • chunking_helper.py: Document chunking for RAG

3. Models Layer (models/)

Model controllers and custom implementations.

  • provider_params.py: Single source of truth for the config keys each provider consumes (per modality); controllers build their client kwargs from it via build_kwargs, and /model prunes unsupported keys from it
  • mantle_factory.py: Bedrock Mantle factory (chat_completions / responses / anthropic protocols), shared by the LLM and vision controllers
  • controllers/ (provider-dispatching model initialization):
  • base_model_controller.py: Minimal shared base type for the controllers
  • llm_controller.py: LLM model initialization (Bedrock, Mantle, Ollama, OpenAI, Anthropic, SageMaker AI, LiteLLM)
  • vision_model_controller.py: Vision model initialization
  • embeddings_controller.py: Embedding model initialization for RAG
  • chat_models/ (concrete LangChain ChatModel subclasses):
  • chat_ollama_wrapper.py: Extends ChatOllama with presence_penalty and frequency_penalty support
  • sagemaker_chat.py: Full LangChain BaseChatModel for SageMaker endpoints (streaming, tool calling, reasoning)

4. Utils Layer (utils/)

Shared utilities and configuration.

  • config.py: Configuration loader
  • configurator.py: First-run interactive setup (when no config resolves) and the /config (full reconfigure) and /model (override one model section) chat commands
  • paths.py: Central path helper β€” single source of truth for the app home (~/.mnemoai, override with $MNEMOAI_HOME) and all runtime subdirectories (config, plans, tasks, per-profile, per-model)
  • config.yaml.example: Configuration template (copy to config.yaml and add your settings; .bedrock and .bedrock.mantle variants also provided)
  • bm25.py: Lightweight BM25 implementation for hybrid (semantic + keyword) search
  • logger.py: Logging utilities (stderr output)
  • formatting/: Text formatting
  • code_formatter.py: Code syntax highlighting
  • url_formatter.py: URL highlighting
  • response_parser.py: Response processing

Data Flow

  1. User Input β†’ ChatInterface β†’ LangGraphClient
  2. Client β†’ Invokes LangGraph agent with MCP tools
  3. Classifier β†’ Routes query to a category (simpleqa, code, research, knowledge, full) (_if routing enabled)
  4. Orchestrator β†’ For full tasks: decomposes into subtasks, spawns workers, aggregates results (if orchestration enabled)
  5. LangGraph β†’ Executes agent node with route-specific tools, decides to use tools
  6. MCP Server β†’ Executes tool (e.g., fs_read, web_search, RAG)
  7. Tool Result β†’ Returned to agent via tools node
  8. LangGraph β†’ Continues agent loop until response complete
  9. Response β†’ Displayed to user via ChatInterface

Session Management

Each chat session has a unique ID used for:

  • RAG document indexing (session-scoped)
  • Chunk caching for file summarization

Session data is stored in ~/.mnemoai/{profile_name}/:

~/.mnemoai/
└── {profile_name}/
    β”œβ”€β”€ conversations/           # Saved conversations
    β”œβ”€β”€ profiles/                # User profiles
    β”œβ”€β”€ todos/                   # Todo list data
    β”œβ”€β”€ rag_session_id.txt       # Current RAG session
    β”œβ”€β”€ rag_store_*.faiss        # FAISS vector index (or ChromaDB directory)
    β”œβ”€β”€ chunk_cache_*.db         # SQLite chunk cache
    └── models/                  # Per-model memory (isolated by chat model)
        └── {sanitized_model}/   # e.g. global.anthropic.claude-fable-5
            β”œβ”€β”€ episodic_memory/ # Episodic memory store (FAISS or ChromaDB)
            └── playbook/        # ACE playbook strategies and metrics

Model-scoped memory: episodic memory and the playbook live under models/{model}/ so trying a different chat model doesn't contaminate the memory/strategies learned with another. Conversations, todos, RAG, and the user profile remain shared across models.

Context Compaction

To keep long conversations within the model's context window, the assistant compacts history by summarizing it:

  • Automatic β€” after a turn pushes the conversation past MAX_CONVERSATION_TOKENS, older messages are summarized into the system prompt while the most recent LLM.KEEP_RECENT_MESSAGES turns are kept verbatim.
  • Manual β€” run /compact any time (optionally /compact <focus instructions> to steer what the summary emphasizes). Manual compaction keeps a smaller recent window (LLM.MANUAL_COMPACT_KEEP_RECENT).

The kept-verbatim window is bounded by both a message count and a token budget (LLM.KEEP_RECENT_TOKEN_BUDGET, default 25% of MAX_CONVERSATION_TOKENS). Walking newest→oldest, a message that would exceed the budget is summarized instead of kept — so a single oversized recent message (e.g. a pasted document that alone fills the context window) cannot survive compaction verbatim.

The summary preserves topics, decisions, and tool calls/results (which tools ran, their inputs, and outcomes), so the agent retains actionable context after compacting.

For the full per-file reference, see Architecture Reference.