Architecture — Detailed File Map¶

This is the full per-file reference for the codebase, split out of CLAUDE.md to keep that file lean. See CLAUDE.md for the high-level architecture, key patterns, configuration, and conventions.

Paths below are relative to the package root src/mnemoai/ (e.g. client/agent/agent.py is src/mnemoai/client/agent/agent.py).

Detailed File Map¶

Entry Point¶

File	Purpose
`main.py`	CLI entry point (`cli()`; also `python -m mnemoai`). Parses `--no-verbose`, creates `LangGraphClient(verbose=…)`, starts it, creates `ChatInterface`, runs chat loop.

`client/` — Agent, Routing, Orchestration, UI¶

File	Purpose
`client/__init__.py`	Exports `LangGraphClient`
`client/client.py`	Central orchestration class. Manages full lifecycle: MCP connection, LLM init, agent creation, query processing, episodic memory injection, playbook context injection, conversation save/load, RAG session, chunk cache, context clearing. Key methods: `start()`, `query()`, `clear_context()`, `save_conversation()`, `load_conversation()`, `reflect_and_learn()`, `_inject_episodic_context()`, `_inject_playbook_context()`, `_inject_memory_context()` (injects the curated `MEMORY.md` into the system prompt at session start). Holds `plan_mode_active` (toggled by `/plan`), passed to the agent as a `plan_mode_provider` callback; `query()` prepends a read-only banner each turn while it's on.
`client/agent/agent.py`	LangGraph StateGraph agent. Builds graph with nodes: `classifier`, `orchestrator`, `agent` (call_model), `tools` (execute_tools). Handles streaming output with `CodeFormatter`, thinking/reasoning extraction, retry on empty content, and prints a gray `[⚙ tool(args)]` marker per tool call (`_format_tool_call()`). Key class: `LangGraphAgent` with `invoke()`, `_build_graph()`, `_call_model()`, `_execute_tools()`, `_stream_response()`, `_classify()`, `_orchestrate()`, `_run_worker_loop()`, `_aggregate_results()`. External (mcp.json) tools are tracked in `self.external_tools`, appended to every non-empty route, and surfaced to the decomposer via `_external_tools_prompt_block()` (routes such subtasks to `full`). `_confirm_tool()` hard-gates destructive tools client-side (`execute_bash`, `fs_write`/`file_edit`) and also the `memory` tool when `REQUIRE_MEMORY_CONFIRMATION` is true (default false). Enforced plan mode: `_is_blocked_by_plan_mode()` (driven by a `plan_mode_provider` callback wired to `client.plan_mode_active`) hard-blocks `_PLAN_BLOCKED_TOOLS` (execute_bash, fs_write, file_edit, git_safe, git_commit_safe, start_background_task) at both chokepoints, above `_confirm_tool`.
`client/mcp_tool_wrapper.py`	MCP↔LangChain bridge. Runs background asyncio event loop in daemon thread for persistent MCP server connection. `MCPClientWrapper` manages one server's lifecycle; `MCPToolWrapper(BaseTool)` wraps individual MCP tools with Pydantic arg schemas (calls the server by `mcp_tool.name`, so namespaced display names still route correctly). `MultiMCPClient` owns the built-in wrapper plus one per external server, connects/exits them together, and merges their tools — namespacing a colliding external tool as `servername__tool` (built-in names win). Sync wrappers use `run_coroutine_threadsafe`.
`client/mcp_config.py`	External MCP server loader. `load_external_servers()` reads `~/.mnemoai/mcp/mcp.json` (Claude Code/kiro `mcpServers` schema; legacy flat `~/.mnemoai/mcp.json` fallback) into `ExternalServer(name, StdioServerParameters)` tuples via `_parse_entry()`. Tolerant: a missing/malformed file or a bad/`disabled` entry is skipped (red-logged), never crashing startup. `env` is merged over the process environment.
`client/agent/router.py`	Query classifier. `QueryRouter.classify()` sends query + `ROUTING_PROMPT` to LLM, returns one of: `simple_qa`, `code`, `research`, `knowledge`, `full`. `ROUTE_TOOLS` dict maps each route to allowed tool names.
`client/agent/orchestrator.py`	Task decomposition. `get_orchestrator_prompt()` and `get_aggregator_prompt()` load prompts from prompts.yaml (`config.prompt(...)`). `parse_subtasks(content, fallback_query, valid_categories)` robustly parses JSON subtask list from LLM output with multiple fallback strategies.
`client/agent/reasoning_utils.py`	Shared reasoning-model helpers. `disable_reasoning(model)` / `restore_reasoning(model, saved)` temporarily turn off thinking for auxiliary LLM calls (router classification, task decomposition) so output lands in `response.content` instead of the reasoning field. `extract_visible_text(content)` strips `<think>` tags / Bedrock thinking blocks. Used by `agent.py` and `router.py`.

`client/managers/` — Conversation & Profile Management¶

File Purpose

client/managers/agent_conversation_manager.py Conversation compaction (token counting + LLM summarization). Auto-compacts when over MAX_CONVERSATION_TOKENS; compact() is the manual /compact path. Summarizes older messages into the system prompt while keeping recent turns verbatim — the kept window is bounded by BOTH message count (KEEP_RECENT_MESSAGES / MANUAL_COMPACT_KEEP_RECENT) and a token budget (KEEP_RECENT_TOKEN_BUDGET, default 25% of max), so an oversized recent message is summarized rather than kept. Preserves tool calls/results in the summary, which uses a structured prompt (summarizer system framing + 9-section <analysis>-then-summary template; /compact <focus> injected as compact instructions; <analysis> stripped; continuation instruction added). The split is tool-pair-safe (_safe_tool_boundary) so the kept window never starts on an orphaned tool result (which the OpenAI Responses API rejects). Key methods: count_tokens(), generate_summary(), _build_summary_prompt(), _strip_analysis(), manage_messages(), compact(), _compact(), _split_keep_recent(), _safe_tool_boundary().

client/managers/user_profile_manager.py Learns user preferences via Exponential Moving Average (EMA). Tracks: verbosity, directness, technical level, abstraction preference, top domains, tool success per intent. Generates compact profile summary for system prompt injection. Persists as JSON at ~/.mnemoai/{profile}/. Key class: UserProfileManager with analyze_conversation(), classify_intent(), get_profile_summary().

`client/memory/` — Episodic Memory & Learning¶

File	Purpose
`client/memory/episodic_memory.py`	High-level episodic memory manager. Stores successful task patterns with tool usage. Supports query expansion (synonyms). Delegates to ChromaDB or FAISS store. Key functions: `is_task_successful()` (heuristic checks success/correction/error markers), `extract_tools_from_messages()`. Key class: `EpisodicMemoryManager` with `store()`, `retrieve()`, `clear()`.
`client/memory/memory_store.py`	Curated persistent memory store. `MemoryStore` with read/add/replace/remove over a `MEMORY.md` whose entries are separated by a Markdown `---` rule (legacy `§` files still parse + migrate); char-capped (consolidate-on-overflow). Shared by the MCP memory tool and the `/memory` command.
`client/memory/chroma_store.py`	ChromaDB-backed episodic store with hybrid search (semantic from ChromaDB + BM25 re-ranking). Key class: `ChromaEpisodicStore` with `add()`, `search()`, `cleanup()`.
`client/memory/faiss_store.py`	FAISS-backed episodic store (alternative to ChromaDB). Same hybrid search pattern. Key class: `FAISSEpisodicStore` with `add()`, `search()`, `cleanup()`, `clear()`.
`client/memory/reflector.py`	ACE Reflector. Analyzes tool execution trajectories after each interaction. Detects failures (string_not_found, file_not_found, permission_denied, syntax_error, timeout, api_error). Extracts reusable strategies as `PlaybookEntry` objects. Tracks metrics (total/successful/failed calls, failure types, daily stats). Key class: `Reflector` with `analyze_tool_execution()`, `reflect_on_trajectory()`.
`client/memory/playbook_store.py`	ACE Playbook. Append-only store for learned strategies with lazy semantic deduplication. Retrieves relevant entries by task context for system prompt injection. Key class: `PlaybookStore` with `append()`, `append_batch()`, `get_relevant_entries()`, `format_for_prompt()`, `_refine()` (triggers when over max_entries). Persists at `~/.mnemoai/{profile}/models/{model}/playbook/playbook.json` (model-scoped).

`client/ui/` — User Interface¶

File	Purpose
`client/ui/__init__.py`	Package marker
`client/ui/chat_interface.py`	Interactive CLI using prompt_toolkit. Multiline input (Ctrl+J), slash-command autocomplete, commands: `/clear`, `/load`, `/save`, `/exit`, `/quit`, `/compact [focus]` (manual context compaction), `/config` (re-run the configurator via `run_reconfigure()`; overwrites config.yaml), `/model` (override one model section — LLM/vision/embeddings — via `run_model_override()`), `/params` (tune a model's inference params via `run_params_override()`), `/mcp` (list configured MCP servers + tool counts via `_print_mcp_status()`), `/memory` (view the curated `MEMORY.md`; `/memory clear` wipes it with a y/N confirm, via `_handle_memory_command()`), `/plan` (toggle enforced read-only plan mode — flips `client.plan_mode_active`, which the agent reads to hard-block mutating/exec tools). `/config`, `/model`, `/params` call `_restart_in_place()`, which re-execs the process via `os.execv` so all settings — incl. MCP tool toggles decided at subprocess boot — take effect. Handles episodic memory storage (immediate and delayed modes), ACE reflection triggers, double Ctrl+C exit. Key class: `ChatInterface` with `run_chat_loop()`, `get_multiline_input()`.
`client/ui/spinner.py`	Threaded "Thinking..." spinner animation shown during LLM processing. Key class: `Spinner` with `start()`, `stop()`.

`server/` — MCP Server & Tools¶

File	Purpose
`server/__init__.py`	Package marker
`server/server.py`	MCP server entry point (run as subprocess). Creates `FastMCP("MCP Server")`, calls `register_tools(mcp)`, runs via stdio transport.
`server/error_handler.py`	`@tool_error_handler` decorator (shared by all tools). Catches typed exceptions (FileNotFoundError, PermissionError, etc.) and returns structured JSON with `error_type`, `message`, `next_steps`.

`server/tools/` — Tool Implementations¶

File	Purpose
`server/tools/__init__.py`	Creates global `ToolManager` singleton. Exports `register_tools`, `validate_file_path`, `count_tokens`, `vision_model`, `vision_model_controller`.
`server/tools/tools_manager.py`	Central tool management. Initializes vision model, provides token counting, file path validation. `register_tools(mcp)` conditionally registers all tool categories based on config toggles.
`server/tools/execute_bash.py`	`execute_bash(command, timeout=30)` — safe shell execution with timeout. Blocks dangerous commands (rm, mkfs, dd, shutdown, chmod 777). Returns stdout/stderr/exit_status.
`server/tools/file_edit.py`	`file_edit(file_path, old_string, new_string, replace_all=False)` — precise string replacement. Validates file exists, checks for unique matches.
`server/tools/file_search.py`	`glob_search(pattern, path, max_results, sort_by_mtime)` — file name search. `grep_search(pattern, path, file_pattern, case_insensitive, output_mode, context_lines, max_results)` — content search via ripgrep.
`server/tools/fs_read.py`	`fs_read(path, mode, start_line, end_line, pattern, context_lines, depth)` — multi-mode reader. Modes: Line, Search, Directory, CSV, JSON, JSONL, PDF, DOCX. Delegates to readers/.
`server/tools/fs_write.py`	`fs_write(path, command, ...)` — two-step write with confirmation (dry_run=True for preview, then dry_run=False + confirmed=True). Commands: create, str_replace, insert, append.
`server/tools/git_safety.py`	`git_safe(command, allow_dangerous, reason)`, `git_status_safe()`, `git_commit_safe(message, add_all, add_files, amend, allow_empty)` — git with safety checks. Blocks force push to main/master, warns about hard resets.
`server/tools/describe_image.py`	`describe_image(image_path, question)` — sends base64 image to vision model for description.
`server/tools/plan_mode.py`	Multi-step planning workflow. Tools: `enter_plan_mode()`, `add_plan_step()`, `add_plan_file()`, `add_plan_risk()`, `present_plan()`, `approve_plan()`, `exit_plan_mode()`, `get_plan_status()`. Persists at `~/.mnemoai/plans/current_plan.json`.
`server/tools/background_tasks.py`	Background task execution in threads. Tools: `start_background_task()`, `get_task_status()`, `get_task_output()`, `list_background_tasks()`, `cancel_background_task()`, `wait_for_task()`, `clear_completed_tasks()`. Output at `~/.mnemoai/tasks/`.
`server/tools/todo_manager.py`	Task tracking. Enforces one in_progress task at a time. Tools: `todo_write(todos)`, `todo_read()`, `todo_clear()`. Persists at `~/.mnemoai/{profile}/todos/current_todos.json`.
`server/tools/web_crawler.py`	`web_crawler(url)` — extracts page content as markdown via crawl4ai. Optionally ingests large pages into RAG store.
`server/tools/web_search.py`	`web_search(query, search_lang, num_results)` — internet search via Brave Search API. Returns structured results.
`server/tools/rag_tool.py`	MCP-exposed RAG tools: `list_documents()`, `search_in_documents(query, top_k)`, `clear_documents()`.
`server/tools/memory_tool.py`	`register_memory_tools(mcp)` exposing the `memory(action, text, old_text)` tool; delegates to `MemoryStore`. Gated by `ENABLE_MEMORY`.

`server/tools/rag/` — RAG Engine¶

File	Purpose
`server/tools/rag/__init__.py`	Exports `get_rag_session`, `reset_session_rag`, `SessionRAG`, `FaissStore`, `create_store`, `register_rag_tools`.
`server/tools/rag/session.py`	Core RAG engine. Session-scoped vector store with hybrid search (semantic + BM25). `SessionRAG` class with `ingest(doc_id, content, chunk_size_tokens)` and `query(query_text, top_k)`. Cross-process session sharing via file-based session_id. Functions: `get_rag_session()`, `set_rag_session()`, `reset_session_rag()`.
`server/tools/rag/vector_store_controller.py`	Abstraction over FAISS/ChromaDB backends. Factory pattern via `VectorStoreController` with `add()`, `search()`, `clear()`, `detect_existing_store()` (static).
`server/tools/rag/faiss_store.py`	FAISS IndexFlatIP (cosine similarity on L2-normalized vectors). Thread-safe with `threading.Lock()`. File persistence (faiss index + pickle metadata). Key class: `FaissStore`.
`server/tools/rag/chroma_store.py`	ChromaDB-backed store alternative. Persistent client with automatic collection management. Key class: `ChromaStore`.

`server/tools/readers/` — File Format Readers¶

File	Purpose
`server/tools/readers/__init__.py`	Exports all readers
`server/tools/readers/chunking_helper.py`	Universal chunking + LLM summarization for large files. Recursive splitting with 10% overlap. SQLite chunk cache. Concurrent summarization with asyncio semaphore. Key functions: `process_large_content()`, `reset_session_chunk_cache()`.
`server/tools/readers/line_reader.py`	`read_lines(path, start_line, end_line)` — line-based file reading with token limit.
`server/tools/readers/directory_reader.py`	`read_directory(path, depth)` — recursive directory listing.
`server/tools/readers/csv_reader.py`	`read_csv(path)` — CSV with auto-delimiter detection, encoding fallbacks, token truncation.
`server/tools/readers/json_reader.py`	`read_json(path, start_line, end_line)` — JSON/JSONL reading with line ranges and token limits.
`server/tools/readers/pdf_reader.py`	`read_pdf(file_path)` — PyPDF2 reader. Large PDFs auto-ingest into RAG if enabled, else chunk+summarize.
`server/tools/readers/docx_reader.py`	`read_docx(file_path)` — python-docx reader. Same RAG/chunking fallback as PDF.
`server/tools/readers/search_reader.py`	`search_file(path, pattern, context_lines)` — regex search within a single file with context.

`models/` — LLM Provider Abstraction¶

File	Purpose
`models/__init__.py`	Empty package marker
`models/controllers/base_model_controller.py`	Minimal shared base type for the controllers. Per-provider inference-param handling lives in `models/provider_params.py` (consumed via `build_kwargs`), not here.
`models/provider_params.py`	Single source of truth for which config keys each provider consumes, per section (`LLM_SUPPORTED_PARAMS` / `VISION_SUPPORTED_PARAMS` / `EMBED_SUPPORTED_PARAMS`), mirroring the controller init methods. `supported_params(section)` returns the per-provider key registry. The configurator uses it to prune unsupported keys on a `/model` provider switch; keep in sync when a controller starts/stops reading a key. Also exposes `extra_params(model_id)` — the generic `EXTRA_PARAMS` passthrough every provider accepts (a raw dict merged verbatim into the model's request body / `model_kwargs`); it's always in `supported_keys` (never pruned) but excluded from `tunable_params` (not a `/params` scalar).
`models/controllers/llm_controller.py`	Primary LLM controller. `LangChainLLMController(BaseModelController)` reads `MODEL_ID` config, initializes the correct LangChain chat model. Methods: `initialize_model()`, `get_model()`, `get_model_type()`. Supports: `bedrock` (ChatBedrockConverse), `mantle` (Bedrock Mantle, via `mantle_factory`), `ollama` (ChatOllamaWrapper), `openai` (ChatOpenAI), `anthropic` (direct Anthropic API via ChatAnthropic — distinct from Mantle's `anthropic` protocol), `sagemaker` (ChatSageMaker), `litellm` (ChatLiteLLM). Handles extended thinking (Bedrock & direct Anthropic Claude), Ollama reasoning, OpenAI reasoning_effort. Note: `temperature` only sent when explicitly configured (newer Claude models reject it); Anthropic `STOP` maps to `stop_sequences` and requires `max_tokens` (defaults to 4096). Optional `ENDPOINT_URL` overrides the Bedrock endpoint (Anthropic: custom base URL).
`models/mantle_factory.py`	Bedrock Mantle model factory. `build_mantle_model(model_id, ...)` returns the right LangChain model for `API_PROTOCOL`: `chat_completions` (ChatOpenAI, `/v1`), `responses` (ChatOpenAI `use_responses_api=True`, `/openai/v1`), `anthropic` (ChatAnthropic, `/anthropic`). Auth: uses a Bedrock API key when present (`MODEL_ID.API_KEY` or the `BEDROCK_API_KEY` env var), else mints a short-lived bearer token via `aws_bedrock_token_generator.provide_token()`. Used by both the LLM and vision controllers.
`models/controllers/embeddings_controller.py`	Multi-provider embeddings with LRU caching. Supports Ollama, Bedrock, OpenAI, SageMaker, LiteLLM. Falls back to SHA256-based deterministic embeddings on failure. Key class: `EmbeddingsController` with `embed(texts)` → numpy array.
`models/controllers/vision_model_controller.py`	Vision model controller. `VisionModelController(BaseModelController)` with `describe_image()`, `format_request()` (multimodal HumanMessage with base64), and `_content_to_text()` (normalizes string/list-of-blocks responses). Supports Bedrock, Mantle (all 3 protocols), Ollama, OpenAI, Anthropic (direct Claude API via `ChatAnthropic`), SageMaker (reuses `ChatSageMaker`, `openai_chat` format), LiteLLM (reuses `ChatLiteLLM`). All paths consume the same OpenAI `image_url` content from `format_request()`.
`models/chat_models/__init__.py`	Empty
`models/chat_models/chat_ollama_wrapper.py`	`ChatOllamaWrapper(ChatOllama)` — extends ChatOllama to add `presence_penalty` and `frequency_penalty` support in the options dict.
`models/chat_models/sagemaker_chat.py`	`ChatSageMaker(BaseChatModel)` — full LangChain BaseChatModel for SageMaker endpoints. Supports OpenAI chat format and HuggingFace text_generation format. Implements `_generate()` and `_stream()` (SSE parsing). Handles reasoning/thinking tags. `bind_tools()` support.

`utils/` — Shared Utilities¶

File	Purpose
`utils/__init__.py`	Package marker
`utils/paths.py`	Central path helper — single source of truth for all runtime locations. `app_home()` (defaults to `~/.mnemoai`, honors `$MNEMOAI_HOME`), `config_dir()` (→ `config/`) + `config_path()` (→ `config/config.yaml`) + `legacy_config_path()` (flat fallback), `mcp_dir()` (→ `mcp/`) + `mcp_config_path()` (→ `mcp/mcp.json`) + `legacy_mcp_config_path()` (flat fallback), `seed_example_files()` (idempotently copies bundled `config.yaml*.example`/`mcp.json.example` into `config/`+`mcp/`, never overwriting), `plans_dir()`, `tasks_dir()`, `profile_dir(profile=None)`, `model_dir(model_name, profile=None)`, `memory_file_path(profile=None)` (→ `{profile}/MEMORY.md`), `sanitize_model_name(name)`. Every call site that touches the home dir routes through here. Lazy-imports `utils.config` to avoid a cycle.
`utils/config.py`	Singleton config manager. Loads config via `_resolve_config_path()` (`$MNEMOAI_CONFIG` → `<app_home>/config/config.yaml` → legacy flat `<app_home>/config.yaml` → package `utils/config.yaml` fallback; prints a copy-a-template hint if none found). Loads prompts separately via `_resolve_prompts_path()` (`$MNEMOAI_PROMPTS` → `<app_home>/config/prompts.yaml` → package `utils/prompts.yaml`). Exposes `.get("SECTION.KEY", default)` for config, `.prompt("KEY", default)` for prompts (SYSTEM/ROUTING/ORCHESTRATOR/AGGREGATOR/SUMMARY_SYSTEM/SUMMARY_TASK), `.system_prompt` property (reads SYSTEM_PROMPT from prompts.yaml), and `reload()` (re-reads both files into the existing singleton). Prompt keys still in config.yaml are ignored with a one-time migration warning. Sets env vars from `ENV` section.
`utils/configurator.py`	First-run interactive setup. When no config resolves, `cli()` (in `main.py`) runs `run_first_run_setup()` on a TTY: picks a provider (Ollama/Bedrock/Mantle have dedicated templates; OpenAI/SageMaker/LiteLLM reuse the base template, transformed — TYPE set + unsupported keys pruned + provider connection keys prompted) and prompts for chat model + connection + optional max output tokens (`MAX_TOKENS`; `none`/blank drops it) + mandatory max context window (`MAX_CONVERSATION_TOKENS`, default 65536), vision model (mirrors chat host/region, own optional max output tokens), profile, Brave key, and each feature toggle. Patches them in via line-targeted edits (`_set_in_section`/`_set_top_level`/`_set_bool`) — reading current defaults with `_get_in_section`/`_get_top_level` — so the rich prompt blocks/comments survive. Writes `<app_home>/config/config.yaml`, then the caller calls `config.reload()`. `config_exists()` gates the trigger. `run_params_override()` (the `/params` command) tunes a configured model's inference parameters (temperature, top_p, penalties, reasoning, stop, stream, …) — only the keys the model's provider accepts, per `provider_params.tunable_params()`. `run_model_override()` (the `/model` command) edits just one model section in place — chat/vision/embeddings (embeddings offered only when configured) — using depth-agnostic helpers (`_get_field`/`_set_field`/`_remove_field`) that reach the nested `RAG.EMBED_MODEL_ID`. Both `/config` and `/model` prompt connection/auth via the SAME `_prompt_provider_connection()` helper (section-aware via the `provider_params` registry: HOST/PORT for ollama, REGION for AWS, Mantle protocol, SageMaker INPUT_FORMAT for chat/vision only, LiteLLM API_BASE/API_KEY; OpenAI is env-based), so the two flows always ask the same mandatory params. Switching a section's provider prunes every key the new provider doesn't consume — connection, auth, and inference alike (e.g. `REGION`/`API_PROTOCOL` after mantle→ollama, `HOST`/`PORT`/`TOP_K`/penalties after ollama→openai) — using the supported-key registry in `models/provider_params.py` (the single source of truth, derived from the controller init methods). Additionally, on ANY model change `/model` calls `_clear_inference_params()` to drop model-specific generation params (temperature, top_p, penalties, reasoning, stop, stream — everything in `tunable_params` except the separately-prompted `MAX_TOKENS`), so a value tuned for one model isn't carried into another that may reject it (e.g. newer Claude/GPT reject `temperature`); the new model's defaults apply until the user re-tunes via `/params`. `STOP` is kept in the example template (documentation) but never written into a generated config.
`utils/logger.py`	Logger setup. Configurable via `LOG_LEVEL` env var (default WARNING). Suppresses noisy Brave Search logs. Exports `logger` singleton.
`utils/console.py`	User-facing console output, distinct from logger diagnostics: `print_error(msg)` (red, ✗-prefixed) and `print_success(msg)` (green).
`utils/bm25.py`	Lightweight BM25 (Okapi BM25) implementation. `BM25` class with `fit(corpus)` and `score(query)`. `tokenize()` function (regex word tokenizer). Used by episodic memory and RAG hybrid search.
`utils/formatting/__init__.py`	Exports `make_urls_clickable` and everything from `response_parser`.
`utils/formatting/code_formatter.py`	Real-time streaming syntax highlighting. Handles triple-backtick code blocks (Pygments language detection) and inline code. `CodeFormatter` with `process_chunk()` and `flush()`.
`utils/formatting/response_parser.py`	Extracts structured content from AI responses. Functions: `extract_answer()` (from `<answer>` tags), `extract_thinking()` (from `<think>`/`<thinking>` tags), `format_response()`.
`utils/formatting/url_formatter.py`	Makes URLs clickable in terminal (ANSI escapes for iTerm/VSCode, fallback to color). Handles plain URLs and markdown links. Functions: `make_urls_clickable()`, `highlight_urls()`, `format_url()`.

Architecture — Detailed File Map¶

Detailed File Map¶

Entry Point¶

client/ — Agent, Routing, Orchestration, UI¶

client/managers/ — Conversation & Profile Management¶

client/memory/ — Episodic Memory & Learning¶

client/ui/ — User Interface¶

server/ — MCP Server & Tools¶

server/tools/ — Tool Implementations¶

server/tools/rag/ — RAG Engine¶

server/tools/readers/ — File Format Readers¶

models/ — LLM Provider Abstraction¶

utils/ — Shared Utilities¶