Keyword search finds what you named. Semantic search finds what you meant. This MCP server indexes an entire codebase with local embeddings and lets AI assistants query it by concept — "where does authentication happen" returns relevant code even if no file contains the word "auth."

The Problem Space

AI coding assistants are limited by the context they can see. Keyword search misses conceptually related code when naming conventions differ. Sending entire codebases to cloud APIs raises privacy concerns and hits token limits. Developers need a way to search their code by meaning, locally, without external API calls.

Engineering the Solution

I built the retrieval system in layers: file discovery with gitignore-aware filtering, Tree-sitter AST chunking with regex fallback, local Hugging Face embeddings (no cloud API), a WAL-backed MessagePack cache for crash recovery, and hybrid search combining BM25 lexical scoring with dense vector similarity. The server exposes these layers as MCP tools that any compatible AI assistant can call over stdio.
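
To make the chunking layer concrete, here is a minimal sketch of AST-based chunking with a regex fallback. It assumes the tree-sitter and tree-sitter-javascript npm packages; the Chunk shape, the node-type list, and the fallback heuristic are illustrative, not the project's actual code.

```typescript
import Parser from "tree-sitter";
import JavaScript from "tree-sitter-javascript";

interface Chunk {
  text: string;
  startLine: number; // 1-based, inclusive
  endLine: number;
}

// Top-level node types that become one chunk each (illustrative list).
const CHUNK_NODE_TYPES = new Set([
  "function_declaration",
  "class_declaration",
  "lexical_declaration", // catches `const f = () => { ... }`
]);

export function chunkSource(source: string): Chunk[] {
  try {
    const parser = new Parser();
    parser.setLanguage(JavaScript);
    const tree = parser.parse(source);
    const chunks: Chunk[] = [];
    for (const node of tree.rootNode.children) {
      if (CHUNK_NODE_TYPES.has(node.type)) {
        chunks.push({
          text: source.slice(node.startIndex, node.endIndex),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
        });
      }
    }
    if (chunks.length > 0) return chunks;
  } catch {
    // Grammar failed to load or the parse threw: fall back to regex.
  }
  return regexFallback(source);
}

// Crude fallback: start a new chunk at each line that looks like a
// function or class header.
function regexFallback(source: string): Chunk[] {
  const lines = source.split("\n");
  const header = /^\s*(export\s+)?(async\s+)?(function|class)\b/;
  const chunks: Chunk[] = [];
  let start = 0;
  for (let i = 1; i <= lines.length; i++) {
    if (i === lines.length || header.test(lines[i])) {
      chunks.push({
        text: lines.slice(start, i).join("\n"),
        startLine: start + 1,
        endLine: i,
      });
      start = i;
    }
  }
  return chunks;
}
```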
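
The embedding layer runs entirely on-device. A minimal sketch, assuming the @xenova/transformers package and the all-MiniLM-L6-v2 sentence-embedding model; the actual model choice and batching strategy may differ from the project's:

```typescript
import { pipeline } from "@xenova/transformers";

// Start loading the model once, at module load; every call awaits the
// same promise, so the weights are read from disk a single time.
const extractorPromise = pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2"
);

export async function embed(chunks: string[]): Promise<number[][]> {
  const extractor = await extractorPromise;
  const vectors: number[][] = [];
  for (const chunk of chunks) {
    // Mean-pool token embeddings and L2-normalize, so cosine similarity
    // at query time reduces to a plain dot product.
    const out = await extractor(chunk, { pooling: "mean", normalize: true });
    vectors.push(Array.from(out.data as Float32Array));
  }
  return vectors;
}
```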

The result is a Node.js MCP server with five registered tools: smart_search (hybrid semantic + lexical), symbol_search (regex + symbol index), reindex_codebase, clear_index_cache, and search_stats. The indexing pipeline uses staged processing: mtime-based change detection, AST-aware chunking, batched embedding with CPU/GPU paths, and incremental file watching through @parcel/watcher. The vector store enforces memory caps with LRU eviction, and the WAL manager provides crash recovery with CRC32 integrity checks.
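
To show the protocol side, here is a sketch of registering one of the five tools over stdio with the official @modelcontextprotocol/sdk. The searchIndex() helper is a hypothetical stand-in for the retrieval pipeline; the rest follows the SDK's documented API.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical stand-in for the hybrid retrieval pipeline.
async function searchIndex(query: string, limit: number) {
  return [{ file: "src/example.ts", score: 0.92, query, limit }];
}

const server = new McpServer({ name: "codebase-search", version: "1.0.0" });

// The zod schema is both runtime validation and the JSON schema the
// server advertises to connected assistants.
server.tool(
  "smart_search",
  "Hybrid semantic + lexical search over the indexed codebase",
  { query: z.string(), limit: z.number().int().positive().default(10) },
  async ({ query, limit }) => {
    const results = await searchIndex(query, limit);
    return {
      content: [{ type: "text", text: JSON.stringify(results, null, 2) }],
    };
  }
);

await server.connect(new StdioServerTransport());
```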
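
The WAL earns its keep through record framing: each record carries a length and a checksum, so replay after a crash stops cleanly at the first torn or corrupt record and recovers everything before it. A minimal sketch, assuming the msgpackr package and Node's zlib.crc32 (Node 22+); the layout is illustrative, not the project's actual on-disk format:

```typescript
import { crc32 } from "node:zlib";
import { appendFileSync, readFileSync } from "node:fs";
import { pack, unpack } from "msgpackr";

// Record layout: [payload length u32 LE][crc32 u32 LE][msgpack payload].
export function appendRecord(walPath: string, entry: unknown): void {
  const payload = pack(entry);
  const header = Buffer.alloc(8);
  header.writeUInt32LE(payload.length, 0);
  header.writeUInt32LE(crc32(payload), 4);
  appendFileSync(walPath, Buffer.concat([header, payload]));
}

// Replay stops at the first record that is truncated or fails its CRC;
// everything written before the crash point is still recovered.
export function replay(walPath: string): unknown[] {
  const buf = readFileSync(walPath);
  const entries: unknown[] = [];
  let offset = 0;
  while (offset + 8 <= buf.length) {
    const length = buf.readUInt32LE(offset);
    const crc = buf.readUInt32LE(offset + 4);
    const end = offset + 8 + length;
    if (end > buf.length) break; // torn write at the tail
    const payload = buf.subarray(offset + 8, end);
    if (crc32(payload) !== crc) break; // corrupt record
    entries.push(unpack(payload));
    offset = end;
  }
  return entries;
}
```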

Impact & Outcomes

The project delivers working local semantic search infrastructure, with integration tests covering search flow, indexing workflow, MCP tool registration, and embedding pipeline behavior. The system indexes codebases incrementally, persists vectors across restarts, and serves hybrid search results through the MCP protocol without any external API calls.

Reflections & Takeaways

Key observations from building this system:

  • Chunk boundaries matter more than embedding model choice for search quality. Function-level AST chunks consistently outperform fixed-window chunking.
  • WAL-based persistence is worth the complexity. Embedding an entire codebase takes minutes — losing that work on a process crash is unacceptable.
  • Hybrid search (BM25 + vector) with reciprocal rank fusion catches both exact identifier matches and conceptual similarity. Neither approach alone is sufficient; a fusion sketch follows this list.
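
A minimal sketch of that fusion step; k = 60 is the conventional constant from the original RRF formulation, not a value tuned for this project:

```typescript
// Each ranked list contributes 1 / (k + rank) per document; the scores
// are summed across lists, so items that rank well in either BM25 or
// vector search surface in the fused ordering.
function reciprocalRankFusion(
  rankings: string[][], // e.g. [bm25Ids, vectorIds], best match first
  k = 60
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Usage: fuse the two result lists, then sort by fused score descending.
const fused = reciprocalRankFusion([
  ["auth.ts", "login.ts", "session.ts"], // BM25 order
  ["session.ts", "auth.ts", "token.ts"], // vector order
]);
const ranked = [...fused.entries()].sort((a, b) => b[1] - a[1]);
console.log(ranked.map(([id]) => id)); // ["auth.ts", "session.ts", ...]
```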

In hindsight: I would add a dedicated MCP tool for indexing status so assistants can know whether the index is ready before attempting searches.

Next iteration: adding cross-encoder reranking for the top results and exploring graph-based retrieval that follows import chains.