AI Systems Engineer

Nikhil Kumar

I engineer intelligent systems. From local semantic search engines to full-stack streaming chat architectures, I build the infrastructure, APIs, and interfaces that take AI models from research to production.

200+ Chrome Web Store users
20+ public repos
Currently building

A semantic code search MCP server with local embeddings, hybrid retrieval, and WAL-backed persistence.
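Hybrid retrieval here means blending dense embedding similarity with lexical matching so that neither paraphrase nor exact-identifier queries fall through. A minimal, illustrative sketch of that blending (toy vectors and whitespace tokenization stand in for the real local embedding model; `alpha` and the function names are assumptions, not the server's actual API):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def lexical_score(query, doc):
    """Fraction of query terms that appear verbatim in the document."""
    terms = set(query.lower().split())
    words = set(doc.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def hybrid_search(query, query_vec, corpus, alpha=0.6):
    """Blend dense and lexical scores; higher alpha favors embeddings."""
    scored = []
    for text, vec in corpus:
        score = (alpha * cosine(query_vec, vec)
                 + (1 - alpha) * lexical_score(query, text))
        scored.append((score, text))
    return [text for _, text in sorted(scored, reverse=True)]
```

The weighted-sum fusion is one common choice; reciprocal-rank fusion is another.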

Technical Arsenal

Engineering Stack

End-to-end AI systems

I've taken models from research notebooks to production inference serving real users. That means the full path: PyTorch and Transformers for the model layer, FastAPI for serving, Cloudflare Workers for edge deployment, and the provider orchestration to make multi-LLM switching invisible to users.

PyTorch TensorFlow Hugging Face Transformers OpenCV Scikit-Learn YOLOv8

Full-stack product engineering

When a problem needs a product rather than a pipeline, I build that too. TypeScript, Next.js, React, databases, auth, streaming — the frontend isn't a handoff concern; it's part of the system I own. I've shipped browser extensions, real-time dashboards, and multi-provider chat interfaces.

TypeScript React Next.js Node.js Astro HTML/CSS

Backend architecture & APIs

FastAPI for async Python services, Node.js for JavaScript runtimes, Cloudflare Workers for edge compute. JWT auth, SSE streaming, rate limiting, provider-agnostic API layers — the boring infrastructure that makes products feel reliable.

FastAPI Python Cloudflare Workers PostgreSQL Redis Docker
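SSE streaming, mentioned above, comes down to a simple wire format: text frames with optional `event:` and `retry:` fields, a `data:` line, and a blank-line terminator. A framework-agnostic sketch (the generator names are illustrative; in a FastAPI service this generator would feed a streaming response):

```python
import json

def sse_frame(data, event=None, retry_ms=None):
    """Format one Server-Sent Events frame: optional event name and
    reconnect hint, then a data line, terminated by a blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"

def stream_tokens(tokens):
    """Yield each model token as an SSE frame, then a terminal event
    so the client knows the stream ended cleanly."""
    for tok in tokens:
        yield sse_frame({"token": tok})
    yield sse_frame({"done": True}, event="end")
```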

AI integration & applied ML

Multi-provider LLM orchestration across OpenAI, Anthropic, Gemini, and xAI. Prompt engineering, sentiment analysis pipelines, semantic code search with local embeddings, and the MCP protocol infrastructure connecting AI to developer tools.

LLM Integration Prompt Engineering Vercel AI SDK Sentiment Analysis MCP Protocol Semantic Search
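The core of provider-agnostic orchestration is a uniform completion interface plus ordered failover. A simplified sketch of that shape (class and parameter names are hypothetical, and real clients would be async and streaming):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # prompt -> completion text

class LLMRouter:
    """Route a request to the preferred provider, falling back through
    the remaining providers in registration order when one raises."""

    def __init__(self, providers):
        self.providers = {p.name: p for p in providers}
        self.order = [p.name for p in providers]

    def complete(self, prompt, prefer=None):
        names = ([prefer] if prefer in self.providers else []) + \
                [n for n in self.order if n != prefer]
        last_err = None
        for name in names:
            try:
                return name, self.providers[name].complete(prompt)
            except Exception as err:
                last_err = err
        raise RuntimeError("all providers failed") from last_err
```

Returning the provider name alongside the text lets the UI show which backend actually answered after a silent failover.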
✓ shipped to production ◆ deep experience ○ fluent / working
Track record

Experience

B.E. in Computer Science

Independent AI Systems Engineer 2024–Present

Built and shipped end-to-end AI products across browser extensions, geospatial dashboards, multi-provider chat systems, and developer tooling — each solving a concrete user problem with real deployment infrastructure.

Full ownership across model integration, backend architecture, streaming protocols, and deployment to Cloudflare, Vercel, and Chrome Web Store.

About

The person behind the work

I started building because I got frustrated by tools that almost worked. A Coursera quiz that required four hours of video to answer. A forest monitoring dashboard that required five separate browser tabs. A chatbot that couldn't tell you whether the user was happy or confused. Each frustration became a shipped product.

The thread connecting everything is the distance between a working model and a working product. Taking an LLM integration from "it generates text" to "it streams tokens through SSE, persists conversation state, switches providers without downtime, and scores sentiment in parallel" — that's where I spend my time.

I don't hand off. I build the model integration, the API layer, the streaming protocol, the React frontend, and the deployment pipeline. Not because I want to do everything, but because the seams between layers are where most products break. A backend engineer who doesn't understand streaming UX builds a system that feels laggy. A frontend developer who doesn't understand provider abstraction builds an interface that breaks when you switch from Gemini to OpenAI. I close those gaps.

The model is never the hardest part.

Getting GPT-5.5 to generate good text takes an afternoon. Getting that text to stream reliably through SSE, switch providers without dropping a session, and degrade gracefully when the API returns a 429 — that takes weeks. The engineering that matters lives in the infrastructure around the model.
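Graceful 429 handling is a good example of that surrounding infrastructure. A minimal sketch of exponential backoff on rate limits (the exception and helper names are illustrative, not any provider's SDK; `sleep` is injectable so the policy is testable):

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 response."""

def with_backoff(call, retries=3, base_delay=0.5, sleep=time.sleep):
    """Retry a provider call on rate limits, doubling the delay after
    each attempt; re-raise once retries are exhausted."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

In production this sits underneath provider failover: back off first, switch providers only when retries run out.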

Honest systems label their uncertainty.

My forest monitoring dashboard has a "no-mock mode" — a toggle that refuses to display any data that isn't verified live. When an API is down, the user sees an honest error, not a plausible lie. This principle extends to everything I build. I don't claim numbers I can't source.
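The no-mock principle reduces to a small contract: surface the failure instead of substituting a placeholder. A hypothetical sketch of that contract (the names are mine, not the dashboard's actual code):

```python
class DataUnavailable(Exception):
    """Raised instead of serving placeholder data in no-mock mode."""

def fetch_metric(fetch_live, fallback_mock, no_mock=True):
    """Return verified live data; in no-mock mode, raise honestly when
    the live source fails rather than showing a plausible stand-in."""
    try:
        return {"value": fetch_live(), "verified": True}
    except Exception as err:
        if no_mock:
            raise DataUnavailable("live source unavailable") from err
        return {"value": fallback_mock, "verified": False}
```

The `verified` flag lets the UI label any non-live value explicitly when the toggle is off.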

Right now I'm deep in the infrastructure connecting AI to developer tools — semantic code search with local embeddings, MCP protocol servers, and retrieval systems that work without sending your code to a cloud API. I'm looking for teams that ship AI products to real users, where the work matters past the demo stage. I hold a B.E. in Computer Science.

Let's talk

Get in touch

I'm looking for AI engineering roles where I own the full stack from model integration to deployment. If you're building something where the engineering problems are genuinely hard — I want to hear about it.