Pluggable Site RAG Agent
One-line embeddable chat widget backed by a self-hostable RAG service grounded in your own docs.
Why it matters
Lets any business drop a grounded AI assistant onto their site without integration work - the same pattern that powers customer support, internal helpdesks, and product onboarding.
What it does
A self-hostable AI support agent that drops onto any website with a single <script> tag. The visitor gets a floating chat widget; the site owner gets answers grounded in their own documents - with citations, conversation memory, and per-session rate limiting built in.
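As an illustration of what per-session rate limiting means here, the following is a minimal standard-library sketch of a sliding-window limiter. It is not the project's code: the class name `SessionRateLimiter`, its parameters, and the injectable `clock` are all hypothetical, and the limit and window values are made up for the example.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SessionRateLimiter:
    """Hypothetical sliding-window limiter keyed by session ID.

    Allows at most `limit` requests per `window` seconds for each session.
    """

    def __init__(
        self,
        limit: int,
        window: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, session_id: str) -> bool:
        """Return True and record the request if the session is under its limit."""
        now = self.clock()
        hits = self._hits[session_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SessionRateLimiter(limit=2, window=60.0)  # assumed values
    print([limiter.allow("visitor-1") for _ in range(3)])  # [True, True, False]
```

Each session has its own queue of timestamps, so one busy visitor cannot use up another visitor's allowance.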
Where it applies
- Any business that wants a knowledge-grounded chat assistant on their site without committing to a SaaS vendor or rebuilding their frontend.
- Internal helpdesks and product onboarding flows - swap customer documents for internal runbooks and the same pipeline works.
- Static or framework-built pages where dropping in a <script> tag is the only integration the host can stomach.
How it works (high level)
A FastAPI backend exposes a single chat endpoint plus a self-installing widget script. On each turn, the backend rewrites the user's question into a standalone form using prior turns, runs MMR (maximal marginal relevance) retrieval with k=5 over a Qdrant vector store, and answers via a stuff-docs chain over Gemini, which places the retrieved chunks into a single prompt. Sessions, token-bounded conversation memory, and slowapi rate limits are layered in via FastAPI middleware. Every chat turn is recorded in LangSmith for offline review.
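To make the MMR retrieval step concrete, here is a minimal pure-Python sketch of maximal marginal relevance selection over embedding vectors. It is an illustration, not the project's code: the function names and toy vectors are hypothetical, and in the real pipeline the vector store integration would do this work. The `lambda_mult` name mirrors LangChain's parameter, and the 0.5 default is an assumption about the project's setting.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 5,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance.

    Each step picks the candidate maximizing
        lambda_mult * sim(query, c) - (1 - lambda_mult) * max sim(c, already selected)
    so lambda_mult=1.0 is pure relevance and lower values favor diversity.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def mmr_score(i: int) -> float:
        # Redundancy is the highest similarity to anything already picked.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy 2-D "embeddings": chunks 0 and 1 are near-duplicates, chunk 2 differs.
    query = [1.0, 0.0]
    chunks = [[0.9848, 0.1736], [0.9781, 0.2079], [0.7660, -0.6428]]
    print(mmr_select(query, chunks, k=2))  # [0, 2]
```

In the toy example, chunk 1 is nearly as relevant as chunk 0 but almost identical to it, so the second pick goes to chunk 2. Applied to documentation chunks, the same trade-off keeps the prompt from filling up with several copies of one paragraph.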
Outcome
A site owner runs the backend once, drops the widget script into any page, and has a working AI assistant within minutes. The widget mounts a floating iframe with a chat UI; nothing about the host page's frontend has to change.
Stack
Python · FastAPI · LangChain · Qdrant · Gemini embeddings/chat · LangSmith · vanilla-JS widget.