AI Agent · Enterprise Knowledge
Enterprise AI Chat
A private, citation-backed RAG assistant that answers only from each tenant's own documents.
Overview
Enterprise AI Chat is a multi-tenant RAG (retrieval-augmented generation) document assistant. Each workspace uploads its own documents (PDF, Word, Excel, CSV) and asks natural-language questions. Answers come strictly from that workspace's content and include source citations. One shared chat model serves every tenant, while answers stay scoped to each tenant's private knowledge base. No model training or fine-tuning is required.
The challenge
Organizations sit on large piles of internal documents that are hard to search and easy to misquote. Generic chatbots are prone to hallucination and have no access to private knowledge, and many organizations cannot send proprietary documents to third-party clouds at all. The product needed to give teams a private, citation-backed assistant that answers only from their own files, with strict per-tenant data isolation.
What we built
- A RAG pipeline orchestrated with LlamaIndex: documents are chunked with a sentence splitter (512-token chunks, 64-token overlap), embedded, and stored as vectors; queries retrieve the top-k most similar chunks and feed them to the LLM.
- Strict multi-tenant isolation in Chroma. Each tenant gets its own collection, and queries run only against the requesting tenant's collection, so data cannot leak across workspaces.
- Local embeddings via fastembed running BAAI/bge-small-en-v1.5 (no API key, no cloud calls) and an Ollama-served local LLM (default llama3.1) for fully on-prem inference. Both can be swapped for hosted providers (OpenRouter) through a provider factory, with no code changes.
- A carefully engineered grounding prompt that restricts answers to the retrieved context, requires a refusal when the answer is not in the documents, and requires source filenames to be cited. Responses are formatted as clean Markdown and streamed to the client.
- A storage split: MongoDB holds tenant metadata, document records, and chat history; raw files are stored on disk, separated per tenant; and Chroma holds the vectors. A FastAPI backend serves everything to a Next.js 16 / React 19 frontend.
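The chunking and top-k retrieval described in the first bullet can be illustrated with a minimal, dependency-free sketch. It is not LlamaIndex's sentence splitter or the bge-small model. Tokens are approximated by whitespace-separated words, and `embed` is a toy term-count vector standing in for a real embedding. The function names (`split_sentences`, `chunk_text`, `embed`, `top_k`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive boundary rule: split after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Pack whole sentences into chunks of up to chunk_size tokens, seeding each
    new chunk with the last `overlap` tokens of the previous one.
    Tokens are approximated by whitespace-separated words. Sentences are never
    split, so a chunk can exceed chunk_size when one sentence (plus the
    carried-over overlap) is too long."""
    chunks: list[str] = []
    current: list[str] = []  # tokens of the chunk being built
    for sentence in split_sentences(text):
        tokens = sentence.split()
        if current and len(current) + len(tokens) > chunk_size:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.extend(tokens)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase term counts.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Rank every chunk by similarity to the query and keep the k best.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


doc = ("Refunds are issued within 30 days. Shipping takes five business days. "
       "Refunds require a receipt.")
chunks = chunk_text(doc, chunk_size=8, overlap=2)
print(len(chunks))  # 3
print(top_k("Do refunds need a receipt?", chunks, k=1))
# ['business days. Refunds require a receipt.']
```

Carrying the tail of each chunk into the next one reduces the chance that a fact straddling a chunk boundary is cut in half and missed by retrieval. Larger overlaps cost more storage and embedding calls.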
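The collection-per-tenant isolation in the second bullet can be sketched with an in-memory stand-in for Chroma. `TenantVectorStore`, `collection_name`, and the tenant-ID format are hypothetical. In Chroma itself, the equivalent step would be resolving the tenant's own collection (for example with `get_or_create_collection`) before adding or querying.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tenant-ID format: lowercase alphanumerics, '_' and '-', 2-41 chars.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9_-]{1,40}")


def collection_name(tenant_id: str) -> str:
    """Map a tenant to its own collection; reject bad IDs instead of rewriting them."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"


@dataclass
class TenantVectorStore:
    """In-memory stand-in for a vector DB holding one collection per tenant."""
    collections: dict = field(default_factory=dict)

    def add(self, tenant_id: str, doc_id: str, text: str) -> None:
        name = collection_name(tenant_id)
        self.collections.setdefault(name, []).append((doc_id, text))

    def search(self, tenant_id: str, term: str) -> list:
        # Only the requesting tenant's collection is scanned; other tenants'
        # collections are never touched. A substring match stands in for
        # vector similarity here.
        docs = self.collections.get(collection_name(tenant_id), [])
        return [doc_id for doc_id, text in docs if term.lower() in text.lower()]


store = TenantVectorStore()
store.add("acme", "a1", "Acme pricing sheet 2024")
store.add("globex", "g1", "Globex pricing sheet 2024")
print(store.search("acme", "pricing"))     # ['a1']
print(store.search("initech", "pricing"))  # []
```

Two design choices matter here. First, the tenant ID should come from the authenticated session, not from the request body. Second, IDs are validated rather than sanitized. Sanitizing (for example, mapping `acme.co` and `acme-co` to the same name) could silently merge two tenants' collections.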
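The provider factory in the third bullet can be sketched as a function that picks the LLM backend from environment variables, so switching between the local Ollama model and a hosted OpenRouter model is purely a configuration change. The variable names (`LLM_PROVIDER`, `LLM_MODEL`, `OPENROUTER_API_KEY`), `LLMConfig`, and `make_llm_config` are hypothetical. The base URLs are Ollama's default local address and OpenRouter's OpenAI-compatible endpoint. Handing the resulting config to a concrete client library is omitted, and the embedding side would follow the same pattern.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str
    base_url: str
    api_key: Optional[str] = None


# Hypothetical registry: provider name -> (default model, base URL).
_PROVIDERS = {
    "ollama": ("llama3.1", "http://localhost:11434"),
    "openrouter": (None, "https://openrouter.ai/api/v1"),  # no default model
}


def make_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Build the LLM config from the environment; defaults to local Ollama."""
    provider = env.get("LLM_PROVIDER", "ollama").lower()
    if provider not in _PROVIDERS:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")
    default_model, base_url = _PROVIDERS[provider]
    model = env.get("LLM_MODEL", default_model)
    if not model:
        raise ValueError(f"LLM_MODEL is required for provider {provider!r}")
    api_key = env.get("OPENROUTER_API_KEY") if provider == "openrouter" else None
    if provider == "openrouter" and not api_key:
        raise ValueError("OPENROUTER_API_KEY is required for provider 'openrouter'")
    return LLMConfig(provider, model, base_url, api_key)


local = make_llm_config({})
print(local.provider, local.model)  # ollama llama3.1
hosted = make_llm_config({
    "LLM_PROVIDER": "openrouter",
    "LLM_MODEL": "vendor/model-name",       # placeholder model id
    "OPENROUTER_API_KEY": "sk-placeholder",  # placeholder key
})
print(hosted.base_url)  # https://openrouter.ai/api/v1
```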
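The grounding rules in the fourth bullet can be sketched as a prompt builder plus a small post-check on citations. The prompt wording, the exact refusal sentence, the bracketed-filename citation style, and the function names are all hypothetical, since the project's actual prompt is not shown in this write-up.

```python
import re
from typing import Sequence, Set, Tuple

# Hypothetical fixed refusal sentence the model must return verbatim.
REFUSAL = "I couldn't find that in the uploaded documents."


def build_grounded_prompt(question: str, chunks: Sequence[Tuple[str, str]]) -> str:
    """chunks: (source_filename, chunk_text) pairs returned by retrieval."""
    context = "\n\n".join(f"[{name}]\n{text}" for name, text in chunks)
    return (
        "You are a document assistant. Answer ONLY from the context below.\n"
        f'If the context does not contain the answer, reply exactly: "{REFUSAL}"\n'
        "After each claim, cite its source filename in square brackets, e.g. [policy.pdf].\n"
        "Format the answer as Markdown.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def unknown_citations(answer: str, chunks: Sequence[Tuple[str, str]]) -> Set[str]:
    """Return bracketed citations in the answer that match no retrieved filename.
    Naive: the text of a Markdown link like [text](url) also counts as a citation."""
    allowed = {name for name, _ in chunks}
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return cited - allowed


chunks = [("handbook.pdf", "Employees accrue 20 vacation days per year.")]
prompt = build_grounded_prompt("How many vacation days do I get?", chunks)
print(unknown_citations("You accrue 20 days per year [handbook.pdf].", chunks))  # set()
print(unknown_citations("You accrue 25 days [memo.docx].", chunks))  # {'memo.docx'}
```

A non-empty result from the post-check would be a reasonable signal to flag or regenerate an answer. That handling, like response streaming, is left out of this sketch.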
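For the raw-file side of the storage split in the last bullet, one possible per-tenant layout on disk is sketched here. The `<root>/<tenant_id>/<filename>` layout, the root path, the tenant-ID format, and `tenant_file_path` are hypothetical. The point illustrated is that only the final component of a client-supplied filename is kept, so an upload cannot escape its tenant's directory.

```python
import re
from pathlib import Path

# Hypothetical upload root; each tenant gets its own subdirectory beneath it.
UPLOAD_ROOT = Path("/srv/enterprise-ai-chat/uploads")
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9_-]{1,40}")


def tenant_file_path(tenant_id: str, filename: str, root: Path = UPLOAD_ROOT) -> Path:
    """Return <root>/<tenant_id>/<basename of filename>, rejecting unsafe input."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    # Keep only the last path component, dropping any directories the client sent.
    name = Path(filename).name
    if name in ("", ".", ".."):
        raise ValueError(f"invalid filename: {filename!r}")
    return root / tenant_id / name


print(tenant_file_path("acme", "../../etc/passwd", Path("/data")))  # /data/acme/passwd on POSIX
```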
Results
Enterprise AI Chat replaces manual searching with fast answers drawn from internal documents, each backed by traceable citations. Because responses are constrained to retrieved content, hallucination risk stays low. The system can run fully offline with local embeddings and a local LLM, which gives it a strong data-privacy posture for sensitive sectors such as legal, finance, and healthcare. A single shared model serves all tenants without per-tenant fine-tuning, and teams can swap between local and hosted models without touching code.


