AI Chatbot Pipeline Documentation


Pipeline Overview

User Query
   ↓
Encoder (Transformer)
   ↓
Vector Search
   ↓
FAQ priority match?
   ↓
Website content match?
   ↓
Answer synthesis (RAG)
        

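This is a retrieval-first pipeline: answers are grounded in retrieved FAQ or website content, which reduces (but does not fully eliminate) the risk of hallucination. It is also efficient, because the LLM is called only when no FAQ match is strong enough to be returned directly.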

Tech Stack

  • FastAPI – API
  • Sentence-Transformer – encoder
  • FAISS – vector search
  • Any LLM – for final answer synthesis (optional)
  • FAQs stored separately from website content

Project Structure

app/
 ├── main.py
 ├── embeddings.py
 ├── vector_store.py
 ├── rag.py
 ├── data/
 │    ├── faqs.json
 │    ├── website_chunks.json
        

Load Encoder (Transformer)

# embeddings.py
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def encode(text: str):
    return model.encode(text, normalize_embeddings=True)
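
Because encode passes normalize_embeddings=True, every vector has unit length, so a plain inner product between two embeddings equals their cosine similarity. The sketch below illustrates this with hand-made NumPy vectors rather than real model output; l2_normalize, cosine, and the sample vectors are hypothetical names used only for illustration.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (the effect of normalize_embeddings=True)."""
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity computed from the raw, unnormalized vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; real embeddings from all-MiniLM-L6-v2 have 384 dimensions
a = np.array([3.0, 4.0, 0.0])
b = np.array([4.0, 3.0, 0.0])

# Inner product of the normalized vectors
inner = float(np.dot(l2_normalize(a), l2_normalize(b)))
print(round(inner, 4), round(cosine(a, b), 4))
```

The two printed values agree, which is why an inner-product index can serve as a cosine-similarity search here.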
        

Vector Store (FAISS)

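# vector_store.py
import faiss
import numpy as np

class VectorStore:
    def __init__(self, embeddings, texts):
        self.texts = texts
        # FAISS expects a contiguous float32 matrix of shape (n, dim)
        embeddings = np.ascontiguousarray(embeddings, dtype="float32")
        # Inner product equals cosine similarity for L2-normalized vectors
        self.index = faiss.IndexFlatIP(embeddings.shape[1])
        self.index.add(embeddings)

    def search(self, query_embedding, k=3):
        query = np.asarray([query_embedding], dtype="float32")
        scores, idx = self.index.search(query, k)
        results = []
        for i, score in zip(idx[0], scores[0]):
            if i < 0:  # FAISS pads with -1 when fewer than k vectors exist
                continue
            results.append((self.texts[i], float(score)))
        return results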
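
For intuition, an exact inner-product search scores the query against every stored vector and keeps the top k. The sketch below is a rough NumPy equivalent, not FAISS itself; brute_force_search and the toy 2-D vectors are hypothetical and only meant to show what the (text, score) results look like.

```python
import numpy as np

def brute_force_search(embeddings, texts, query_embedding, k=3):
    """Return the k (text, score) pairs with the highest inner product."""
    scores = embeddings @ query_embedding      # one score per stored vector
    k = min(k, len(texts))                     # never ask for more than we have
    top = np.argsort(-scores)[:k]              # indices by descending score
    return [(texts[i], float(scores[i])) for i in top]

# Toy unit-length vectors standing in for real sentence embeddings
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype="float32")
texts = ["refund policy", "shipping times", "payment methods"]

query = np.array([0.0, 1.0], dtype="float32")
results = brute_force_search(embeddings, texts, query, k=2)
print(results)
```

A flat index does the same exhaustive comparison, which is fine for a few hundred FAQs; approximate indexes only become worthwhile at much larger scale.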
        

Load FAQ & Website Content

# main.py (setup part)
import json
import numpy as np
from embeddings import encode
from vector_store import VectorStore

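# Use context managers so the files are closed after loading
with open("data/faqs.json", encoding="utf-8") as fh:
    faqs = json.load(fh)
with open("data/website_chunks.json", encoding="utf-8") as fh:
    web = json.load(fh)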

faq_questions = [f["question"] for f in faqs]
faq_answers = [f["answer"] for f in faqs]

faq_embeddings = np.array([encode(q) for q in faq_questions])
web_embeddings = np.array([encode(w["text"]) for w in web])

faq_store = VectorStore(faq_embeddings, faq_answers)
web_store = VectorStore(web_embeddings, [w["text"] for w in web])
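
The setup code reads f["question"], f["answer"], and w["text"], so both JSON files are presumably lists of objects with those keys. A minimal sketch of that assumed layout, with a hypothetical validate helper that fails early on malformed records (the sample records are invented for illustration):

```python
import json

# Hypothetical sample records using the keys the setup code reads
SAMPLE_FAQS = '[{"question": "How do I get a refund?", "answer": "Refunds are available within 30 days."}]'
SAMPLE_WEB = '[{"text": "Orders ship within two business days."}]'

def validate(records, required_keys):
    """Raise ValueError if any record lacks a required non-empty string field."""
    for n, rec in enumerate(records):
        for key in required_keys:
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {n} is missing a usable '{key}' field")
    return records

faqs = validate(json.loads(SAMPLE_FAQS), ["question", "answer"])
web = validate(json.loads(SAMPLE_WEB), ["text"])
print(len(faqs), len(web))
```

Note that the FAQ index is built from the questions while the store returns the answers, so a user query is matched against question wording and the paired answer comes back.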
        

RAG Answer Synthesis

# rag.py
def synthesize_answer(context_chunks, query):
    context = "\n".join(context_chunks)

    prompt = f"""
Answer the question using ONLY the information below.
If the answer is not present, say "I don't have enough information."

Context:
{context}

Question:
{query}
"""
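    # call_llm is a placeholder: supply your own function that sends the
    # prompt to your LLM of choice and returns the reply as a string
    return call_llm(prompt)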
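
Since call_llm is left undefined above, one way to keep the synthesis step testable is to pass the LLM in as a function. The sketch below assumes that design; build_prompt, the call_llm parameter, and fake_llm are hypothetical names, and fake_llm is only a stand-in for a real model.

```python
from typing import Callable, List

def build_prompt(context_chunks: List[str], query: str) -> str:
    """Assemble the grounding prompt from the retrieved chunks and the query."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question using ONLY the information below.\n"
        'If the answer is not present, say "I don\'t have enough information."\n\n'
        f"Context:\n{context}\n\nQuestion:\n{query}\n"
    )

def synthesize_answer(context_chunks: List[str], query: str,
                      call_llm: Callable[[str], str]) -> str:
    """Take the LLM as a function so it can be swapped or faked in tests."""
    return call_llm(build_prompt(context_chunks, query))

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: returns the first context line."""
    return prompt.split("Context:\n", 1)[1].splitlines()[0]

answer = synthesize_answer(["Refunds are available within 30 days."],
                           "Can I get a refund?", fake_llm)
print(answer)
```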
        

Pros and Cons of a Fine-Tuning-Only Approach (No Retrieval)

Pros:

  • The model can generate answers in a natural style.
  • It works without retrieving documents and is fully self-contained.

Cons:

  • A very small dataset (50 Q&A pairs) gives low coverage, so questions worded slightly differently may be missed.
  • Hard to update: every new FAQ requires retraining.
  • Prone to hallucination when a question falls outside the trained Q&A pairs.

How RAG Changes the Game

  • You don’t need to train the model on the FAQs.
  • You store your 50 FAQs (or hundreds more) in a vector database.
  • When a user asks a question:
    1. The retriever finds the most relevant FAQ(s).
    2. The generator synthesizes the answer from the retrieved FAQ.

Benefits:

  • The model is instructed to answer only from the retrieved content, which greatly reduces (but does not eliminate) made-up answers.
  • Easy to add new FAQs or website content without retraining.
  • Can handle paraphrased or unseen questions better.

Should You Keep the Fine-Tuned Model?

  • If your current fine-tuned transformer works well for tone/style, you can keep it as the generator in RAG.
  • But if coverage is low, RAG + a general-purpose pretrained LLM (like a base GPT or local model) is often better than fine-tuning on just 50 Q&A.

Suggested Transition

  1. Keep your FAQs in a vector database.
  2. Use your fine-tuned model (optional) for answer synthesis OR use a general LLM.
  3. Let RAG handle retrieval + synthesis:
    • FAQs get high priority.
    • Website content or docs are secondary.
  4. Optional: keep training/fine-tuning if you need a specific response style.
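
The priority rule in step 3 can be sketched as a small function that keeps results clearing their threshold and places FAQ text ahead of website text. The name prioritize and the sample results are hypothetical; the default thresholds are the FAQ_THRESHOLD and WEB_THRESHOLD values used in the implementation in this document.

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (text, similarity score)

def prioritize(faq_results: List[Result], web_results: List[Result],
               faq_threshold: float = 0.62,
               web_threshold: float = 0.50) -> List[str]:
    """Keep results that clear their threshold, FAQs first, without duplicates."""
    ordered = [text for text, score in faq_results if score >= faq_threshold]
    ordered += [text for text, score in web_results if score >= web_threshold]
    seen, unique = set(), []
    for text in ordered:
        if text not in seen:       # drop repeated chunks, keep first position
            seen.add(text)
            unique.append(text)
    return unique

contexts = prioritize(
    faq_results=[("FAQ: refunds within 30 days", 0.71), ("FAQ: shipping", 0.40)],
    web_results=[("Web: returns page", 0.55), ("Web: careers page", 0.20)],
)
print(contexts)
```

Putting FAQ text first matters when the context is later trimmed to a character budget, because whatever comes last is dropped first.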

RAG + LLM Implementation



from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates

import json
import numpy as np
from openai import OpenAI

from embeddings import encode
from vector_store import VectorStore

# -------------------------
# App Setup
# -------------------------
app = FastAPI()
templates = Jinja2Templates(directory="templates")

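# With no arguments the client reads the OPENAI_API_KEY environment variable,
# which avoids hardcoding the key in source
client = OpenAI()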

# -------------------------
# Load Data
# -------------------------
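# Use context managers so the files are closed after loading
with open("data/faqs.json", encoding="utf-8") as fh:
    faqs = json.load(fh)
with open("data/website_chunks.json", encoding="utf-8") as fh:
    web = json.load(fh)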

faq_questions = [f["question"] for f in faqs]
faq_answers = [f["answer"] for f in faqs]

faq_embeddings = np.array([encode(q) for q in faq_questions])
web_embeddings = np.array([encode(w["text"]) for w in web])

faq_store = VectorStore(faq_embeddings, faq_answers)
web_store = VectorStore(web_embeddings, [w["text"] for w in web])

FAQ_THRESHOLD = 0.62
WEB_THRESHOLD = 0.50
HIGH_CONFIDENCE = 0.85

# -------------------------
# Simple Cache
# -------------------------
cache = {}

# -------------------------
# Keyword Routing
# -------------------------
def keyword_route(query: str):
    q = query.lower()

    if "payment" in q or "pay" in q or "paypal" in q:
        return "We accept credit cards, PayPal, and bank transfers."

    if "contact" in q or "email" in q or "support" in q:
        return "You can contact our support team at support@company.com."

    if "refund" in q or "return" in q:
        return "We offer a full refund within 30 days of purchase."

    if "track" in q or "tracking" in q:
        return "Track your order using the tracking link sent to your email."

    return None

# -------------------------
# Context Trimming
# -------------------------
MAX_CONTEXT_CHARS = 1500

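def trim_context(contexts):
    trimmed = []
    total = 0

    for c in contexts:
        if total + len(c) > MAX_CONTEXT_CHARS:
            # If even the first chunk is too long, keep a truncated piece
            # so the LLM is never called with an empty context
            if not trimmed:
                trimmed.append(c[:MAX_CONTEXT_CHARS])
            break
        trimmed.append(c)
        total += len(c)

    return trimmed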

# -------------------------
# Model Selection
# -------------------------
def pick_model(query, contexts):
    if len(query.split()) <= 5:
        return "gpt-4o-mini"

    if len(contexts) > 3:
        return "gpt-4o"

    return "gpt-4o-mini"

# -------------------------
# RAG LLM Generation
# -------------------------
def generate_rag_response(query, contexts):
    contexts = trim_context(contexts)
    model = pick_model(query, contexts)

    context_text = "\n\n".join(contexts)

    prompt = f"""
You are a helpful customer support assistant.

Answer briefly (max 3 sentences).
Use ONLY the context below.
If the answer is not in the context, say "I don't know".

Context:
{context_text}

Question:
{query}

Answer:
"""

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.2,
        max_tokens=200
    )

    return response.choices[0].message.content.strip()

# -------------------------
# Routes
# -------------------------
@app.get("/", response_class=HTMLResponse)
def home(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})


@app.get("/getChatBotResponse")
def get_bot_response(msg: str):
    query = msg.strip()

    # Cache check
    if query in cache:
        return {
            "response": cache[query],
            "source": "cache",
            "confidence": 1.0
        }

    # Keyword routing (short queries)
    if len(query.split()) <= 2:
        keyword_answer = keyword_route(query)
        if keyword_answer:
            cache[query] = keyword_answer
            return {
                "response": keyword_answer,
                "source": "keyword",
                "confidence": 1.0
            }

    # Encode query
    query_embedding = encode(query)

    contexts = []
    scores = []

    # FAQ retrieval
    faq_results = faq_store.search(query_embedding, k=2)
    for ans, score in faq_results:
        if score >= FAQ_THRESHOLD:
            contexts.append(ans)
            scores.append(score)

    # High-confidence shortcut (skip LLM)
    if faq_results and faq_results[0][1] >= HIGH_CONFIDENCE:
        best_answer = faq_results[0][0]
        cache[query] = best_answer
        return {
            "response": best_answer,
            "source": "faq-direct",
            "confidence": faq_results[0][1]
        }

    # Website retrieval
    web_results = web_store.search(query_embedding, k=2)
    for text, score in web_results:
        if score >= WEB_THRESHOLD:
            contexts.append(text)
            scores.append(score)

    # No context fallback
    if not contexts:
        return {
            "response": "I don't have enough information to answer that.",
            "source": "none",
            "confidence": 0.0
        }

    # LLM generation (RAG)
    final_answer = generate_rag_response(query, contexts)

    # Save to cache
    cache[query] = final_answer

    return {
        "response": final_answer,
        "source": "rag+llm",
        "confidence": max(scores) if scores else 0.5
    } 
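
The plain dict cache above grows without bound and treats queries that differ only in case or spacing as different keys. One possible refinement is a small least-recently-used cache with normalized keys. This is a sketch under those assumptions; normalize_key, LRUCache, and max_size are hypothetical names and values, not part of the code above.

```python
from collections import OrderedDict

def normalize_key(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries share a key."""
    return " ".join(query.lower().split())

class LRUCache:
    """Tiny least-recently-used cache; max_size is an illustrative default."""

    def __init__(self, max_size: int = 1000):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, query: str):
        key = normalize_key(query)
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, query: str, answer: str) -> None:
        key = normalize_key(query)
        self._data[key] = answer
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)   # evict the least recently used

cache = LRUCache(max_size=2)
cache.put("Refund  policy", "30 days")
cache.put("payment", "cards")
cache.get("refund policy")                   # touch, so "payment" is now oldest
cache.put("tracking", "email link")          # evicts "payment"
```

Cached answers also go stale when the FAQ or website data changes, so the cache would need to be cleared whenever the indexes are rebuilt.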
        

Workflow

The query flow is:

  • User sends query to /getChatBotResponse?msg=...
  • Check cache: return cached response if exists (fastest)
  • Keyword routing: answer very short queries (one or two words) such as "payment" or "refund" from fixed replies
  • Encode query into embeddings using encode(query)
  • Search FAQs via vector similarity
  • If high confidence in FAQ → return FAQ answer directly
  • Search website content chunks if needed
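  • Combine FAQ + website content as context
  • If no result clears its threshold → return the "not enough information" fallback without calling the LLM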
  • Trim context to limit token usage
  • Select model based on query and context
  • Generate final answer via RAG + LLM
  • Cache the response for future queries
  • Return final answer with source and confidence score
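
The retrieval part of this flow reduces to a decision on the best FAQ and website scores. The sketch below isolates that decision so the thresholds can be tuned against sample scores without FAISS or an LLM. The function name decide_source is hypothetical; the threshold values and source labels are the ones used in the implementation, and the cache and keyword steps are deliberately left out.

```python
FAQ_THRESHOLD = 0.62
WEB_THRESHOLD = 0.50
HIGH_CONFIDENCE = 0.85

def decide_source(top_faq_score: float, top_web_score: float) -> str:
    """Map the best retrieval scores to the response path the endpoint takes."""
    if top_faq_score >= HIGH_CONFIDENCE:
        return "faq-direct"      # FAQ answer returned as-is, no LLM call
    if top_faq_score >= FAQ_THRESHOLD or top_web_score >= WEB_THRESHOLD:
        return "rag+llm"         # retrieved text is passed to the LLM
    return "none"                # fallback message, no LLM call

# Made-up score pairs covering each path
for faq, web in [(0.91, 0.30), (0.70, 0.10), (0.40, 0.55), (0.40, 0.20)]:
    print(faq, web, decide_source(faq, web))
```

The right threshold values depend on the encoder and the data, so it is worth checking them against real queries with known answers before relying on them.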
