I built a support chatbot over the last few weeks. The first couple of iterations could answer questions, but not reliably, not consistently, and not in a way I'd trust in front of actual users.

But I eventually found an unlock that made all the difference.

Let's step through it.

Getting Started

The traditional approach is Retrieval-Augmented Generation (RAG), following the standard playbook:

  • scrape documents

  • chunk them

  • embed & store

  • retrieve (from store) and generate answer

  • wrap it in a UI

This is still a valid approach, though there are a few tricks you'll need to take it all the way (more on that later).
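The playbook above can be sketched end to end. This is a toy version: a bag-of-words "embedding" and cosine similarity stand in for a real embedding model and vector store, and the final generation step is left as a comment.

```python
import math
import re
from collections import Counter

def chunk(text, size=200):
    # Naive fixed-size chunking; real systems split on structure (more on that below).
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    # Toy bag-of-words "embedding" standing in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=2):
    q = embed(query)
    return sorted(store, key=lambda c: cosine(q, c["vec"]), reverse=True)[:k]

# Illustrative corpus, not real docs.
docs = ["Release 2.1 deprecates the legacy import step.",
        "Streamlit renders the chat UI for the bot."]
store = [{"text": d, "vec": embed(d)} for d in docs]  # embed & store
hits = retrieve("which step is deprecated?", store)   # retrieve
# In a real system you'd now pass `hits` plus the question to an LLM to generate.
```

Swap in a real embedding model and a vector store like FAISS and the shape stays the same.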

Tooling Checklist

You'll need the following to get the initial version stood up:

  1. A document fetcher (e.g., a web scraper - assuming that's where your target documents are)

  2. A chunking strategy

  3. A vector embedding model

  4. A vector store

  5. A large language model

  6. A UI front-end

You can stand most of this up quickly with a coding agent.

I used:

  • Playwright to scrape docs

  • FAISS for the vector store

  • OpenAI for embeddings + generation

  • Streamlit for a quick UI

The specific tools don’t matter much. You’ll have a working chatbot either way.

The first place things start to break down is chunking.

A Quick Detour Into Chunking Strategies

This is where most RAG implementations quietly fail and why "it works" rarely means "it works well."

Not all documents are created equal, so you can't apply a one-size-fits-all chunking strategy (like RecursiveCharacterTextSplitter) and call it a day. You'll end up splitting documents mid-sentence and losing the semantic meaning of the content, which directly hurts the quality of your search results.

A better place to start is to split documents by heading types and chunk along those boundaries. For more bespoke documents, you'll need to adapt the chunking strategy to the specifics of each type. A coding agent makes it easy to iterate, but expect a lot of rounds before you get it right.
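Here's a hedged sketch of heading-based chunking, assuming markdown-style docs; the real strategy will depend on your document types:

```python
import re

def chunk_by_headings(markdown):
    """Split a markdown document into chunks, one per heading section.

    Keeps each heading with its body so chunks stay semantically whole,
    instead of splitting mid-sentence at an arbitrary character count.
    """
    chunks, current = [], []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = "# Install\npip install foo\n\n## Upgrade\nRun the migrator."
print(chunk_by_headings(doc))
```

Oversized sections still need a secondary split, but at least the boundaries land on semantic seams.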

And That Was Just The Table Stakes

At this point, the chatbot will work.

You'll be able to ask it some basic questions. Sometimes it might even be right. But it will be fragile.

To get to the next level, you'll need to add two things:

  1. Query rewrites

  2. A dialed-in system prompt

Query rewrites helped more than I expected. Instead of relying on a single phrasing, I’d generate multiple variations of the question, run each through retrieval, and rank the results.

You can let the model do this, but I had better results with a deterministic approach tailored to the structure of my docs.
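A minimal sketch of the deterministic approach: the rewrite rules here are illustrative stand-ins for rules tailored to your docs, and `retrieve` is assumed to return (chunk, score) pairs, best first.

```python
def rewrite_query(question):
    # Deterministic rewrites; the rules below are illustrative, not the real set.
    rewrites = [question]
    lowered = question.lower()
    if "error" in lowered:
        rewrites.append(lowered.replace("error", "troubleshooting"))
    if lowered.startswith("how do i"):
        rewrites.append(lowered.replace("how do i", "steps to", 1))
    return rewrites

def retrieve_with_rewrites(question, retrieve, k=3):
    # Run each variation through retrieval, dedupe, and rank by score.
    seen, ranked = set(), []
    for q in rewrite_query(question):
        for chunk, score in retrieve(q):
            if chunk not in seen:
                seen.add(chunk)
                ranked.append((chunk, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

The win is that a question phrased in the user's vocabulary still matches chunks written in the docs' vocabulary.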

The system prompt was about constraint. I got very explicit about:

  • What types of documents existed

  • How they differed

  • What the model should and shouldn’t do

I also seeded it with examples of the kinds of questions users would ask, giving it a more defined lane to operate in. The answers improved even though I'd only marginally changed how the system worked.
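For shape, a constrained system prompt might look something like this. The document types match the ones discussed in this post, but the wording is illustrative, not my actual prompt:

```python
# Illustrative system prompt: explicit about doc types, differences, and limits.
SYSTEM_PROMPT = """You answer questions about our product documentation.

The corpus contains three document types:
- Release notes: versioned, one per release.
- How-to guides: step-by-step instructions, organized by feature.
- Blog posts: dated, with an author byline and category tags.

Rules:
- Only answer from the provided context; if the answer isn't there, say so.
- When versions matter, state which version you are citing.
- Never invent URLs, version numbers, or step names.

Example questions you will see:
- "What changed in version 2.3?"
- "How do I migrate off a deprecated step?"
"""
```

The specifics matter less than the explicitness: the model should never have to guess what kinds of documents it's looking at.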

That got me to something... usable. But still not something I’d trust.

The Real Unlock

The missing ingredient? In a word: agents.

Treating the chatbot as an agent and leveraging tool calling made everything easier - and the quality of the search results jumped considerably.

I left the RAG plumbing in place, introduced an alternative tool-calling path, and let the model decide which to use.

For tool calling, I added a set of deterministic functions to handle lookups that RAG couldn't. Instead of relying purely on retrieval, the model could now decide when to call a deterministic path and when to fall back to RAG.

Here are some of the actual functions I added as tool calls:

  - _lookup_step_tool() — fuzzy match against indexed deprecated step chunks
  - _get_release_notes_tool() — filter chunks by version + doc_type
  - _get_latest_release_notes_tool() — find highest version in the index
  - _get_release_notes_for_range_tool() — loop through versions, return all notes
  - _list_blog_posts_tool() — filter by category metadata
  - _search_blog_by_author_tool() — fuzzy match author name, extract title from byline
  - _get_deprecated_steps_docs_tool() — find deprecated steps doc URLs
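As a sketch of how one of these tools might be wired up, here's an illustrative re-creation of _get_release_notes_tool with an OpenAI-style tool schema and a dispatcher. The data and schema details are assumptions, not the actual implementation:

```python
import json

# Illustrative stand-in for the indexed release notes.
RELEASE_NOTES = {"2.0": "Initial 2.x release.",
                 "2.1": "Deprecates legacy import step."}

def _get_release_notes_tool(version: str) -> str:
    # Deterministic lookup: no embeddings, no similarity search.
    return RELEASE_NOTES.get(version, f"No release notes found for {version}.")

# Schema in the OpenAI tool-calling format; given the question, the model
# chooses between tools like this and the plain RAG path.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "_get_release_notes_tool",
        "description": "Return the release notes for an exact version.",
        "parameters": {
            "type": "object",
            "properties": {"version": {"type": "string"}},
            "required": ["version"],
        },
    },
}]

DISPATCH = {"_get_release_notes_tool": _get_release_notes_tool}

def run_tool_call(name: str, arguments: str) -> str:
    # The model returns a tool name plus JSON arguments; we execute deterministically.
    return DISPATCH[name](**json.loads(arguments))

print(run_tool_call("_get_release_notes_tool", '{"version": "2.1"}'))
```

A question like "what changed in 2.1?" now hits an exact lookup instead of hoping the right chunk wins a similarity search.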

This opened up the chatbot to a broader set of questions that it could support.

For example, before this I couldn’t ask questions like:

  • "How many blog posts did [author] write?"

  • "I upgraded from version X to Y and now I’m seeing an error—what might explain that?"

Before wiring this up to a frontier model, I was running everything locally using Ollama and a fairly small model (e.g., a 7 billion parameter model). I knew I was onto something when I started getting consistently reliable results even with that setup.

That's when I knew it was ready to roll out to my team to start dogfooding.

Outro

Is the chatbot perfect? No. It’s only as good as the content it’s working against.

But once I shifted to treating it as an agent with tool calling, it became more effective, more reliable, and able to handle questions that would have broken earlier versions. Instead of relying purely on retrieval, the model could decide when to call into something deterministic and when to fall back to RAG.

RAG gets you a chatbot that can answer questions. A tailored agent gets you a system that can actually help. That difference is where most of the work (and all of the value) lives.
