I finally got around to making an example chatbot I can share with y’all!
How it works
- You ask a question about Anvil and the bot answers with reference articles!
- A server call launches a background task that streams the answer back to the client, along with reference articles
- The background task uses langchain to pull in data from a vector store/index (hosted on Pinecone). Its ConversationalRetrievalChain condenses the chat history and the new message into a standalone question, then returns an answer along with references.
- The ‘chain’ does many steps under the hood: it embeds the user’s question (converts it to a 1536-dimensional vector), runs a similarity search on the index for the most similar chunks of text (semantic search), then takes the top 4 chunks and puts them into a prompt so ChatGPT has the relevant info to answer the original question.
- The vector store/index is where all the “anvil knowledge” is. I populated this by scraping and indexing the Anvil site (with permission).
- The UI is basically a sticky header, a sticky footer, and a repeating panel with a somewhat elaborate template.
- The repeating panel template displays ‘reference chips’ when self.item contains reference data, so there is a lot of logic in the template definition. The reference chips are Links to the reference articles relevant to the bot’s answer to your question.
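To make the retrieval steps under "How it works" more concrete, here is a minimal, self-contained sketch of the idea: embed the question, rank chunks by similarity, keep the top 4, build a prompt, and collect the references for the chips. It is not what langchain or Pinecone do internally. The `embed` function is a toy bag-of-words stand-in for a real 1536-dimensional embedding model, and all function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): word counts, not a real vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=4):
    """Return the k chunks most similar to the question (the 'semantic search' step)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question, top_chunks):
    """Put the retrieved text into a prompt for the chat model (wording is hypothetical)."""
    context = "\n\n".join(c["text"] for c in top_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def references(top_chunks):
    """Collect one reference per source, using the 'source' and 'title' metadata keys."""
    seen = set()
    refs = []
    for c in top_chunks:
        source = c["metadata"]["source"]
        if source not in seen:
            seen.add(source)
            refs.append({"title": c["metadata"]["title"], "url": source})
    return refs
```

Each chunk here is assumed to be a dict with a `text` field and a `metadata` dict. The list returned by `references` is the kind of data a repeating panel template could turn into reference chips.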
Embed Code (chatbot window):
- This code can be inserted into a web page to put the app into an iframe that appears/disappears with the click of a button.
<script src="https://chatbeaverbottest.anvil.app/_/theme/chat_embed.js"></script>
<script>
  createAnvilChat("https://chatbeaverbottest.anvil.app#?bot_id=61403aae-d77c-46ad-b8d4-f5675c1a0df7");
</script>
Indexing the Anvil Site (code not shared)
- I crawled the sitemaps for Anvil and the Anvil Forum to get thousands of URLs
- I used langchain’s UnstructuredURLLoader to scrape, parse, chunk, and embed the content from the web pages. I used no special configuration, but this step matters most: a good index is critical to getting a useful bot.
- Building a Chatbot
- How to stream data from server to client?
- Pinning a panel so that it does NOT scroll out of sight - #7 by stucork
- Running your own copy requires your own OpenAI and Pinecone API keys, as well as your own Pinecone index. Alternatively, you could use a FAISS index (langchain FAISS object pickled and stored as a BlobMedia object).
- The index should have metadata keys ‘source’ and ‘title’
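If you want to build your own index, the sitemap crawl and chunking described under "Indexing the Anvil Site" could be sketched roughly like this, using only the standard library. This is not the code used for the bot (that code is not shared). The function names, chunk size, and overlap are assumptions for illustration, and fetching the pages and embedding the chunks are left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Extract every <loc> URL from a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def chunk_text(text, size=1000, overlap=100):
    """Split text into chunks of at most `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += size - overlap
    return chunks


def to_records(url, title, text, size=1000, overlap=100):
    """Attach the 'source' and 'title' metadata keys the index is expected to have."""
    return [
        {"text": chunk, "metadata": {"source": url, "title": title}}
        for chunk in chunk_text(text, size, overlap)
    ]
```

Each record returned by `to_records` carries the ‘source’ and ‘title’ keys mentioned above, so whichever vector store you use can hand them back with the search results.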