An AI chatbot that answers questions about Anvil

yahiakalabs · August 27, 2023, 1:02am

I finally got around to making an example chatbot I can share with y’all!

How it works

You ask a question about Anvil and the bot answers with reference articles!
A server call launches a background task that streams the answer back to the client, along with reference articles.
The background task uses langchain to query a vector store/index (hosted on Pinecone). Its ConversationalRetrievalChain condenses the chat history and your new message into a standalone question, then returns an answer along with references.
The ‘chain’ does many steps under the hood: it embeds the user’s question (converts it to a 1536-dimensional vector), runs a similarity search on the index for the most similar chunks of text (semantic search), takes the top 4 chunks, and puts them into a prompt so ChatGPT has the relevant info to answer the original question.
The vector store/index is where all the “anvil knowledge” is. I populated this by scraping and indexing the Anvil site (with permission).
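
To make the retrieval steps above a bit more concrete, here is a toy sketch in plain Python. It is not the app's code: the real bot uses langchain, OpenAI embeddings (1536 dimensions), and Pinecone. The 3-dimensional vectors, the `TOY_INDEX` contents, the prompt wording, and all function names below are made up for illustration.

```python
import math

# Stand-in for the vector index: each entry has text, a (tiny, fake) embedding,
# and metadata. Real embeddings would have 1536 dimensions.
TOY_INDEX = [
    {"text": "Background tasks run long jobs on the server.",
     "vector": [0.9, 0.1, 0.0], "metadata": {"title": "Background Tasks"}},
    {"text": "Repeating panels display a list of items.",
     "vector": [0.0, 1.0, 0.0], "metadata": {"title": "Repeating Panels"}},
    {"text": "Data Tables store rows of data.",
     "vector": [0.0, 0.0, 1.0], "metadata": {"title": "Data Tables"}},
    {"text": "A Timer can poll a background task for progress.",
     "vector": [0.7, 0.3, 0.0], "metadata": {"title": "Timers"}},
    {"text": "The Uplink connects external Python code to an app.",
     "vector": [0.5, 0.0, 0.5], "metadata": {"title": "Uplink"}},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question_vector, index, k=4):
    """Return the k index entries most similar to the question vector."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(question_vector, entry["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks):
    """Stuff the retrieved chunks into a prompt ahead of the question."""
    context = "\n\n".join(
        f"[{chunk['metadata']['title']}] {chunk['text']}" for chunk in chunks
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Pretend this vector is the embedding of the user's question.
    question_vector = [1.0, 0.0, 0.0]
    chunks = top_k_chunks(question_vector, TOY_INDEX, k=4)
    print(build_prompt("How do I run a long job?", chunks))
```

In the real app the embedding and the similarity search are handled by the embeddings model and the vector store, so none of this has to be written by hand.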

Components

Basically a sticky header, a sticky footer, and a repeating panel that has a somewhat elaborate template.
The repeating panel template displays ‘reference chips’ when its self.item contains reference data, so a lot of the logic lives in the template definition. The reference chips are Links to reference articles relevant to the bot’s answer to your question.

Embed Code (chatbot window):

This code can be inserted into a web page to put the app into an iframe that appears/disappears with the click of a button.

  <script src="https://chatbeaverbottest.anvil.app/_/theme/chat_embed.js"></script>
  <script>
      createAnvilChat("https://chatbeaverbottest.anvil.app#?bot_id=61403aae-d77c-46ad-b8d4-f5675c1a0df7");
  </script>

Indexing the Anvil Site (code not shared)

I crawled the sitemaps for Anvil and the Anvil Forum to get thousands of URLs.
I used langchain’s UnstructuredURLLoader to scrape, parse, chunk, and embed the content from the web pages. No special configuration was needed, but this step matters most: the quality of the index determines how useful the bot is.
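
Since the indexing code isn't shared, here is a rough idea of what the "chunk" step means, as a plain-Python sketch. This is a simplified stand-in, not what langchain does internally: the character-based splitting, the `chunk_size` and `overlap` defaults, and both function names are assumptions for illustration. The ‘source’ and ‘title’ metadata keys are the ones the index is expected to have.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character windows that overlap.

    The overlap keeps a sentence that straddles a boundary visible in
    both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def records_for_page(url, title, text, chunk_size=1000, overlap=200):
    """Pair each chunk of a page with the metadata stored alongside it."""
    return [
        {"text": chunk, "metadata": {"source": url, "title": title}}
        for chunk in chunk_text(text, chunk_size, overlap)
    ]
```

Each record would then be embedded and written to the index, with the metadata travelling along so answers can point back to the page they came from.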

Inspiration:

Clone Link

To use the clone, you need to set up your own OpenAI and Pinecone API keys, as well as your own Pinecone index. Alternatively, you could use a FAISS index (a langchain FAISS object, pickled and stored as a BlobMedia object).
The index should have metadata keys ‘source’ and ‘title’.
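
One plausible use for those two metadata keys is building the list of reference links shown with each answer. The sketch below is an assumption about how that could look, not the app's actual code: `collect_references` is a hypothetical helper, the chunks are plain dicts, and the URLs are placeholders.

```python
def collect_references(chunks):
    """Turn retrieved chunks into a de-duplicated list of reference links.

    Several chunks often come from the same page, so each 'source' URL
    is listed only once, in the order it first appears.
    """
    seen = set()
    references = []
    for chunk in chunks:
        metadata = chunk.get("metadata", {})
        source = metadata.get("source")
        if not source or source in seen:
            continue
        seen.add(source)
        # Fall back to the URL when a page has no title.
        references.append({"title": metadata.get("title") or source, "url": source})
    return references


if __name__ == "__main__":
    retrieved = [
        {"text": "...", "metadata": {"source": "https://example.com/a", "title": "Page A"}},
        {"text": "...", "metadata": {"source": "https://example.com/a", "title": "Page A"}},
        {"text": "...", "metadata": {"source": "https://example.com/b"}},
    ]
    for ref in collect_references(retrieved):
        print(ref["title"], "->", ref["url"])
```

A list like this is the kind of data a repeating panel item could carry for its reference chips.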

hugetim · August 27, 2023, 2:31am

This is incredible, both as a demo of what one can build with Anvil and a useful tool for building with Anvil.

Here is the result for my literal first query:

To understand what a huge step up this is for getting Anvil help, compare that with what I get if I type the same thing into Discourse search in the Forum:

…or what I get when typing that into Docs search:

I feel like I want to give you some kind of award. For now:

yahiakalabs · August 27, 2023, 2:07pm

Thank you! This bot is useful precisely because our forums are so active, which I love.

The retrieval-augmented generation (RAG) method is all the rage these days, and it is one of the few approaches that is both low cost and high value. It relies only on ChatGPT’s reasoning capabilities, not on its “knowledge”.

I love this documentation use case because, as you said, the semantic search (powered by OpenAI’s embeddings model) gives much more relevant results than keyword-based search.

There are a bunch of startups productizing this use case as well as big companies like Intercom. Nice to see we can compete using Anvil!

divyeshlakhotia · August 27, 2023, 4:19pm

This is such a great project!

nerd01 · August 27, 2023, 11:13pm

Thanks for this amazing result. This could be applied to a lot of other fields as well.

joinlook · August 30, 2023, 3:03pm

Amazing. I can’t wait for the AI Anvil copilot plug-in.