
The knowledge base can be connected to many different sources of data, combining them into a single homogeneous knowledge system. Sources of data include documents and web pages that you import through the user interface, but they may also include custom data schemas that you have created yourself, containing information that the bot has extracted from conversations (this allows the agent to have a ‘memory’ between sessions). Eventually we may even allow third-party API services to become sources of data for the knowledge base.

Types of smart chains for knowledge base customization

Each source of data can be customized independently. There are six different smart chains that you can modify to change the behavior of the knowledge base:

  1. Chunker Chain - The chunker chain is responsible for taking a long document and breaking it up into individual sections of content. The default version breaks a document into sections based on its formatting, e.g. using the existing headers and paragraphs as section breaks.

  2. Matching Text Chain - The matching text chain is responsible for taking a chunk of content and transforming it into pieces of text that are embedded and matched against when querying. By default, matching texts take the form of questions, e.g. “What kind of food is available in Alaska?”. However, this is not required, and some use cases of the knowledge base call for querying based on other formats of text.

  3. Qualifying Text Chain - The qualifying text chain is responsible for taking a chunk of content, along with the document as a whole, and creating a summary of the document together with a description of how that specific knowledge chunk fits into the larger document. E.g. if the whole document is on the subject of “Arctic Cuisine”, the qualifying text might be “This section describes Alaskan cuisine within the context of a larger document describing various arctic cuisines”. The qualifying text provides contextual information so that the LLM can correctly interpret the content of the knowledge chunk, which loses some meaning when separated from the larger document.

  4. Query Transformation Chain - The query transformation chain is responsible for taking the query provided by the user, which might take a variety of forms, and transforming it into the same format produced by the matching text chain. By default, matching texts take the form of a question, so the default query transformation chain takes whatever you type in and transforms it into a question as well. E.g. it might take an abstract fragment like “location of food” and turn it into a proper question: “Where can the food be found?”

  5. Reranking Chain - The reranking chain is responsible for taking the knowledge chunks returned by the core knowledge base and assigning each a score between 0 and 1 for how well it matches the query. By default, this means the reranking system is effectively responsible for determining whether a given knowledge chunk actually answers the question provided by the user.

  6. Filtering Chain - The final filtering chain is responsible for removing any knowledge chunks that should not be returned. It sets a cutoff for how good the match and rerank scores must be for a knowledge chunk to be returned at all. In many situations, it is better for the database to return nothing than to return knowledge chunks that do not closely match the query provided by the user, so the filtering chain is where you can customize that business logic. The default filtering chain sets a cutoff of 0.5 for both the match score and the rerank score.
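The query-time flow through these chains can be sketched in Python. This is a minimal illustration, not the actual implementation: the `Chunk` fields, the `transform`, `similarity`, and `rerank` callables, and the function names are all illustrative assumptions; only the 0.5 default cutoffs come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    content: str          # text produced by the chunker chain
    matching_text: str    # e.g. a question, produced by the matching text chain
    qualifying_text: str  # document-level context from the qualifying text chain
    match_score: float = 0.0
    rerank_score: float = 0.0

# Default filtering chain cutoffs (both 0.5, per the description above)
MATCH_CUTOFF = 0.5
RERANK_CUTOFF = 0.5

def query_knowledge_base(query, chunks, transform, similarity, rerank):
    # 1. Query transformation: rewrite the raw query into the matching-text format
    transformed = transform(query)
    # 2. Matching: score each chunk's matching text against the transformed query
    for chunk in chunks:
        chunk.match_score = similarity(transformed, chunk.matching_text)
    # 3. Reranking: assign a 0-1 score for how well each chunk answers the query
    for chunk in chunks:
        chunk.rerank_score = rerank(query, chunk)
    # 4. Filtering: drop chunks that fall below either cutoff
    return [c for c in chunks
            if c.match_score >= MATCH_CUTOFF and c.rerank_score >= RERANK_CUTOFF]
```

In the real system each step is a customizable smart chain; swapping in an identity version of any step (see below in this page) changes the behavior of exactly that stage of the pipeline.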

Pre-made Smart Chains for Knowledge Base

Identity Chains

For each of these smart chains, there is an “identity” version of the chain that does nothing. You can find them by searching for the word “identity” in the smart chains.


These identity chains just pass their input data right on through to their output data.

  • Identity Chunker Chain - Keeps the entire content text as a single large block of text without breaking it apart

  • Identity Matching Text - Uses the entire content text as the matching text without transforming it

  • Identity Qualifying Text - Uses the entire content text as the qualifying text without transforming it

  • Identity Query Transformer - Passes the user’s query through verbatim without transforming it

  • Identity Reranker - Assigns the highest rerank score of 1.0 to every knowledge chunk regardless of its contents

  • Identity Filterer - Does not filter anything and passes all of the knowledge chunks through to its output
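The identity chains above amount to trivial pass-throughs. Sketched in Python (function names and signatures are illustrative assumptions, not the actual chain interfaces):

```python
def identity_chunker(document_text):
    # Keep the entire document as one single knowledge chunk
    return [document_text]

def identity_matching_text(chunk_text):
    # Use the chunk's content itself as the matching text
    return chunk_text

def identity_qualifying_text(chunk_text, document_text):
    # Use the chunk's content itself as the qualifying text
    return chunk_text

def identity_query_transformer(query):
    # Pass the user's query through verbatim
    return query

def identity_reranker(query, chunk):
    # Every chunk gets the maximum rerank score
    return 1.0

def identity_filterer(chunks):
    # No filtering: return every chunk unchanged
    return list(chunks)
```

Identity chains are useful as a baseline when debugging: replacing one stage at a time with its identity version isolates which chain is responsible for an unexpected result.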

Premade Document Processing Chains

These are pre-made smart chains designed for processing long-form documents in a default Q&A style database. They can be found by searching for smart-chains with the prefix knowledge_base_document_ in the table view.


  • Document Chunker - the default document chunker takes the input text and breaks it apart into sections based on line breaks and semantic and formatting information. The goal of the default chunker is to break the document into small sections roughly analogous to what the original author would have considered the sections of the document to be, e.g. wherever the original author put headers and sections.

Note

IMPORTANT! The input document must contain newline characters. TODO: Fix this by automatically wrapping input text in the default chunker.

  • Document Qualifying Text - the default document qualifying text smart chain produces a summary of the entire document to use as the qualifying text for each of the knowledge chunks that resulted from the document.

  • Document Matching Text - the default matching text smart chain is designed to take a chunk of content and come up with the questions that the content answers. It works almost like Jeopardy, taking an answer and working backwards to the questions. It is designed to generate as many questions as it can for a given knowledge chunk, and it takes into account the contextual information provided by the qualifying text.
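The structure-based splitting the document chunker performs, including the newline requirement flagged in the note above, can be approximated with a rough sketch like the following. This is only a format-based approximation under stated assumptions; the actual chain is a smart chain and may also use semantic cues that plain text splitting cannot capture.

```python
import re

def chunk_by_structure(document_text):
    """Split a document into sections at markdown-style headers, falling
    back to blank-line paragraph breaks. A rough, format-based
    approximation of the default document chunker's behavior."""
    if "\n" not in document_text:
        # Mirrors the note above: input without newline characters has no
        # structural boundaries to split on, so it stays a single chunk.
        return [document_text]
    # Split immediately before any line that looks like a header
    sections = re.split(r"\n(?=#+\s)", document_text)
    if len(sections) == 1:
        # No headers found: fall back to blank-line paragraph breaks
        sections = re.split(r"\n\s*\n", document_text)
    return [s.strip() for s in sections if s.strip()]
```

Each resulting section would then be passed to the qualifying text and matching text chains to produce the context summary and the Jeopardy-style questions described above.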