...
Convert the document into raw text
Break the text into small sections based on the semantics and headers of the document
For each section, we generate a list of questions that the section of text answers. Much like Jeopardy, we work backwards from the answer (contained in the document) to the questions
We fetch the embedding vector for each of the questions
Create a summary of the entire document, referred to as the “Qualifying Text” internally because it helps qualify whether a particular knowledge chunk is relevant
Write the knowledge chunks out to the database
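Taken together, the ingestion steps above might look something like the sketch below. Every helper name here (split_into_sections, generate_questions, embed, summarize) and the in-memory "database" list are illustrative stand-ins, not the system's actual API; the real pipeline would use an LLM for question generation and summarization, a proper embedding model, and a persistent datastore.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeChunk:
    section_text: str                       # answer-bearing chunk of the document
    questions: list[str]                    # Jeopardy-style questions the chunk answers
    question_embeddings: list[list[float]]  # one vector per generated question
    qualifying_text: str                    # whole-document summary ("Qualifying Text")

def split_into_sections(text: str) -> list[str]:
    # Placeholder: the real system splits on semantics and document headers.
    return [s for s in text.split("\n\n") if s.strip()]

def generate_questions(section: str) -> list[str]:
    # Placeholder: the real system asks an LLM to work backwards from the
    # answer (the section) to the questions that section answers.
    return [f"What does this section say about: {section[:40]}...?"]

def embed(text: str) -> list[float]:
    # Placeholder for the embedding model: a crude character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def summarize(text: str) -> str:
    # Placeholder for the LLM-generated whole-document summary.
    return text[:200]

def ingest(raw_text: str, database: list[KnowledgeChunk]) -> None:
    qualifying_text = summarize(raw_text)
    for section in split_into_sections(raw_text):
        questions = generate_questions(section)
        vectors = [embed(q) for q in questions]
        database.append(KnowledgeChunk(section, questions, vectors, qualifying_text))
```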
Querying
...
When you query the knowledge base, the system goes through the following steps:
The original raw query is transformed into a clean query. For the default knowledge base configuration, this means taking an ambiguous phrase, like "location of food", and turning it into the format of a question, such as "Where is the food located?", so that the text matches the format of the questions generated by the ingestion engine
We look up the embedding vector for the transformed query
We use the embedding vector to perform a query on the knowledge base and find the top K knowledge chunks whose matching text (a.k.a. the generated question) is closest to the query text
We use a reranking algorithm to rerank the matched knowledge chunks against the query
We discard all except the top 5 matching knowledge chunks, as determined by the reranker
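Continuing the ingestion sketch above (and reusing its placeholder embed() and KnowledgeChunk), the query flow could be sketched roughly as follows. The query rewrite, cosine scoring, word-overlap reranker, and the top_k value of 20 are all illustrative assumptions; the real system would use an LLM for the rewrite and a dedicated reranking model.

```python
import math

def transform_query(raw_query: str) -> str:
    # Placeholder: the real system rewrites an ambiguous phrase such as
    # "location of food" into a question such as "Where is the food located?".
    return raw_query if raw_query.endswith("?") else f"What is the {raw_query}?"

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def rerank(query: str, chunks: list[KnowledgeChunk]) -> list[KnowledgeChunk]:
    # Placeholder reranker: scores chunks by word overlap between the query and
    # their generated questions. A real system would use a reranking model.
    query_words = set(query.lower().split())
    def score(chunk: KnowledgeChunk) -> int:
        return max(len(query_words & set(q.lower().split())) for q in chunk.questions)
    return sorted(chunks, key=score, reverse=True)

def query_knowledge_base(raw_query: str, database: list[KnowledgeChunk],
                         top_k: int = 20, keep: int = 5) -> list[KnowledgeChunk]:
    clean_query = transform_query(raw_query)   # step 1: rewrite as a question
    query_vector = embed(clean_query)          # step 2: embed the clean query
    candidates = sorted(                       # step 3: top-K by vector similarity
        database,
        key=lambda c: max(cosine(query_vector, v) for v in c.question_embeddings),
        reverse=True,
    )[:top_k]
    reranked = rerank(clean_query, candidates) # step 4: rerank against the query
    return reranked[:keep]                     # step 5: keep only the top 5
```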
Loading In Knowledge
There are many different ways to load knowledge into your agent, and the right approach will depend a lot on your agent's use cases.
...