...
Convert the document into raw text
Break the text into small sections based on the semantics and headers of the document
For each section, we generate a list of questions that the section of text answers. Much like Jeopardy, we work backwards from the answer (contained in the document) to the questions
We fetch the embedding vector for each of the questions
Create a summary of the entire document, referred to as the “Qualifying Text” internally because it helps qualify whether a particular knowledge chunk is relevant
Write the knowledge chunks out to the database
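Taken together, the ingestion steps above might look something like the sketch below. Every helper name here (split_into_sections, generate_questions, embed, summarize) and the in-memory "database" list are illustrative stand-ins, not the system's actual API; the real pipeline would use an LLM for question generation and summarization, a proper embedding model, and a persistent datastore.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeChunk:
    section_text: str                       # answer-bearing chunk of the document
    questions: list[str]                    # Jeopardy-style questions the chunk answers
    question_embeddings: list[list[float]]  # one vector per generated question
    qualifying_text: str                    # whole-document summary ("Qualifying Text")

def split_into_sections(text: str) -> list[str]:
    # Placeholder: the real system splits on semantics and document headers.
    return [s for s in text.split("\n\n") if s.strip()]

def generate_questions(section: str) -> list[str]:
    # Placeholder: the real system asks an LLM to work backwards from the
    # answer (the section) to the questions that section answers.
    return [f"What does this section say about: {section[:40]}...?"]

def embed(text: str) -> list[float]:
    # Placeholder for the embedding model: a crude character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def summarize(text: str) -> str:
    # Placeholder for the LLM-generated whole-document summary.
    return text[:200]

def ingest(raw_text: str, database: list[KnowledgeChunk]) -> None:
    qualifying_text = summarize(raw_text)
    for section in split_into_sections(raw_text):
        questions = generate_questions(section)
        vectors = [embed(q) for q in questions]
        database.append(KnowledgeChunk(section, questions, vectors, qualifying_text))
```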
Querying
...
When you query the knowledge base, the system goes through the following steps:
The original raw query is transformed into a clean query. For the default knowledge base configuration, this means taking an ambiguous phrase, like "location of food", and turning it into the format of a question, such as "Where is the food located?", so that the text matches the format of the questions generated by the ingestion engine
We look up the embedding vector for the transformed query
We use the embedding vector to perform a query on the knowledge base and find the top K knowledge chunks whose matching text (a.k.a. the generated question) is closest to the query text
We use a reranking algorithm to rerank the matched knowledge chunks against the query
We discard all except the top 5 matching knowledge chunks, as determined by the reranker
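Continuing the ingestion sketch above (and reusing its placeholder embed() and KnowledgeChunk), the query flow could be sketched roughly as follows. The query rewrite, cosine scoring, word-overlap reranker, and the top_k value of 20 are all illustrative assumptions; the real system would use an LLM for the rewrite and a dedicated reranking model.

```python
import math

def transform_query(raw_query: str) -> str:
    # Placeholder: the real system rewrites an ambiguous phrase such as
    # "location of food" into a question such as "Where is the food located?".
    return raw_query if raw_query.endswith("?") else f"What is the {raw_query}?"

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def rerank(query: str, chunks: list[KnowledgeChunk]) -> list[KnowledgeChunk]:
    # Placeholder reranker: scores chunks by word overlap between the query and
    # their generated questions. A real system would use a reranking model.
    query_words = set(query.lower().split())
    def score(chunk: KnowledgeChunk) -> int:
        return max(len(query_words & set(q.lower().split())) for q in chunk.questions)
    return sorted(chunks, key=score, reverse=True)

def query_knowledge_base(raw_query: str, database: list[KnowledgeChunk],
                         top_k: int = 20, keep: int = 5) -> list[KnowledgeChunk]:
    clean_query = transform_query(raw_query)   # step 1: rewrite as a question
    query_vector = embed(clean_query)          # step 2: embed the clean query
    candidates = sorted(                       # step 3: top-K by vector similarity
        database,
        key=lambda c: max(cosine(query_vector, v) for v in c.question_embeddings),
        reverse=True,
    )[:top_k]
    reranked = rerank(clean_query, candidates) # step 4: rerank against the query
    return reranked[:keep]                     # step 5: keep only the top 5
```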
Loading In Knowledge
There are many different ways to load knowledge into your agent, and the right approach will depend a lot on your agent's use cases.
...