...
We are digesting blog articles, presentations, and other documents, and extracting the knowledge out of them as best practices.
Each best practice is going to be formatted as a single paragraph of text that contains one specific, pithy piece of wisdom or advice.
The user is going to upload their pitch deck in the form of a PDF.
We are going to run a custom smart chain to break apart that pitch deck and provide recommendations on a page-by-page basis that have been derived from the knowledge base. This smart chain:
Breaks apart the PDF into pages
Uploads each page text verbatim into the knowledge base (relying on the knowledge base query transformer to transform it into the appropriate format)
Takes the resulting knowledge chunks, and applies a custom prompt to each one to "contextualize" it with the specific information contained in the deck, producing a recommendation that can be displayed to the user
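The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `recommend_for_deck`, `query_knowledge_base`, and `contextualize` are hypothetical names, not part of any real API, and the PDF is assumed to have already been split into per-page text by whatever PDF library is in use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Recommendation:
    page_number: int    # 1-based page index within the deck
    best_practice: str  # knowledge chunk returned by the knowledge base
    text: str           # contextualized recommendation shown to the user


def recommend_for_deck(
    pages: List[str],
    query_knowledge_base: Callable[[str], List[str]],
    contextualize: Callable[[str, str], str],
) -> List[Recommendation]:
    """Produce page-by-page recommendations for a deck.

    `pages` is the output of step 1 (the PDF split into page texts).
    Both callables are hypothetical stand-ins for the knowledge base
    query and the custom contextualization prompt.
    """
    recommendations: List[Recommendation] = []
    for page_number, page_text in enumerate(pages, start=1):
        # Step 2: send the page text verbatim. The knowledge base's query
        # transformer is assumed to convert it into matching-text form.
        for best_practice in query_knowledge_base(page_text):
            # Step 3: tie the generic best practice to this specific page.
            recommendations.append(
                Recommendation(
                    page_number,
                    best_practice,
                    contextualize(best_practice, page_text),
                )
            )
    return recommendations
```

Passing the knowledge base query and the contextualization prompt in as callables keeps the sketch independent of any particular client library, and makes it easy to try the flow with stubbed responses before wiring in the real services.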
To accomplish this, we must apply all of the following customizations:
Chunker - Our document chunker smart chain will need to take in the original documents, and transform them into small, bite-sized chunks of text, each containing a single best practice or idea that was extracted from the document. This isn't really breaking apart the original document, but rather wholly digesting it and transforming it into something new - one-paragraph descriptions of best practices.
Matching Text - Our matching text smart chain needs to take each of the best-practice paragraphs generated by the chunker, and transform them into our standard matching text format described above, e.g.
A problem slide for a series b startup in health care.
We may design the prompt to generate a number of potential matching texts for each best-practice paragraph that it processes.
Qualifying Text - We can stick with the default qualifying text, which would just be a summary of the original document that the best practice was derived from, in case that information is relevant.
Query Transformer - Our usage scenario for the knowledge base involves uploading the raw text of each presentation slide. We would need to make a custom query transformer that can take that raw text and convert it into our standard matching format described above, e.g.
A problem slide for a series b startup in health care.
Reranker - Our matching text is relatively simple and closed-ended, so we may want to disable the reranker by switching to the identity reranker and rely on the matching text alone. This reduces the cost of knowledge base queries and improves the response speed.
Filterer - Given the very closed-ended nature of our matching text, we may find that by default, all the match scores end up always being very high, e.g. > 0.9, even between matching texts that are supposed to be different and not match. Therefore, we may have to adjust the default filtering smart chain to calibrate it for our specific use case, setting a cutoff that may be as high as 0.95 or 0.97.
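To make the query-side customizations more concrete, here is a minimal sketch of a query transformer prompt, an identity reranker, and a score filter. All names (`Match`, `build_query_prompt`, `identity_rerank`, `filter_matches`) are hypothetical, and the prompt wording is an assumption inferred from the single example matching text. The real smart chains may look quite different.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    matching_text: str
    score: float  # similarity score, assumed to lie between 0 and 1


# Hypothetical prompt asking a language model to restate raw slide text
# in the standard matching format.
QUERY_TRANSFORMER_PROMPT = (
    "Describe the slide below in one sentence of the form "
    "'A <slide type> slide for a <stage> startup in <industry>.'\n\n"
    "Slide text:\n{slide_text}"
)


def build_query_prompt(slide_text: str) -> str:
    """Fill the query transformer prompt with the raw slide text."""
    return QUERY_TRANSFORMER_PROMPT.format(slide_text=slide_text)


def identity_rerank(matches: List[Match]) -> List[Match]:
    """Identity reranker: keep the original order and scores untouched."""
    return list(matches)


def filter_matches(matches: List[Match], cutoff: float = 0.95) -> List[Match]:
    """Keep only the matches whose score reaches the calibrated cutoff."""
    return [match for match in matches if match.score >= cutoff]
```

With candidates scored 0.99, 0.93, and 0.96, a cutoff of 0.95 keeps the first and third and drops the second. In practice the cutoff would be calibrated by inspecting scores for pairs of matching texts that are known to be unrelated.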