I have a requirement to build a solution for searching lease agreements. The challenge is that there can be one or more amendments to the master lease agreement. When a question is asked, the customer expects the answer to reflect the latest data from the amendments, and they also want to ask what the amendments to a particular section (A.3) have been over time. In addition, the lease agreement specifies a rental amount and a term describing how the rent is calculated, for example: for the first 3 years it is 3000 per month, and after that the annual increase is based on the prior year's CPI or 2.5%, whichever is less. The customer expects to ask questions like "What should the rent be for 2024?" Is this doable with AIP, and how? Can someone help me understand the process?
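Once the rent terms and CPI figures have been extracted, the escalation rule in the question is deterministic, so it can be computed in code instead of being left to the LLM. Below is a minimal sketch under several assumptions that the lease itself would need to confirm: lease years line up with calendar years, CPI is supplied as the fractional annual change for each year, each increase compounds on the previous year's rent, and the result is rounded to cents. The function name `monthly_rent` and the CPI values in the usage line are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_MONTHLY_RENT = Decimal("3000.00")
FLAT_YEARS = 3             # rent is fixed for the first 3 lease years
CAP = Decimal("0.025")     # annual increase is capped at 2.5%
CENT = Decimal("0.01")


def monthly_rent(target_year: int, start_year: int, cpi_by_year: dict) -> Decimal:
    """Monthly rent in target_year, assuming lease years align with calendar years.

    cpi_by_year maps a calendar year to that year's CPI change as a fraction
    (0.03 means 3%). Each escalation uses the prior year's CPI, capped at CAP.
    """
    if target_year < start_year:
        raise ValueError("target_year is before the lease starts")
    rent = BASE_MONTHLY_RENT
    # The first escalation happens in lease year FLAT_YEARS + 1.
    for year in range(start_year + FLAT_YEARS, target_year + 1):
        # "Prior year's CPI or 2.5%, whichever is less". Read literally, a negative CPI
        # would lower the rent; a real lease may floor the increase at zero.
        increase = min(cpi_by_year[year - 1], CAP)
        rent = (rent * (1 + increase)).quantize(CENT, rounding=ROUND_HALF_UP)
    return rent


cpi = {2022: Decimal("0.030"), 2023: Decimal("0.020")}  # made-up CPI figures
print(monthly_rent(2024, start_year=2020, cpi_by_year=cpi))  # 3136.50
```

With these inputs, 2023 uses the 2.5% cap (3075.00) and 2024 uses the lower 2.0% CPI (3136.50).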
Hey,
This is a pretty typical LLM-type problem; it includes elements of entity classification, extraction, and resolution.
The steps would go something like this:
- resolving the amendments to the master agreement, as well as perhaps extracting dates to identify chronological ordering
- classifying sections to identify the correct section
- extracting relevant values from the sections
- resolving values between the master and amendments
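The last step, resolving values between the master and its amendments, is also what answers both "latest value" and "history of section A.3" questions once each section version carries a lease id and a date. A minimal sketch, assuming the earlier steps have already produced one record per section per document; the `SectionVersion` fields, the `source` labels and the sample dates are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SectionVersion:
    lease_id: str
    section: str      # section number as extracted, e.g. "A.3"
    text: str         # full text of the section in this document
    effective: date   # date of the master agreement or of the amendment
    source: str       # hypothetical label: "master" or an amendment id


def section_history(versions: list, lease_id: str, section: str) -> list:
    """Every version of one section of one lease, oldest first.

    sorted() is stable, so two documents with the same date keep their input order.
    """
    rows = [v for v in versions if v.lease_id == lease_id and v.section == section]
    return sorted(rows, key=lambda v: v.effective)


def current_section(versions: list, lease_id: str, section: str) -> Optional[SectionVersion]:
    """The most recent version of the section, or None if no document mentions it."""
    history = section_history(versions, lease_id, section)
    return history[-1] if history else None


versions = [
    SectionVersion("L-001", "A.3", "Rent is 3200 per month.", date(2023, 3, 1), "amendment-2"),
    SectionVersion("L-001", "A.3", "Rent is 3000 per month.", date(2019, 1, 1), "master"),
    SectionVersion("L-001", "A.3", "Rent is 3100 per month.", date(2021, 6, 15), "amendment-1"),
]
```

Here `section_history(versions, "L-001", "A.3")` returns the master, amendment-1 and amendment-2 in that order, and `current_section` returns amendment-2. Sorting on the extracted date rather than on file order is why the date extraction in the first step matters.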
This codestrap demo has a pretty good video on setting up a pipeline to do some of this stuff:
So it is quite different from using Microsoft AI Search and OpenAI, where you usually feed the PDFs to AI Search to be indexed. When a user asks a question, the intent is used to search all the PDFs, get the top n highest-ranking documents, and then feed those top n to OpenAI to make sense of them. To be honest, it is just a natural-language interface for searching documents, and it cannot easily get the answer from the most recent amendments.
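To see why relevance ranking alone struggles with "most recent amendment" questions, here is a toy sketch. A simple word-overlap score stands in for the search index (this is not how Azure AI Search actually scores documents), and the two chunks are made-up text. The older master wording shares more words with the question, so it ranks first; only once each chunk carries an extracted date can you prefer the newest match.

```python
import re
from datetime import date


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_n_by_overlap(query: str, chunks: list, n: int) -> list:
    """Toy stand-in for a search index: rank chunks by how many query words they share."""
    q = tokens(query)
    return sorted(chunks, key=lambda c: len(q & tokens(c["text"])), reverse=True)[:n]


chunks = [
    {"doc": "master", "date": date(2019, 1, 1),
     "text": "Section A.3 monthly rent is 3000 per month for the first three years"},
    {"doc": "amendment-2", "date": date(2023, 3, 1),
     "text": "A.3 is amended: rent shall be 3200"},
]
query = "What is the monthly rent in section A.3?"

# Relevance alone: the older master wording matches more query words, so it wins.
best = top_n_by_overlap(query, chunks, 1)[0]
# Knowing the document dates lets you prefer the newest matching chunk instead.
latest = max(top_n_by_overlap(query, chunks, 2), key=lambda c: c["date"])
```

The date-aware fix depends on the effective date already being a structured field, which is the kind of extraction the steps in the reply describe.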
Using AIP, we kind of need to parse the lease agreement into a “structured” format? But each lease has a very different format and terms … how do we handle that?
Have you tried out the PDF text extraction board in Pipeline Builder?
https://www.palantir.com/docs/foundry/pb-functions-expression/pdfOcrV1/
Does this mean that for each lease agreement and its amendments I have to develop a pipeline to process it into a single dataset and add it to the Ontology, and then we can create an app to chat with that dataset?
What would the dataset look like?
master
lease-Id, sign-date, landlord, tenant, guarantor, section1, section2, …
amendments
lease-Id, amendment-date, section1-amend, new-section
We would join them, create a single lease object, and add it to the Ontology?
If we need to create a pipeline for each lease agreement, it would be very time-consuming: when a new amendment is added we would need to modify the pipeline, and when a new lease is added we would need to create a new pipeline …
The lease agreements are PDFs, and their formats can be very different. Can the Ontology only handle structured datasets? Is there no way to leverage AI to make sense of those documents?
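The join described above can be sketched as folding each lease's amendments, oldest first, onto its master sections so the latest text wins. This is a plain-Python sketch, not a Foundry transform, and it assumes the amendments table is stored in long form, one row per amended section (`lease_id`, `amendment_date`, `section`, `text`), rather than as wide `section1-amend` columns. The function name, column names and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def build_lease_objects(master_rows, amendment_rows):
    """Return {lease_id: lease_object}, with amendments applied oldest first so the latest wins."""
    amendments_by_lease = defaultdict(list)
    for row in amendment_rows:
        amendments_by_lease[row["lease_id"]].append(row)

    leases = {}
    for master in master_rows:
        sections = dict(master["sections"])  # copy so the master row itself is not modified
        amendments = sorted(amendments_by_lease[master["lease_id"]],
                            key=lambda r: r["amendment_date"])
        for amendment in amendments:
            # Replaces an existing section or adds a new one introduced by the amendment.
            sections[amendment["section"]] = amendment["text"]
        leases[master["lease_id"]] = {
            "lease_id": master["lease_id"],
            "sign_date": master["sign_date"],
            "landlord": master["landlord"],
            "tenant": master["tenant"],
            "guarantor": master["guarantor"],
            "sections": sections,
            "amendment_count": len(amendments),
        }
    return leases


master_rows = [{
    "lease_id": "L-001", "sign_date": date(2019, 1, 1),
    "landlord": "Landlord Co", "tenant": "Tenant Inc", "guarantor": None,
    "sections": {"A.1": "Premises: Unit 5.", "A.3": "Rent is 3000 per month."},
}]
amendment_rows = [
    {"lease_id": "L-001", "amendment_date": date(2023, 3, 1), "section": "A.3", "text": "Rent is 3200 per month."},
    {"lease_id": "L-001", "amendment_date": date(2021, 6, 15), "section": "A.3", "text": "Rent is 3100 per month."},
    {"lease_id": "L-001", "amendment_date": date(2022, 2, 1), "section": "A.9", "text": "Tenant may sublet."},
]
leases = build_lease_objects(master_rows, amendment_rows)
```

The merged object only holds current values, so history questions ("what were the amendments to A.3?") still need the amendment rows kept as their own records linked to the lease.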
Build with AIP has some great examples showing how to transform sets of PDFs using LLMs.
Open up the Build with AIP app on your stack and search for the following examples:
Parse PDFs with LLMs in Pipeline Builder
Use LLMs for entity extraction from PDFs in Pipeline Builder
Both of these use media sets to transform a set of multiple PDFs (all in one pipeline) into a structured dataset. So no, you won’t need a pipeline for each lease agreement. Prompt engineering your entity extraction block in Pipeline Builder is how you can most effectively capture the variety found in lease agreements into a structured dataset.
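To make the "one pipeline for all leases" point concrete, here is a plain-Python sketch (not Foundry code) of the two parts you would iterate on: a single prompt applied to every document regardless of layout, and a check that the model's output matches the target schema before it lands in the dataset. The schema, prompt wording and both function names are hypothetical; the LLM call itself is left out, and `raw` stands for whatever text the model returns.

```python
import json
from datetime import date

# Hypothetical output schema for one extracted section; these names are illustrative only.
FIELDS = {"lease_id": str, "document_type": str, "effective_date": str, "section": str, "text": str}

PROMPT_TEMPLATE = """You are extracting sections from a lease document.
The document is either a master lease agreement or an amendment to one.
Return only a JSON list. Each element must have exactly these keys:
lease_id, document_type ("master" or "amendment"), effective_date (YYYY-MM-DD),
section (the section number as written, for example "A.3"), text (the full section text).

Document:
{document_text}"""


def build_prompt(document_text: str) -> str:
    # Same prompt for every lease, whatever its layout: the LLM handles the variation.
    return PROMPT_TEMPLATE.format(document_text=document_text)


def parse_llm_output(raw: str) -> list:
    """Validate the model's JSON so malformed rows fail loudly instead of loading silently."""
    rows = json.loads(raw)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON list")
    for row in rows:
        if not isinstance(row, dict) or set(row) != set(FIELDS):
            raise ValueError(f"unexpected row shape: {row!r}")
        for key, expected_type in FIELDS.items():
            if not isinstance(row[key], expected_type):
                raise ValueError(f"{key} must be a {expected_type.__name__}")
        if row["document_type"] not in ("master", "amendment"):
            raise ValueError("document_type must be 'master' or 'amendment'")
        date.fromisoformat(row["effective_date"])  # raises ValueError on a bad date
    return rows
```

Adding a new lease or amendment then means adding a PDF to the media set, not changing the pipeline; only the prompt and schema evolve as you discover new lease variations.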