Lease agreement search

780d30f554e4090ec650 · July 19, 2024, 1:36pm

I need to build a solution for searching lease agreements. The challenge is that a master lease agreement can have one or more amendments. When a question is asked, the customer expects the answer to reflect the latest data in the amendments, and they also want to ask what amendments have been made over time to a particular section (e.g. A.3). The lease agreement also contains a rental amount and a term describing how that amount is calculated. For example: for the first 3 years the rent is 3000 per month; after that, the annual increase is based on the prior year's CPI or 2.5%, whichever is less. The customer expects to ask questions like "what should the rent be for 2024?" Is this doable with AIP, and if so, how? Can someone help me understand the process?
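To make the rent question concrete, here is a minimal Python sketch of the escalation rule described above. It assumes the increase compounds year over year and that lease years line up with calendar years; neither is stated in the post. The lease start year and the CPI figures in the usage note are made-up illustration values, and `monthly_rent` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_RENT = Decimal("3000")   # monthly rent for the first 3 years (from the post)
FIXED_YEARS = 3               # years before escalation starts
CAP = Decimal("0.025")        # 2.5% ceiling on the annual increase


def monthly_rent(lease_start_year, target_year, cpi_by_year):
    """Return the monthly rent that applies in target_year.

    cpi_by_year maps a calendar year to that year's CPI change as a
    fraction, e.g. Decimal("0.031") for 3.1%. Assumption: increases
    compound, and lease years equal calendar years.
    """
    if target_year < lease_start_year:
        raise ValueError("target year is before the lease starts")
    rent = BASE_RENT
    first_escalated_year = lease_start_year + FIXED_YEARS
    for year in range(first_escalated_year, target_year + 1):
        # "prior year's CPI or 2.5%, whichever is less"
        increase = min(cpi_by_year[year - 1], CAP)
        rent = rent * (1 + increase)
    return rent.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, with a hypothetical lease starting in 2020 and hypothetical CPI changes of 6.5% for 2022 and 2.0% for 2023, `monthly_rent(2020, 2024, {2022: Decimal("0.065"), 2023: Decimal("0.020")})` applies 2.5% for 2023 and 2.0% for 2024. The point is that once the base rent, the cap, and the fixed period are extracted as structured values, the calculation itself is ordinary code rather than something the LLM has to guess.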

otacruta · July 19, 2024, 2:31pm

Hey,

This is a pretty typical LLM-type problem. It includes elements of entity classification, extraction, and resolution.

The steps would go something like this:

resolving the amendments to the master agreement, as well as perhaps extracting dates to identify chronological ordering
classifying sections to identify the correct section
extracting relevant values from the sections
resolving values between the master and amendments
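As a rough illustration of the last step, here is a small Python sketch of resolving a section across a master agreement and its amendments once the earlier steps have produced one record per section per document. The `SectionVersion` shape, the function names, and the sample rows are all hypothetical; this is not an AIP or Foundry API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SectionVersion:
    lease_id: str
    section: str      # e.g. "A.3"
    effective: date   # sign date of the master or amendment
    source: str       # "master" or an amendment label
    text: str


def section_history(versions, lease_id, section):
    """All versions of one section for one lease, oldest first."""
    matches = [v for v in versions
               if v.lease_id == lease_id and v.section == section]
    return sorted(matches, key=lambda v: v.effective)


def current_section(versions, lease_id, section):
    """Latest version of a section, or None if it never appears."""
    history = section_history(versions, lease_id, section)
    return history[-1] if history else None


# Hypothetical extracted rows, deliberately out of date order.
EXAMPLE_VERSIONS = [
    SectionVersion("L-001", "A.3", date(2023, 1, 10), "amendment-2", "Rent terms v3"),
    SectionVersion("L-001", "A.3", date(2019, 3, 1), "master", "Rent terms v1"),
    SectionVersion("L-001", "A.3", date(2021, 6, 15), "amendment-1", "Rent terms v2"),
    SectionVersion("L-001", "B.1", date(2019, 3, 1), "master", "Term of lease"),
]
```

With rows like these, `current_section(EXAMPLE_VERSIONS, "L-001", "A.3")` answers the "latest data" question and `section_history(...)` answers the "what changed over time" question. A new amendment is just more rows, not a new pipeline.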

This codestrap demo has a pretty good video on setting up a pipeline to do some of this stuff:

https://www.youtube.com/watch?v=lEN326TOcTo

780d30f554e4090ec650 · July 24, 2024, 3:54pm

So it is quite different from using Microsoft AI Search and OpenAI. There you usually feed the PDFs to AI Search to index; when a user asks a question, it uses the intent to search across all PDFs, gets the top n highest-ranking documents, and feeds those to OpenAI to make sense of them. To be honest, that is just a natural language interface for searching documents, and it cannot easily answer from the most recent amendments.

Using AIP, we kind of need to parse the lease agreements into a “structured” form? But each lease has a very different format and terms … how do we handle that?

helenq · July 24, 2024, 8:08pm

Have you tried out the PDF text extraction board in Pipeline Builder?

https://www.palantir.com/docs/foundry/pb-functions-expression/pdfOcrV1/

780d30f554e4090ec650 · July 25, 2024, 8:12am

Does this mean that for each lease agreement and its amendments I have to develop a pipeline to process it into a single dataset, add it to the ontology, and then create an app to chat with those datasets?

What would the dataset look like?

master
lease-Id, sign-date, landlord, tenant, guarantor, section1, section2, …

amendments
lease-Id, amendment-date, section1-amend, new-section

Would we join them to create a single lease object and add it to the ontology?

If we need to create a pipeline for each lease agreement, it would be very time-consuming: when a new amendment is added we need to modify the pipeline, and when a new lease is added we need to create a new pipeline …

The lease agreements are PDFs and the format can vary widely. Can the ontology only handle structured datasets? Is there no way to leverage AI to make sense of those documents?

ivy · July 25, 2024, 5:16pm

Build with AIP has some great examples showing how to transform sets of PDFs using LLMs.

Open up the Build with AIP app on your stack and search for the following examples:

Parse PDFs with LLMs in Pipeline Builder
Use LLMs for entity extraction from PDFs in Pipeline Builder

Both of these use media sets to transform a set of multiple PDFs (all in one pipeline) into a structured dataset. So no, you won’t need a pipeline for each lease agreement. Prompt engineering your entity extraction block in Pipeline Builder is how you can most effectively capture the variety found in lease agreements into a structured dataset.
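To show the general idea outside of Pipeline Builder, here is a hedged Python sketch of one extraction prompt applied to every document, with the replies parsed into rows that share a fixed set of columns. The prompt wording, the field names (loosely based on the columns proposed earlier in the thread), and the `call_llm` parameter are all assumptions for illustration. `call_llm` stands in for whatever model call you have available and is not a real AIP function.

```python
import json

# Hypothetical prompt; real prompts would need tuning against real leases.
EXTRACTION_PROMPT = (
    "You are reading a lease agreement or a lease amendment. "
    "Reply with a single JSON object with these keys: lease_id, "
    "document_type ('master' or 'amendment'), document_date, landlord, "
    "tenant, guarantor, and sections (an object mapping section numbers "
    "to their text). Use null for anything the document does not state."
)

FIELDS = ("lease_id", "document_type", "document_date",
          "landlord", "tenant", "guarantor")


def build_prompt(document_text):
    """Same instructions for every document, whatever its layout."""
    return EXTRACTION_PROMPT + "\n\nDocument:\n" + document_text


def parse_response(raw):
    """Turn the model's JSON reply into a row with fixed columns."""
    data = json.loads(raw)
    row = {field: data.get(field) for field in FIELDS}
    row["sections"] = data.get("sections") or {}
    return row


def extract_all(documents, call_llm):
    """documents maps filename -> extracted text.

    call_llm is a hypothetical callable taking a prompt string and
    returning the model's reply as a string.
    """
    rows = []
    for filename, text in documents.items():
        row = parse_response(call_llm(build_prompt(text)))
        row["source_file"] = filename
        rows.append(row)
    return rows
```

Because the prompt, not the code, absorbs the differences between lease layouts, adding a new lease or amendment means adding a file to the input set rather than building another pipeline.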