The following TypeScript code defines a class, SemanticSearch, used as part of a Semantic Search workflow.
It includes several methods, one of which extracts the text from a PDF blob.
- Imports:
import { Function, Attachments, Attachment, MediaItem } from "@foundry/functions-api";
import * as pdfjsLib from 'pdfjs-dist';
- The code imports the necessary modules and types from @foundry/functions-api and @foundry/ontology-api.
- It also imports the pdfjs-dist library for handling PDF documents.
- Setting the Worker Source for PDF.js:
pdfjsLib.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.js`;
- Method: extractTextFromPDFBlob:
public async extractTextFromPDFBlob(pdfBlob: Blob): Promise<string> {
    // Convert Blob to ArrayBuffer
    const arrayBuffer = await pdfBlob.arrayBuffer();
    // Load the PDF document
    const pdfDoc = await pdfjsLib.getDocument({ data: arrayBuffer }).promise;
    let extractedText = '';
    // Loop through all pages and extract text
    for (let i = 1; i <= pdfDoc.numPages; i++) {
        const page = await pdfDoc.getPage(i);
        const textContent = await page.getTextContent();
        // Join all the text items into a single string
        const pageText = textContent.items.map((item: any) => item.str).join(' ');
        extractedText += pageText + '\n';
    }
    return extractedText;
}
- This asynchronous method takes a Blob object representing a PDF file and extracts text from it.
- It converts the Blob to an ArrayBuffer, loads the PDF document, and iterates through all pages to extract text content.
- The extracted text from each page is concatenated into a single string and returned.
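As a side note on the joining step described above: as far as I can tell, the pdfjs-dist typings declare textContent.items as a union in which some entries (marked-content markers) have no str field, which is why the code falls back to `any`. Below is a minimal sketch of a typed alternative. The names hasStr and joinPageText are hypothetical helpers of mine, not part of pdfjs-dist, and the sketch does not touch the library at all.

```typescript
// Hypothetical type guard: true only for entries that carry a string `str` field.
function hasStr(item: unknown): item is { str: string } {
  return (
    typeof item === "object" &&
    item !== null &&
    typeof (item as { str?: unknown }).str === "string"
  );
}

// Hypothetical helper: joins the text of one page, skipping entries without `str`.
function joinPageText(items: readonly unknown[]): string {
  return items
    .filter(hasStr)
    .map((item) => item.str)
    .join(" ");
}

// Example with hand-written items (not real pdf.js output).
const sampleItems: unknown[] = [
  { str: "Hello" },
  { type: "beginMarkedContent" },
  { str: "world" },
];
const samplePageText = joinPageText(sampleItems);
```

Inside the loop this would replace the map/join line with something like `joinPageText(textContent.items)`. It does not affect the error below; it only avoids "undefined" showing up in the output if such entries are present.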
- Method: getOrgbusinessDataAttachment:
The issue I’m getting is the following:
Promise.withResolvers is not a function.
Error Parameters: {}
TypeError: Promise.withResolvers is not a function
at new PDFDocumentLoadingTask (UserCode:28401:32)
at Module.getDocument (UserCode:28214:16)
at SemanticSearch.extractTextFromPDFBlob (UserCode:15587:39)
at async SemanticSearch.getOrgbusinessDataAttachment (UserCode:15603:30)
at async le.executeFunctionInternal (FunctionsIsolateRuntimePackage:2:1008381)
at async Ne (FunctionsIsolateRuntimePackage:2:1007469)
at async le.executeFunction (FunctionsIsolateRuntimePackage:2:1007756)
at async userFunction (FunctionsInitialization:8:43)
It’s suggested that:
The build of PDF.js you are using does not support running in Node.js (i.e. only in the browser). The error comes from Promise.withResolvers being called, which is not supported by Node.js (https://github.com/mozilla/pdf.js/issues/18006). The recommended way to run it under Node.js is to use the legacy build (https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support), using pdfjs-dist/legacy/build/pdf.js.
Source: https://stackoverflow.com/questions/78415681/pdf-js-pdfjs-dist-promise-withresolvers-is-not-a-function
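Besides switching to the legacy build, another workaround I am considering is to define Promise.withResolvers myself before getDocument is called. The sketch below is an assumption on my side, not something taken from the PDF.js documentation: withResolversFallback is a hypothetical helper name, and I have not confirmed that this alone is enough inside the Functions runtime, since the modern build may rely on other newer features as well.

```typescript
// Shape of the object returned by Promise.withResolvers (ES2024).
interface PromiseResolvers<T> {
  promise: Promise<T>;
  resolve: (value: T | PromiseLike<T>) => void;
  reject: (reason?: unknown) => void;
}

// Hypothetical helper: builds the same triple with the classic Promise constructor.
function withResolversFallback<T>(): PromiseResolvers<T> {
  let resolve!: (value: T | PromiseLike<T>) => void;
  let reject!: (reason?: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Install the fallback only when the runtime lacks the native method.
const promiseCtor = Promise as unknown as { withResolvers?: unknown };
if (typeof promiseCtor.withResolvers !== "function") {
  promiseCtor.withResolvers = withResolversFallback;
}
```

This would have to run before the first call to pdfjsLib.getDocument, for example at the top of the file that defines SemanticSearch. Unlike the native method, the fallback always creates a plain Promise and ignores `this`, which should be fine for a call written as Promise.withResolvers().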
Is this happening to any of you?