Excel Parsing with TypeScript

Hi,

In the code repository, there is TypeScript code for uploading Excel files. However, an error occurs if the file size exceeds 1 MB. When I upload a 600 KB file, the data is extracted within a few seconds, but when I upload a 1.2 MB file, a runtime error occurs after 60 seconds.

FYI, I’ve tried streaming and chunking methods to solve the issue, but the same problem persists.

```
import { OntologyEditFunction, Attachment } from "@foundry/functions-api";
import { Workbook, Row as ExcelRow } from 'exceljs';
import { Readable } from 'stream';

type Row = string[];

export class ExcelUpload {
    @OntologyEditFunction()
    public async packingListParsing(
        excelFile: Attachment,
        uploaderMultipass: string,
        sheetName: string
    ): Promise<void> {
        if (!excelFile) {
            console.error("Excel 파일을 찾을 수 없습니다.");
            return;
        }

        try {
            await this.streamParseExcelFile(excelFile, sheetName);
        } catch (error) {
            console.error("Error processing excel file:", error);
            throw error; // Re-throw the error for upper-level handling
        }
    }

    private async streamParseExcelFile(excelFile: Attachment, sheetName: string): Promise<void> {
        const blob = await excelFile.readAsync();
        const arrayBuffer = await blob.arrayBuffer();
        const buffer = Buffer.from(arrayBuffer);

        const stream = new Readable();
        stream.push(buffer);
        stream.push(null);

        const workbook = new Workbook();
        await workbook.xlsx.read(stream);

        const worksheet = workbook.getWorksheet(sheetName);
        if (!worksheet) {
            console.error(`Sheet "${sheetName}" not found`);
            return;
        }

        const chunkSize = 4000; // number of rows to process per chunk
        let rows: Row[] = [];
        const promises: Promise<void>[] = [];

        const processChunk = async (chunkRows: Row[]): Promise<void> => {
            const chunkPromises = chunkRows.map(async (row) => {
                try {
                    await this.processRow(row);
                } catch (processError) {
                    console.error("Error processing row:", processError);
                    throw processError;
                }
            });
            await Promise.all(chunkPromises);
        };

        worksheet.eachRow({ includeEmpty: true }, (row) => {
            const rowValues = row.values as Row;
            rowValues.shift(); // drop the first slot (exceljs row values are 1-indexed, so index 0 is empty)
            rows.push(rowValues);

            if (rows.length >= chunkSize) {
                promises.push(processChunk(rows));
                rows = []; // reset the row buffer after dispatching a chunk
            }
        });

        // Process any remaining rows
        if (rows.length > 0) {
            promises.push(processChunk(rows));
        }

        await Promise.all(promises);

        console.log(`Processed ${worksheet.rowCount} rows in total`);
    }

    private async processRow(row: Row): Promise<void> {
        // Process each row here
        // For example:
        console.log("Processing row:", row);
        // Add your processing logic here
    }
}
```

I’m sorry you are having trouble with larger files. What you are seeing happens when you exceed the memory limit, which is 128 MB by default. Several places in your code duplicate the file contents in memory, which makes it more likely to exceed that limit.

We are working on two features that would help you here:

  1. Improving the errors for exceeding memory, so that you get a clear message rather than a timeout.
  2. Giving users a way to increase the memory limit of their function.

In the short term, I’d suggest looking more closely at your code and avoiding keeping the whole file in memory, for example by processing one row at a time rather than buffering the entire workbook; a rough sketch of that is below.
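
To illustrate the idea, here is a minimal, untested sketch using exceljs's streaming `WorkbookReader` instead of the in-memory `Workbook`. It reuses the `readAsync()`/`arrayBuffer()` calls from your snippet; the exact reader options and typings can differ between exceljs versions, and you would still need to filter to your target sheet:

```
import { Attachment } from "@foundry/functions-api";
import * as ExcelJS from "exceljs";
import { Readable } from "stream";

// Sketch: read rows one at a time with exceljs's streaming reader, so only the
// current row (rather than the whole worksheet) is held in memory at once.
async function parseRowByRow(excelFile: Attachment): Promise<void> {
    const blob = await excelFile.readAsync();
    const buffer = Buffer.from(await blob.arrayBuffer());

    const workbookReader = new ExcelJS.stream.xlsx.WorkbookReader(Readable.from(buffer), {
        worksheets: "emit",
        sharedStrings: "cache", // resolve shared strings into cell values
        hyperlinks: "ignore",
        styles: "ignore",
    });

    for await (const worksheetReader of workbookReader) {
        // If only one sheet is needed, filter to it here before iterating its rows.
        for await (const row of worksheetReader) {
            const values = Array.isArray(row.values) ? row.values.slice(1) : []; // values are 1-indexed
            // Handle the single row here instead of pushing it into an array.
            console.log("Processing row:", values);
        }
    }
}
```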

Thank you for your response.

We are aware of the 128 MB memory limit, which is why we implemented the streaming and chunking approach. However, we have noticed that parsing the workbook takes a very long time after the buffer is created, at the call referenced below.

We are puzzled that Excel files between 600 KB and 1 MB take only 1-2 seconds to process, whereas a 1.2 MB file takes over 60 seconds. For your information, with the SheetJS library we can parse files up to 1.6 MB, but it fails for anything larger than that.

const workbook = new Workbook();
await workbook.xlsx.read(stream);
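
For reference, a variant that passes the buffer straight to exceljs (assuming `workbook.xlsx.load` accepts a Node Buffer) would skip the intermediate Readable and one extra copy of the file in memory:

```
// Sketch: load the buffer directly instead of wrapping it in a Readable stream.
const blob = await excelFile.readAsync();
const buffer = Buffer.from(await blob.arrayBuffer());

const workbook = new Workbook();
await workbook.xlsx.load(buffer);
```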