Memory Model for Lightweight Transforms

I’m trying to understand the memory model of lightweight transforms.
As the technology I use Polars transforms (all lazy).

I did some tests with varying sizes of input data (I use the size of the dataset as indicated in the preview or in Data Lineage).

To process 1 GB of input data (output = 358 MB) I need an engine with 32 GB:
@lightweight(cpu_cores=4, memory_gb=32)
To process 7 GB of input (output = 358 MB) I need an engine with 64 GB:
@lightweight(cpu_cores=4, memory_gb=64)
To process 7 GB of input (output = 3.8 GB) I need an engine with 96 GB:
@lightweight(cpu_cores=8, memory_gb=96)

Requesting less memory produces an out-of-memory error.
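For what it’s worth, the three observations above are consistent with a simple linear model of the form memory ≈ base + a·input + b·output. This is purely a back-of-envelope fit to my three data points, not anything documented about how the engine actually allocates memory; the coefficients below are just what falls out of the numbers:

```python
# Fit memory_gb ≈ base + a*input_gb + b*output_gb to the three
# observations above (all sizes in GB; purely illustrative).
observations = [
    (1.0, 0.358, 32.0),   # 1 GB in, 358 MB out -> 32 GB engine
    (7.0, 0.358, 64.0),   # 7 GB in, 358 MB out -> 64 GB engine
    (7.0, 3.8,   96.0),   # 7 GB in, 3.8 GB out -> 96 GB engine
]

(i1, o1, m1), (i2, o2, m2), (i3, o3, m3) = observations

# Rows 1 and 2 share the same output size, so their difference
# isolates the input coefficient; rows 2 and 3 share the same
# input size, isolating the output coefficient.
a = (m2 - m1) / (i2 - i1)      # GB of memory per GB of input
b = (m3 - m2) / (o3 - o2)      # GB of memory per GB of output
base = m1 - a * i1 - b * o1    # fixed overhead

print(f"memory_gb = {base:.1f} + {a:.1f}*input_gb + {b:.1f}*output_gb")
```

With three points and three unknowns this fit is exact, so it can’t tell us whether the relationship is really linear; it just suggests a sizable fixed overhead plus a multiple of both input and output size.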

How does this work? And how can I estimate whether I’m at the limit of the requested memory or somewhere in a safe zone?

The ratio of input size to required memory is so surprising that it introduces a degree of uncertainty…
