I was reading the linter documentation and saw that it proposes using lightweight transforms for small transformations so that they can run in Polars, for example. For some cases with small datasets, the linter also proposes using the KUBERNETES_NO_EXECUTORS profile. I don't clearly see the difference between the two, so when should we use each of them?
Your help in solving this issue, or any other ideas on how to achieve this use case, would be greatly appreciated.
The previous way of using less memory was the KUBERNETES_NO_EXECUTORS profile. Since that linter rule was created, lightweight transforms were released as a new feature, and there is now a linter rule that also suggests using a lightweight transform.
As mentioned in the documentation: As individual computers become more powerful, an increasing number of data transformations can be run on a single node. This means that, in the case of small-to-medium sized datasets, transformations can be executed without relying on distributed parallelism.
and here: Lightweight transforms do not support executing PySpark queries. Instead, queries must be written using alternative APIs.
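For illustration, here is a minimal sketch of a lightweight transform that reads its input as a Polars DataFrame. The dataset paths and the `amount` column are placeholders, and the exact accessor names (`.polars()`, `.write_table()`) should be checked against the lightweight transforms documentation:

```python
import polars as pl
from transforms.api import transform, Input, Output, lightweight


@lightweight()
@transform(
    output=Output("/Project/folder/output_dataset"),  # placeholder path
    source=Input("/Project/folder/input_dataset"),    # placeholder path
)
def compute(output, source):
    # Read the input into Polars and run the small transformation in memory
    # on a single node, without starting a Spark session.
    df = source.polars()
    result = df.filter(pl.col("amount") > 0)
    output.write_table(result)
```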
You may want to use KUBERNETES_NO_EXECUTORS if you know that the datasets involved will eventually grow and you will need distributed parallelism in the future. You can write a PySpark query and know that the logic will not have to be rewritten when you later scale the transform by removing the profile. If you use a lightweight transform, you cannot use PySpark.
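For comparison, here is a rough PySpark equivalent with the profile applied via `@configure`. Again, the paths and column name are placeholders, and if I remember correctly the profile also needs to be enabled in the repository's Spark profile settings before it can be referenced here:

```python
from pyspark.sql import functions as F
from transforms.api import configure, transform_df, Input, Output


@configure(profile=["KUBERNETES_NO_EXECUTORS"])
@transform_df(
    Output("/Project/folder/output_dataset"),       # placeholder path
    source=Input("/Project/folder/input_dataset"),  # placeholder path
)
def compute(source):
    # Plain PySpark logic: with the profile applied, everything runs on the
    # driver, but the same code scales out again if the profile is removed.
    return source.filter(F.col("amount") > 0)
```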