We have a direct connection set up and recently started using magritte-paging-inc-param-call.
Since then, we have been getting a driver out-of-memory error:
Module (i.e. driver) ran out of memory
The driver running the job ran out of memory while running your job. Common reasons include:
- Broadcasting large datasets. If query plan for this job contains broadcast joins, consider removing them from your code (if manually applied) and disabling automatic broadcast join by applying the AUTO_BROADCAST_JOIN_DISABLED profile; or increasing driver memory.
- Using .collect() or other Spark actions that retrieve data to the driver.
- Doing computations locally on the driver, using for example Pandas
- Having a large number of tasks
If the problem persists, try increasing the driver memory by modifying your Spark profile.
So I have two questions:
- What does setting transactionType to Snapshot and incremental to true at the same time do? Does it run in incremental mode or not? From the transaction history, it looks like it still runs in snapshot mode.
- What are some ways to fix that memory issue? I am guessing the default profile here is SMALL. Depending on the environment, the total file size can be large: the last successful build had 548 files totaling 2.6GB, but it can be more.