Has anyone set up dataset projections on their audit logs to improve performance? We attempted to set up a dataset projection on our two-year audit log, and it failed during the scheduled build with the explanation below. Our audit log is about 188 GB. Looking for suggestions on applying Spark profiles for a very large dataset.
Summary of Failure:
The Foundry build failed due to a Spark-related issue: Missing an output location for shuffle 1 partition 0. This error typically occurs when Spark cannot find or retrieve shuffle data required for the next stage. The root cause could be related to a configuration error, resource issue, or corruption in intermediate shuffle data.
The ExecutorUnreachable exception suggests that a Spark executor might have become unavailable, likely due to memory limits, network issues, or resource contention within the job.
Suggested Fix:
- Check Resource Allocation: The job might require more CPU, memory, or disk space. Ensure that sufficient resources are allocated for the Spark executors to handle the shuffle tasks.
- Retry Job: Sometimes this issue occurs transiently. Retry the job to confirm whether the error persists.
- Optimize Shuffle Operations:
  - Consider reducing the amount of shuffle data by filtering or aggregating earlier in your pipeline.
  - Adjust Spark configurations, such as increasing spark.executor.memory or modifying the number of shuffle partitions (spark.sql.shuffle.partitions).
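For anyone sizing the shuffle partition setting mentioned above, here is a rough sketch in plain Python of how a starting value could be estimated from the dataset size. The helper name `suggest_shuffle_partitions` and the 128 MB per-partition target are my own assumptions (a common rule of thumb, not a Foundry or Spark recommendation). The 188 GB figure is the on-disk size from this post, and the actual shuffle volume may be larger or smaller depending on compression and on what the projection build shuffles.

```python
import math


def suggest_shuffle_partitions(dataset_size_gb: float, target_partition_mb: int = 128) -> int:
    """Estimate a shuffle partition count so each partition holds roughly
    target_partition_mb of data.

    Hypothetical helper: the 128 MB default is an assumed rule of thumb,
    not a value taken from Foundry or Spark documentation.
    """
    if dataset_size_gb <= 0 or target_partition_mb <= 0:
        raise ValueError("sizes must be positive")
    total_mb = dataset_size_gb * 1024
    # Round up so no partition exceeds the target size.
    return max(1, math.ceil(total_mb / target_partition_mb))


# 188 GB is the audit log size quoted in this post.
partitions = suggest_shuffle_partitions(188)
print(partitions)  # 1504
```

For comparison, Spark's default of 200 for spark.sql.shuffle.partitions would put roughly 960 MB in each partition for a dataset this size, which may help explain executors dropping out during the shuffle. How the resulting value gets applied (for example, through whichever Spark profile is available in your environment) depends on your setup, so treat the number as a starting point to tune.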