My PySpark job runs on a Gluten + Velox backend and fails during the writeDataset call with the VeloxUserError shown in the full stack trace below. This is the line of code I traced the failure to while debugging. Any suggestions?
df = df.join(
    claims_object_cmr_tier_distinct,
    on=["client_id", "claim_create_year", "profit_center_4"],
    how="left",
)
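
My working theory, based on the Reason line: TopNRowNumberNode is the Velox operator Gluten emits for a row_number()-style window followed by a rank filter, and it requires at least one sort key. If claims_object_cmr_tier_distinct is deduplicated upstream with a window whose orderBy is a constant, the optimizer can fold that sort key away and hand Velox an empty sorting-key list. A minimal sketch of the pattern I mean (the upstream frame and dedup logic here are illustrative, not my exact code):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Illustrative: ordering by a literal satisfies row_number()'s ORDER BY
# requirement in Spark, but a constant sort key can be pruned during
# optimization, leaving the native TopNRowNumber operator with zero keys.
w = (Window
     .partitionBy("client_id", "claim_create_year", "profit_center_4")
     .orderBy(F.lit(1)))

claims_object_cmr_tier_distinct = (
    claims_object_cmr_tier                     # hypothetical upstream frame
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)                  # rank filter -> TopN rewrite
    .drop("rn")
)

If that is the cause, ordering by a real column (e.g. a timestamp) instead of a literal should keep the sort key in the plan.

Full error and stack trace: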
py4j.protocol.Py4JJavaError: An error occurred while calling o301.writeDataset.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 57.0 failed 4 times, most recent failure: Lost task 0.3 in stage 57.0 (TID 410) (10.0.77.123 executor 16): org.apache.gluten.exception.GlutenException: Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: (0 vs. 0) Number of sorting keys must be greater than zero
Retriable: False
Expression: sortingKeys_.size() > 0
Function: TopNRowNumberNode
File: /home/circleci/project/ep/build-velox/build/velox_ep/velox/core/PlanNode.cpp
Line: 1688
Stack trace:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
# 10
# 11
# 12
# 13
at org.apache.gluten.vectorized.PlanEvaluatorJniWrapper.nativeCreateKernelWithIterator(Native Method)
at org.apache.gluten.vectorized.NativePlanEvaluator.createKernelWithBatchIterator(NativePlanEvaluator.java:70)
at org.apache.gluten.backendsapi.velox.VeloxIteratorApi.genFinalStageIterator(VeloxIteratorApi.scala:242)
at org.apache.gluten.execution.WholeStageZippedPartitionsRDD.$anonfun$compute$1(WholeStageZippedPartitionsRDD.scala:59)
at org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
at org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
at org.apache.gluten.execution.WholeStageZippedPartitionsRDD.compute(WholeStageZippedPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$2(Executor.scala:633)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:97)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:636)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2858)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2794)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2793)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2793)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3099)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3033)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3022)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
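
One fallback I'm considering while I track down the root cause, assuming Gluten's spark.gluten.enabled switch can be toggled at runtime, is to run just this write on vanilla Spark so the plan never reaches Velox:

# Fallback sketch: temporarily route this one write around the native backend.
# The parquet() call and output_path are stand-ins for whatever write
# actually triggers writeDataset in my job.
spark.conf.set("spark.gluten.enabled", "false")
try:
    df.write.mode("overwrite").parquet(output_path)
finally:
    spark.conf.set("spark.gluten.enabled", "true")

That gives up the native speedup for this stage, though, so I'd still like to understand why the sorting keys end up empty.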