Multi-GPU training in Code Repositories & Jupyter Workspaces

Hello,

I am trying to run multi-GPU training in Foundry and would appreciate some advice from the community:

  - Spark with @configure(profile=["DYNAMIC_ALLOCATION_ENABLED", "DRIVER_GPU_ENABLED"]) → One GPU is picked up from the resource queue (RQ) and is "activated" during the build. However, from what I understand, the driver shouldn't be the one doing this work in a Spark setup.

  - Spark with @configure(profile=["DYNAMIC_ALLOCATION_ENABLED", "EXECUTOR_GPU_ENABLED"]) → I believe this works as expected.

I understand that Palantir recommends Lightweight transforms for this use case. However, I can't find any argument (such as gpu_count) to request 2 GPUs. Is there one?

With Lightweight, @lightweight(gpu_type="NVIDIA_A10G"), only one GPU is picked up from the RQ.

Also, I am not sure how to enable this capability in Code Workspaces / Jupyter notebooks. Is it possible?
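One way to compare what each setup actually exposes (Spark driver, Lightweight transform, or Jupyter workspace) is to inspect the standard CUDA `CUDA_VISIBLE_DEVICES` variable inside the running code. This is a minimal stdlib-only sketch that assumes Foundry restricts GPU visibility through that variable, which is not confirmed here. The helper name `visible_gpu_ids` is hypothetical. If PyTorch is installed, `torch.cuda.device_count()` reports the number of devices CUDA can see directly.

```python
import os
from typing import List, Optional


def visible_gpu_ids(raw: Optional[str]) -> Optional[List[str]]:
    """Parse a CUDA_VISIBLE_DEVICES value into a list of GPU identifiers.

    Hypothetical helper. Returns None when the variable is unset, which
    means CUDA does not restrict visibility (check `nvidia-smi -L` instead).
    An empty string means no GPUs are exposed to this process.
    """
    if raw is None:
        return None
    # Entries are comma-separated indices or UUIDs; ignore blanks and spaces.
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    # Assumption: the platform sets CUDA_VISIBLE_DEVICES for GPU-enabled runs.
    ids = visible_gpu_ids(os.environ.get("CUDA_VISIBLE_DEVICES"))
    if ids is None:
        print("CUDA_VISIBLE_DEVICES is unset; visibility is not restricted.")
    else:
        print(f"{len(ids)} GPU(s) visible: {ids}")
```

Running this in each environment would show whether only one GPU is being assigned, or whether more are present but unused by the training code.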

Best regards,