Error importing the pyspark.pandas library

Importing the following PySpark library raises an error:

import pyspark.pandas as ps

Error: Internal Error: AttributeError: np.NaN was removed in the NumPy 2.0 release. Use np.nan instead.

I am already using np.nan in my code and have also tried downgrading NumPy to version 1.26.4, but the issue persists.

Has anyone encountered a similar problem or found a solution?

I think this will be hard to debug without more information.

Do you know where the error is thrown? Can you share the full stack trace?
Once you know where it happens, you may be able to see what triggers it, even if your own code never uses np.NaN explicitly.
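
One way to collect that information is sketched below. It is a rough sketch, not a fix: the helper names try_import and describe are made up for illustration, and it assumes you can run Python in the same notebook or session that shows the error. It prints the NumPy version and file path that the running interpreter actually loads, then the full stack trace of the failing import. Since np.NaN still exists in NumPy 1.26.4, seeing this message after a downgrade suggests the session may still be picking up a NumPy 2.x install, which the printed path should confirm or rule out.

```python
import importlib
import traceback


def try_import(module_name):
    """Import a module by name.

    Returns (module, None) on success, or (None, traceback_text) if the
    import raises, so the full stack trace can be read or pasted elsewhere.
    """
    try:
        return importlib.import_module(module_name), None
    except Exception:
        return None, traceback.format_exc()


def describe(module_name):
    """Report version and on-disk location as seen by the running interpreter."""
    module, error = try_import(module_name)
    if module is None:
        return {"name": module_name, "version": None, "file": None, "error": error}
    return {
        "name": module_name,
        "version": getattr(module, "__version__", None),
        "file": getattr(module, "__file__", None),
        "error": None,
    }


if __name__ == "__main__":
    # Which NumPy does this session really load, and from where?
    info = describe("numpy")
    print("numpy version:", info["version"])
    print("numpy location:", info["file"])

    # Full stack trace of the failing import, if it fails.
    _, error = try_import("pyspark.pandas")
    print(error or "pyspark.pandas imported without error")
```

The last frames of the printed trace show which file references np.NaN. If the reported NumPy version is still 2.x, the downgrade most likely went to a different environment than the one the session runs in.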
