I want to use the pyspark.testing.assertDataFrameEqual function in my tests, but I’m not able to import it.
from pyspark.testing import assertDataFrameEqual
This results in a linter error: Internal Error: ModuleNotFoundError: No module named 'pyspark.testing'.
But the function should be available from PySpark 3.5.0 onwards, and everywhere I look I see version 3.5.1 mentioned. For example, in the conda.lock files (both test and run) I see:
pyspark-src=3.5.1.37=py_0 @ noarch
pyspark=3.5.1.37=py_0 @ noarch
pytest-forked=1.6.0=pyhd8ed1ab_1 @ noarch
pytest-html=1.22.0=py_0 @ noarch
pytest-metadata=2.0.4=pyhd8ed1ab_0 @ noarch
pytest-transforms=2.239.1=py_0 @ noarch
pytest-xdist=1.34.0=pyh9f0ad1d_0 @ noarch
pytest=7.2.1=pyhd8ed1ab_0 @ noarch
pytest=8.3.4=pyhd8ed1ab_1 @ noarch
python-dateutil=2.9.0.post0=pyhff2d567_1 @ noarch
python=3.11.11=h9e4cc4f_1_cpython @ linux-64
When committing and running checks, the log reports:
pyspark-src=3.5.1.37=py_0@tar.bz2
pyspark=3.5.1.37=py_0@tar.bz2
pytest=8.3.4=pyhd8ed1ab_1@conda
python-dateutil=2.9.0.post0=pyhff2d567_1@conda
python=3.11.11=h9e4cc4f_1_cpython@conda
Using the debugger and console I also get 3.5.1:
> import pyspark
> print(pyspark.__version__)
3.5.1
> from pyspark.testing import assertDataFrameEqual
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pyspark.testing'
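To rule out an import-path problem on my side, I also checked whether the submodule can be located at all without importing it. This is just a small diagnostic helper I wrote myself (not part of PySpark or Foundry), using only the standard library:

```python
import importlib.util


def module_status(name: str) -> str:
    """Report whether a module can be located without importing it."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # The parent package itself is not installed.
        return "parent package missing"
    return "found" if spec is not None else "missing"


# In the Foundry console/debugger, this shows whether pyspark.testing
# ships with the installed pyspark conda package at all:
print(module_status("pyspark.testing"))
```

In my environment this prints "missing", which suggests the testing submodule simply isn't included in the installed pyspark package files, rather than being a path or version-resolution issue.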
A recent build report for a dataset also states that its environment ran with 3.5.1.38.
Does anyone know how to import it, or why it might be missing from PySpark 3.5.1 in Foundry? Is it perhaps packaged in a separate conda library that needs to be added as a dependency, and if so, which one?
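In case it helps anyone hitting the same wall: in the meantime I'm using a rough stand-in of my own in tests, comparing collected rows directly. This is my own helper, not part of PySpark's API, and it assumes the DataFrames are small enough to .collect() (rows are compared order-insensitively, with approximate float comparison):

```python
import math


def assert_rows_equal(actual_rows, expected_rows, rel_tol=1e-5):
    """Rough stand-in for assertDataFrameEqual on collected rows.

    Compares two lists of dict-like rows (e.g. from df.collect(),
    where each Row supports .asDict()), ignoring row order.
    """
    def normalise(rows):
        out = []
        for r in rows:
            # Accept both pyspark Rows and plain dicts.
            d = r.asDict() if hasattr(r, "asDict") else dict(r)
            out.append(tuple(sorted(d.items())))
        return sorted(out, key=repr)

    a, e = normalise(actual_rows), normalise(expected_rows)
    assert len(a) == len(e), f"row count {len(a)} != {len(e)}"
    for row_a, row_e in zip(a, e):
        for (ka, va), (ke, ve) in zip(row_a, row_e):
            assert ka == ke, f"column mismatch: {ka} != {ke}"
            if isinstance(va, float) and isinstance(ve, float):
                assert math.isclose(va, ve, rel_tol=rel_tol), \
                    f"{ka}: {va} != {ve}"
            else:
                assert va == ve, f"{ka}: {va!r} != {ve!r}"
```

It obviously lacks the schema checks and nice diffs of the real assertDataFrameEqual, so I'd still much prefer the proper import if someone knows where it lives.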