Error running Pyspark in Jupyter Workspace

I am trying to read a dataset using PySpark in a Jupyter Workspace, but I keep getting a Py4J error.

I have installed openjdk (from conda), pyspark (from PyPI), and the java-jdk library, but the issue still persists.
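Before reinstalling anything, it can help to check which Java installation PySpark's Py4J gateway will actually launch, since Py4J starts a JVM subprocess and a missing or incompatible `java` is a common cause of this error at SparkSession startup. A minimal diagnostic sketch (plain Python, no Spark required; what it prints depends entirely on your environment):

```python
import os
import shutil

def java_environment():
    """Collect the Java-related settings PySpark's Py4J gateway depends on.

    Py4J launches a JVM subprocess, so a missing or incompatible `java`
    binary is a common cause of a Py4JError when the session starts.
    """
    return {
        "JAVA_HOME": os.environ.get("JAVA_HOME"),  # where conda's openjdk should point
        "java_on_path": shutil.which("java"),      # the binary Py4J will actually launch
    }

info = java_environment()
print(info)
```

If `java_on_path` is `None`, or `JAVA_HOME` points somewhere other than the conda environment you installed into, that mismatch is worth fixing before touching the PySpark install itself.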

Hi pratyush335,

Wanted to confirm first, have you installed openjdk and pyspark from conda and pypi, respectively? Per this documentation: https://www.palantir.com/docs/foundry/code-workspaces/code-workspaces-faq#can-i-use-pyspark-in-code-workspaces

Best,

calebh

So, I had installed openjdk from conda, but I had also installed pyspark from conda. I have since uninstalled pyspark and reinstalled it from PyPI. I tried re-running the code, but the issue persists.

[screenshots of the error]

Solved:

The solution lies in installing the correct versions of the libraries.

If you follow the documentation:

  1. Install pyspark from PyPI and openjdk from conda:
    This still results in the Py4J error.
    I guess the reason is a version mismatch, since conda installs the latest version of openjdk by default.

However, if you don't install openjdk and install java-jdk instead:

  1. Install pyspark from PyPI and java-jdk from conda:
    This results in a working PySpark session, with openjdk version 8.0.112.
    Some warnings, but NO ERRORS!
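The fix above boils down to matching the Java major version to what the installed PySpark build supports. A hedged sketch of that check in plain Python; the support table here is my simplification of the Apache Spark documentation (Spark 2.x needs Java 8; Spark 3.x supports Java 8, 11, and, from 3.3, 17) and is not stated anywhere in this thread:

```python
# Hypothetical helper mirroring the fix above: verify the installed Java major
# version is one the local PySpark build supports before starting a session.
# The support table is a simplification of the Spark docs and an assumption here.

def java_major(version_string: str) -> int:
    """Parse the major version from strings like '1.8.0_112' or '17.0.2'."""
    parts = version_string.split(".")
    # Pre-Java 9 releases report themselves as 1.x; the real major is the second field.
    return int(parts[1]) if parts[0] == "1" else int(parts[0])

def is_compatible(pyspark_version: str, java_version: str) -> bool:
    """Rough check: does this Java version work with this PySpark major version?"""
    spark_major = int(pyspark_version.split(".")[0])
    jmajor = java_major(java_version)
    if spark_major == 2:
        return jmajor == 8
    if spark_major == 3:
        return jmajor in (8, 11, 17)
    return False

# The thread's working combination: java-jdk's OpenJDK 8 with a Spark 3.x PySpark.
print(is_compatible("3.3.0", "8.0.112"))  # → True
```

Running a check like this right after install would have flagged the "latest openjdk + PySpark" combination as the culprit without any trial and error.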

Peace.
