Tutorial: Using spark-fits with spark-shell, pyspark or Jupyter notebook
Using with spark-shell/pyspark
This package can be added to Spark using the --packages command line option. For example, to include it when starting the spark-shell:
# Scala 2.11
$SPARK_HOME/bin/spark-shell --packages com.github.astrolabsoftware:spark-fits_2.11:1.0.0
# Scala 2.12
$SPARK_HOME/bin/spark-shell --packages com.github.astrolabsoftware:spark-fits_2.12:1.0.0
Using --packages ensures that this library and its dependencies will be added to the classpath (make sure you use the latest version). In Python, you would do the same:
# Scala 2.11
$SPARK_HOME/bin/pyspark --packages com.github.astrolabsoftware:spark-fits_2.11:1.0.0
# Scala 2.12
$SPARK_HOME/bin/pyspark --packages com.github.astrolabsoftware:spark-fits_2.12:1.0.0
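Once the shell is up, you can quickly check that the connector is available by reading a FITS file. A minimal sketch, assuming HDU 1 of the file holds a binary table (the path is a placeholder for your own data):
# Read HDU 1 of a FITS file into a DataFrame
df = spark.read.format("fits").option("hdu", 1).load("/path/to/data.fits")
# Inspect the inferred schema and preview a few rows
df.printSchema()
df.show(5)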
Alternatively, to get the latest development version, you can download this repo, build the JAR yourself, and add it when launching the spark-shell (note that, unlike --packages, the --jars option does not add the JAR's dependencies to the classpath):
$SPARK_HOME/bin/spark-shell --jars /path/to/jar/<spark-fits.jar>
or with pyspark
$SPARK_HOME/bin/pyspark --jars /path/to/jar/<spark-fits.jar>
By default, pyspark uses a simple Python shell. It is also possible to launch PySpark in IPython, by specifying the driver to use:
export PYSPARK_DRIVER_PYTHON="path/to/ipython"
$SPARK_HOME/bin/pyspark --jars /path/to/jar/<spark-fits.jar>
Same with Jupyter notebook:
cd /path/to/notebooks
export PYSPARK_DRIVER_PYTHON="path/to/jupyter-notebook"
$SPARK_HOME/bin/pyspark --jars /path/to/jar/<spark-fits.jar>
See the PySpark documentation for more launch options. To build the JAR, just run
sbt ++{SBT_VERSION} package
from the root of the package (see the run_*.sh scripts). Here is an example in the spark-shell:
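(A minimal sketch rather than a full session; the file path and HDU index are placeholders for your own data.)
// Read HDU 1 of a FITS file into a DataFrame
val df = spark.read
  .format("fits")
  .option("hdu", 1)
  .load("/path/to/data.fits")
// Inspect the inferred schema and preview a few rows
df.printSchema()
df.show(5)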
Using with Jupyter Notebook
We provide pyspark notebooks in the examples section. For notebooks in Scala/Spark (using the Apache Toree kernel), see the spark3d examples.
Using pyspark + notebook on a cluster
If you want to launch a pyspark notebook directly from a cluster, you would use:
PYSPARK_DRIVER_PYTHON="/path/to/jupyter-notebook" PYSPARK_DRIVER_PYTHON_OPTS="--no-browser --port=7777" pyspark <...>
# --> Follow the URL, which includes the token
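If the cluster is remote, a common way to reach the notebook from your local machine (not specific to spark-fits) is an SSH tunnel; the user and host below are placeholders:
# Forward the notebook port chosen above to your local machine
ssh -N -L 7777:localhost:7777 user@cluster
# Then open the URL (including the token) in a local browser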