Scala installation

Requirements

This library requires Spark 2.0+ (it has not been tested with earlier versions). It has been tested with Scala 2.11; if you want to use another version, feel free to contact us. In addition to Spark, the library currently has two external dependencies: healpix and spark-fits. Unfortunately, there are no Maven coordinates for healpix, so we ship a JAR of its latest version with spark3D, under the lib/ folder.
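To see how this workaround plays out in an SBT build, here is a minimal build.sbt sketch (illustrative only, not the project's actual build file; it assumes SBT's default unmanagedBase setting):

// Illustrative build.sbt sketch. spark-fits resolves from Maven Central,
// while the healpix JAR, having no Maven coordinates, is picked up as an
// unmanaged dependency: SBT includes any JAR placed under lib/ by default.
libraryDependencies += "com.github.astrolabsoftware" %% "spark-fits" % "0.7.0"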

Including spark3D in your project

You can link spark3D to your project (either with spark-shell or spark-submit) by specifying its Maven coordinates:

toto:~$ spark-submit --packages "com.github.astrolabsoftware:spark3d_2.11:0.3.1" <...>

It might not contain the latest features though (see Building from source below). You can check the latest available version via the Maven Central badge at the root of the project.
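If you manage your own SBT project instead, the same coordinates can be declared in your build.sbt (a sketch; adjust the version to the latest release shown by the badge):

// build.sbt sketch: %% appends the Scala binary version (here _2.11)
libraryDependencies += "com.github.astrolabsoftware" %% "spark3d" % "0.3.1"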

Building from source

If you want to contribute to the project, or need access to the latest features, you can fork and clone the project and build it from source. The library is easily built with SBT (see the provided build.sbt script). To build a JAR file, simply run

toto:~$ sbt ++${SCALA_VERSION} package

from the project root, where SCALA_VERSION is the Scala version you compile against (e.g. 2.11.8). The build configuration includes support for Scala 2.11. In addition, you can build the documentation using SBT:

toto:~$ sbt ++${SCALA_VERSION} doc
toto:~$ open target/scala-${SCALA_VERSION}/api/index.html

You can also bundle the external dependencies into the spark3D JAR by building a fat JAR:

toto:~$ sbt ++${SCALA_VERSION} assembly

By doing so, you will be able to use spark3D in your program without specifying its external dependencies.
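For reference, the assembly task comes from the sbt-assembly plugin. If you want the same behaviour in your own SBT project, it is enabled by a line along these lines in project/plugins.sbt (a sketch; the plugin version here is an assumption, pick one compatible with your SBT):

// project/plugins.sbt sketch: provides the `sbt assembly` task
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")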

Running the test suite

To launch the test suite, just execute:

toto:~$ sbt ++${SCALA_VERSION} coverage test coverageReport

We also provide a script (test_scala.sh) that you can execute instead. The test results are printed on screen, and details of the coverage are written to target/scala-${SCALA_VERSION}/scoverage-report/index.html.

Using with spark-shell

First produce a JAR of the spark3D library (see Building from source above), and then launch a spark-shell, specifying the external dependencies:

toto:~$ JARS="target/scala-2.11/spark3d_2.11-0.3.1.jar,lib/jhealpix.jar"
toto:~$ PACKAGES="com.github.astrolabsoftware:spark-fits_2.11:0.7.0"
toto:~$ spark-shell --jars $JARS --packages $PACKAGES

You will then be able to import anything from spark3D:

scala> import com.astrolabsoftware.spark3d.geometryObjects.Point3D
scala> // etc...
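For a quick sanity check, you can then instantiate one of these objects. The following is a sketch only: it assumes the Point3D constructor takes the three coordinates plus a boolean flag indicating whether they are spherical; check the API documentation (see Building from source) for the exact signature.

scala> // x, y, z, plus whether the coordinates are spherical (assumed signature)
scala> val p = new Point3D(1.0, 2.0, 3.0, false)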

Note that if you make a fat JAR (that is, building with sbt ++${SCALA_VERSION} assembly rather than sbt ++${SCALA_VERSION} package), you do not need to specify the external dependencies, as they are already included in the resulting JAR:

toto:~$ FATJARS="target/scala-2.11/spark3D-assembly-0.3.1.jar"
toto:~$ spark-shell --jars $FATJARS

Using with Jupyter notebook and examples

We include a number of notebooks describing the use of the library in the examples/jupyter folder, along with a README explaining how to install Apache Toree as a kernel in Jupyter.

Python installation

pyspark3d: A Python interface to spark3D

pyspark3d is built directly around spark3D using py4j. As a result, new features are first added to spark3D and then propagated to pyspark3d (sometimes with a delay). The basic idea is to load the spark3D objects into the Java Virtual Machine (JVM), and let Python programs running in a Python interpreter dynamically access those Java objects.

Requirements

pyspark3d is tested on Python 3.5 and later. Note that pyspark3d will not run on Python 3.4 and earlier (including 2.X), because we make use of type hints (PEP 484 and the typing module, the standard for type annotations).

In addition, pyspark3d requires:

  • pyspark
  • numpy
  • scipy

You can find all Python dependencies in the requirements.txt file at the root of the project. You will also need coverage and coveralls if you want to run the test suite.

Note that pyspark3d depends on spark-fits, but this dependency is included in the assembly JAR (see below).

Including pyspark3d in your project

In case you are completely lost, have a look at the .travis.yml file at the root of the project. It will give you hints on how to install the project on Ubuntu 16.04.

Using pip

Just run

pip install pyspark3d

Note that the assembly JAR is shipped with the pip package.

Manual

In order to use pyspark3d, you first need to connect it to spark3D, either by specifying the Maven coordinates of the package (plus its dependencies) or by linking to an assembly JAR. For example, download the latest version of spark3D (which includes pyspark3d), and build an assembly JAR (see also Building from source above):

toto:~$ cd /path/to/spark3D
toto:~$ sbt ++${SCALA_VERSION} assembly

Edit pyspark3d_conf.py to point to the newly created JAR, and make sure spark3D is also in your PYTHONPATH:

# in ~/.bash_profile for example
...
export PYTHONPATH=/path/to/spark3D:/path/to/spark3D/pyspark3d:$PYTHONPATH
...

Running the test suite

To launch the test suite, just execute the test script at the root of the package:

toto:~$ ./test_python ${SCALA_VERSION}

We use doctest. Note that the script will build the JAR for you, so you need to have Scala and SBT installed on your machine. To print the coverage report on screen, just adapt the script to your environment:

...
isLocal=`whoami`
if [ "$isLocal" == "<put_your_name>" ]
then
  ...
fi

Using with pyspark

First produce a fat JAR of the spark3D library (see above), and then launch a pyspark shell:

toto:~$ PYSPARK_DRIVER_PYTHON=ipython pyspark \
  --jars /path/to/target/scala-2.11/spark3D-assembly-0.3.1.jar

Batch mode and provided examples

You can follow the different tutorials, for instance the notebooks under examples/jupyter.

We also include Scala examples and runner scripts (run_*.sh) in the runners folder of the repo. You might have to adapt those scripts to your environment.