Scala installation
Requirements
This library requires Spark 2.0+ (it has not been tested with earlier versions). The library has been tested with Scala 2.11; if you want to use another version, feel free to contact us. In addition to Spark, the library currently has two external dependencies: healpix and spark-fits. Unfortunately, there are no Maven coordinates for healpix, so we release a jar of the latest version within spark3D under the lib/ folder.
Including spark3D in your project
You can link spark3D to your project (either spark-shell or spark-submit) by specifying its Maven coordinates:
toto:~$ spark-submit --packages "com.github.astrolabsoftware:spark3d_2.11:0.3.1" <...>
It might not contain the latest features though (see Building from source). You can check the latest available version at the root of the project (see the Maven Central badge).
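Alternatively, if you manage your own SBT build, you can declare the same coordinates as regular dependencies. Below is a minimal sketch of a build.sbt, using the versions shown on this page; the Spark version (2.3.1) is an assumption, any 2.x should do. Since healpix has no Maven coordinates, copy the jar we ship under lib/ into your own project's lib/ folder, which SBT picks up automatically as an unmanaged dependency:
// build.sbt -- minimal sketch; spark3d and spark-fits versions taken from the examples on this page
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark itself is usually marked "provided" when running via spark-submit;
  // the Spark version here is an assumption, any 2.x should work
  "org.apache.spark" %% "spark-sql" % "2.3.1" % "provided",
  "com.github.astrolabsoftware" %% "spark3d" % "0.3.1",
  "com.github.astrolabsoftware" %% "spark-fits" % "0.7.0"
)

// healpix has no Maven coordinates: copy lib/jhealpix.jar into your own
// lib/ folder and SBT will pick it up as an unmanaged dependency.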
Building from source
If you want to contribute to the project, or want access to the latest features, you can fork and clone the project, and build it from source.
This library is easily built with SBT (see the build.sbt script provided). To build a JAR file, simply run
toto:~$ sbt ++${SCALA_VERSION} package
from the project root. The build configuration includes support for Scala 2.11. In addition, you can build the documentation with SBT:
toto:~$ sbt ++${SCALA_VERSION} doc
toto:~$ open target/scala-${SCALA_VERSION}/api/index.html
You can also bundle the external dependencies into the spark3D JAR by building a fat JAR:
toto:~$ sbt ++${SCALA_VERSION} assembly
By doing so, you will be able to use spark3D in your program without specifying its external dependencies.
Running the test suite
To launch the test suite, just execute:
toto:~$ sbt ++${SCALA_VERSION} coverage test coverageReport
We also provide a script (test_scala.sh) that you can execute. You should get the results printed on screen, plus a detailed coverage report at target/scala-${SCALA_VERSION}/scoverage-report/index.html.
Using with spark-shell
First produce a JAR of the spark3D library, and then launch a spark-shell, specifying the external dependencies:
toto:~$ JARS="target/scala-2.11/spark3d_2.11-0.3.1.jar,lib/jhealpix.jar"
toto:~$ PACKAGES="com.github.astrolabsoftware:spark-fits_2.11:0.7.0"
toto:~$ spark-shell --jars $JARS --packages $PACKAGES
You will then be able to import anything from spark3D:
scala> import com.astrolabsoftware.spark3d.geometryObjects.Point3D
scala> // etc...
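As a quick sanity check, here is a sketch of a session. It assumes the Point3D constructor takes three coordinates plus a boolean flagging spherical coordinates, and that points expose a distanceTo method; check the Scaladoc built above if the signatures differ:
scala> import com.astrolabsoftware.spark3d.geometryObjects.Point3D
scala> // Two points given in Cartesian coordinates (the boolean flags spherical coordinates)
scala> val p = new Point3D(1.0, 0.0, 0.0, false)
scala> val q = new Point3D(0.0, 1.0, 0.0, false)
scala> // Euclidean distance between the two points
scala> p.distanceTo(q)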
Note that if you build a fat jar (that is, building with sbt ++${SCALA_VERSION} assembly and not sbt ++${SCALA_VERSION} package), you do not need to specify the external dependencies, as they are already included in the resulting jar:
toto:~$ FATJARS="target/scala-2.11/spark3D-assembly-0.3.1.jar"
toto:~$ spark-shell --jars $FATJARS
Using with Jupyter notebook and examples
We include a number of notebooks describing the use of the library in the folder examples/jupyter, along with a README explaining how to install Apache Toree as a Jupyter kernel.
Python installation
pyspark3d: A Python interface to spark3D
pyspark3d is built directly around spark3D using py4j. As a result, new features are first added to spark3D and then propagated to pyspark3d (sometimes with some delay). The basic idea is to load spark3D objects into the Java Virtual Machine (JVM); Python programs running in a Python interpreter then dynamically access those Java objects.
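To make this concrete, here is a purely illustrative sketch of the underlying mechanism: pyspark exposes the JVM through the py4j gateway of the SparkContext, and pyspark3d wraps such calls behind a Python API. It assumes the spark3D JAR is on the Spark classpath, and the Point3D constructor arguments shown are assumptions:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Access the spark3D Scala class living in the JVM through py4j.
# pyspark3d wraps calls like this one for you.
jvm = spark.sparkContext._jvm
p = jvm.com.astrolabsoftware.spark3d.geometryObjects.Point3D(1.0, 0.0, 0.0, False)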
Requirements
pyspark3d is tested on Python 3.5 and later. Note that pyspark3d will not run on Python 3.4 or earlier (including 2.x), because we make use of type hints (PEP 484 and the typing module, the standard for type annotations).
In addition, pyspark3d requires:
pyspark
numpy
scipy
You can find all Python dependencies in the requirements.txt file at the root of the project.
You will also need coverage and coveralls if you want to run the test suite.
Note that pyspark3d depends on spark-fits, but this dependency is included in the assembly JARs (see below).
Including spark3D in your project
In case you are completely lost, have a look at the .travis.yml file at the root of the project. It will give you hints on how to install the project on Ubuntu 16.04.
Using pip
Just run
pip install pyspark3d
Note that we ship the assembly JAR with it.
Manual
In order to use pyspark3d, you first need to connect it to spark3D, either by specifying the Maven coordinates of the package (plus dependencies) or by linking to an assembly JAR. For example, download the latest version of spark3D (including pyspark3d), and build an assembly JAR (see also Building from source above):
toto:~$ cd /path/to/spark3D
toto:~$ sbt ++${SCALA_VERSION} assembly
Edit pyspark3d_conf.py to point to the newly created JAR, and make sure spark3D is also in your PYTHONPATH:
# in ~/.bash_profile for example
...
export PYTHONPATH=/path/to/spark3D:/path/to/spark3D/pyspark3d:$PYTHONPATH
...
Running the test suite
To launch the test suite, just execute the python script at the root of the package:
toto:~$ ./test_python ${SCALA_VERSION}
We use doctest. Note that the script will build the JAR for you, so you need to have Scala and SBT installed on your machine. To print the coverage report on screen, edit the script to match your environment:
...
isLocal=`whoami`
if [ "$isLocal" == "<put_your_name>" ]
then
...
fi
Using with pyspark
First produce a fat JAR of the spark3D library (see above), and then launch a pyspark shell:
toto:~$ PYSPARK_DRIVER_PYTHON=ipython pyspark \
--jars /path/to/target/scala-2.11/spark3D-assembly-0.3.1.jar
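From there you can import pyspark3d directly. The snippet below is a sketch that assumes pyspark3d mirrors the Scala package layout, i.e. a geometryObjects module exposing Point3D with the same arguments as its Scala counterpart; check the bundled notebooks and examples if the module names differ:
# Hypothetical import, assuming pyspark3d mirrors the Scala package layout
from pyspark3d.geometryObjects import Point3D

# A point in Cartesian coordinates (the boolean flags spherical coordinates)
p = Point3D(1.0, 0.0, 0.0, False)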
Batch mode and provided examples
You can follow the different tutorials provided with the repository.
We also include Scala examples and runner scripts (run_*.sh) in the runners folder of the repo. You might have to adapt those scripts to your environment.