Tutorial: Visualise your partitioning

We often want to inspect a partitioning visually, but displaying large data sets is not a trivial task. The approach here consists of repartitioning the data set and then sampling the data for display. Points are coloured by partition ID.
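The idea of tagging points with their partition ID and sampling them before plotting can be sketched in plain Scala. This is a minimal illustration and not the spark3D implementation: `Point3D`, `PartitionSampling`, and its methods are hypothetical names, and partitions are modelled as ordinary in-memory sequences rather than Spark partitions.

```scala
import scala.util.Random

// Hypothetical point type for illustration; spark3D's own classes are not used here.
final case class Point3D(x: Double, y: Double, z: Double)

object PartitionSampling {

  /** Tag every point with the index of the partition that holds it. */
  def tagWithPartitionId(partitions: Seq[Seq[Point3D]]): Seq[(Int, Point3D)] =
    partitions.zipWithIndex.flatMap { case (points, id) =>
      points.map(p => (id, p))
    }

  /** Keep each tagged point independently with probability `fraction`. */
  def sample(tagged: Seq[(Int, Point3D)], fraction: Double, seed: Long): Seq[(Int, Point3D)] = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val rng = new Random(seed)
    tagged.filter(_ => rng.nextDouble() < fraction)
  }

  /** Group the sample by partition ID: one plot trace (one colour) per partition. */
  def traces(sampled: Seq[(Int, Point3D)]): Map[Int, Seq[Point3D]] =
    sampled.groupBy(_._1).map { case (id, pairs) => id -> pairs.map(_._2) }
}
```

On a real Spark RDD, `mapPartitionsWithIndex` and `sample` play the roles of the first two steps; each entry of the resulting map would then be drawn as one coloured series in the 3D scatter plot.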

Before starting

We provide a script based on the Scala client of Plotly. Unfortunately, the releases on Maven are not up to date (for example, they do not contain the code to make Scatter3D plots), so we compiled the latest sources and released the JAR under $spark3d/lib. You will also find a bash script to help you run it:

# in $spark3d
./run_viz_scala.sh

Adapt the input parameters to your data and cluster configuration. You will also need to open a Plotly account and provide your username and API key in the bash script. The plots will then be available in your Plotly home.

Some examples

Onion repartitioning

We repartitioned a data set containing data in spherical coordinates using the onion partitioning scheme. Only a fraction of the total data set is used for the plot:

[Figure: onion partitioning of out_srcs_s1_1*.fits]

Octree repartitioning

We repartitioned a data set containing data in Cartesian coordinates using the octree partitioning scheme. Only a fraction of the total data set is used for the plot:

Here is another data set with clusters: