Tutorial: Visualise your partitioning
In many cases, we would like to inspect the partitioning visually, but displaying large data sets is not a trivial task. The approach here consists of repartitioning the data set and sampling the data for display. The colour of each point encodes its partition ID.
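The tag-then-sample idea can be sketched in plain Scala. This is a minimal illustration, not the code of the provided script: `Point3D`, `TaggedPoint`, and `PartitionSampler.tagAndSample` are hypothetical names, and an in-memory `Seq[Seq[Point3D]]` stands in for an already repartitioned distributed data set.

```scala
import scala.util.Random

// Hypothetical stand-in for a 3D point; not a spark3D type.
final case class Point3D(x: Double, y: Double, z: Double)

// A point together with the ID of the partition that holds it.
final case class TaggedPoint(point: Point3D, partitionId: Int)

object PartitionSampler {
  // Tag every point with its partition index, then keep each point
  // independently with probability `fraction` (Bernoulli sampling).
  // The outer Seq plays the role of the list of partitions (assumption).
  def tagAndSample(
      partitions: Seq[Seq[Point3D]],
      fraction: Double,
      seed: Long
  ): Seq[TaggedPoint] = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val rng = new Random(seed) // fixed seed makes the sample reproducible
    for {
      (points, id) <- partitions.zipWithIndex
      p <- points
      if rng.nextDouble() < fraction
    } yield TaggedPoint(p, id)
  }
}
```

The `partitionId` field of each sampled point is the value that would then be mapped to a colour when plotting.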
We provide a script based on the Scala client of Plotly. Unfortunately, the releases on Maven are not up to date (e.g. they do not contain the code to make Scatter3D plots), so we compiled the latest sources and released the JAR under $spark3d/lib. You will also find a bash script to help you run it:
    # in $spark3d
    ./run_viz_scala.sh
Adapt the input parameters to your data and cluster configuration. You will also need to open an account on Plotly and provide your username and API key in the bash script. The plots will then be available in your Plotly home.
We repartitioned a data set containing data in spherical coordinates using the onion partitioning scheme. Only a fraction of the total data set is used for the plot:
We repartitioned a data set containing data in Cartesian coordinates using the octree partitioning scheme. Only a fraction of the total data set is used for the plot:
Here is another data set with clusters: