Tutorial: Query, cross-match, play!

The spark3D library contains a number of methods and tools to manipulate 3D RDD. Currently, you can already play with window query, KNN and cross-match between data sets.

Standard KNN

Finds the K nearest neighbours of an object within a DataFrame. The naive implementation here searches through all the the objects in the DataFrame to get the KNN. The nearness of the objects here is decided on the basis of the distance between their centers.

import com.astrolabsoftware.spark3d.Queries.KNN

// Let's suppose we have a DataFrame containing at least
// 3 columns x, y, z corresponding to cartesian coordinate of points.
val df = spark.read
  .format("csv")
  .option("header", true)
  .option("inferSchema", true)
  .load("src/test/resources/cartesian_spheres_manual.csv")

df.show()
  +---+---+---+------+
  |  x|  y|  z|radius|
  +---+---+---+------+
  |1.0|1.0|1.0|   0.8|
  |1.0|1.0|1.0|   0.5|
  |3.0|1.0|1.0|   0.8|
  |3.0|1.0|1.0|   0.5|
  |3.0|3.0|1.0|   0.8|
  |3.0|3.0|1.0|   0.5|
  |1.0|3.0|1.0|   0.8|
  |1.0|3.0|1.0|   0.5|
  |1.0|1.0|3.0|   0.8|
  |1.0|1.0|3.0|   0.5|
  |3.0|1.0|3.0|   0.8|
  |3.0|1.0|3.0|   0.5|
  |3.0|3.0|3.0|   0.8|
  |3.0|3.0|3.0|   0.5|
  |1.0|3.0|3.0|   0.8|
  |2.0|2.0|2.0|   2.0|
  +---+---+---+------+

// Define your target for which you want the neighbour points
val coordSys : String = "cartesian"
val target : List[Double] = List(0.2, 0.2, 0.2)

// Number of neighbours
val K : Int = 2

// Return a DataFrame
val dfNeighbour = KNN(df.select("x", "y", "z"), target, K, coordSys, unique = false)
dfNeighbour.show()
  +---+---+---+
  |  x|  y|  z|
  +---+---+---+
  |1.0|1.0|1.0|
  |1.0|1.0|1.0|
  +---+---+---+

// In case you want also the metadata, you can perform a join with initial DataFrame:
val dfNeighbourWithMeta = df.join(dfNeighbour, dfNeighbour.columns, "left_semi")
dfNeighbourWithMeta.show()
  +---+---+---+------+
  |  x|  y|  z|radius|
  +---+---+---+------+
  |1.0|1.0|1.0|   0.8|
  |1.0|1.0|1.0|   0.5|
  +---+---+---+------+

Note that by default (unique = false), the method KNN will return exactly K neighbours regardless their values. But there could be neighbours at the same location. If you want exactly K distinct neighbours, then choose unique = true.

Accelerated KNN

To come

Window Query

A window query takes as input a DataFrame and an envelope, and returns all objects in the RDD intersecting the envelope (contained in and crossing the envelope):

import com.astrolabsoftware.spark3d.Queries.windowQuery

// Let's suppose we have a DataFrame containing at least
// 3 columns x, y, z corresponding to cartesian coordinate of points.
val df = spark.read
  .format("csv")
  .option("header", true)
  .option("inferSchema", true)
  .load("src/test/resources/cartesian_spheres_manual.csv")

df.show()
  +---+---+---+------+
  |  x|  y|  z|radius|
  +---+---+---+------+
  |1.0|1.0|1.0|   0.8|
  |1.0|1.0|1.0|   0.5|
  |3.0|1.0|1.0|   0.8|
  |3.0|1.0|1.0|   0.5|
  |3.0|3.0|1.0|   0.8|
  |3.0|3.0|1.0|   0.5|
  |1.0|3.0|1.0|   0.8|
  |1.0|3.0|1.0|   0.5|
  |1.0|1.0|3.0|   0.8|
  |1.0|1.0|3.0|   0.5|
  |3.0|1.0|3.0|   0.8|
  |3.0|1.0|3.0|   0.5|
  |3.0|3.0|3.0|   0.8|
  |3.0|3.0|3.0|   0.5|
  |1.0|3.0|3.0|   0.8|
  |2.0|2.0|2.0|   2.0|
  +---+---+---+------+

// Define your envelope - here we take a sphere
// centred on (2.0, 1.0, 2.0) with radius 1.8
val windowtype = "sphere"
val windowcoord = List(2.0, 1.0, 2.0, 1.8)
val coordSys = "cartesian"

// Return a DataFrame
val env = windowQuery(df.select("x", "y", "z"), windowtype, windowcoord, coordSys)
env.show()
  +---+---+---+
  |  x|  y|  z|
  +---+---+---+
  |1.0|1.0|1.0|
  |1.0|1.0|1.0|
  |3.0|1.0|1.0|
  |3.0|1.0|1.0|
  |1.0|1.0|3.0|
  |1.0|1.0|3.0|
  |3.0|1.0|3.0|
  |3.0|1.0|3.0|
  |2.0|2.0|2.0|
  +---+---+---+
// In case you want also the metadata, you can perform a join with initial DataFrame:
envWithMeta = df.join(env, env.columns, "left_semi")
envWithMeta.show()
  +---+---+---+------+
  |  x|  y|  z|radius|
  +---+---+---+------+
  |3.0|1.0|1.0|   0.8|
  |3.0|1.0|1.0|   0.5|
  |3.0|1.0|3.0|   0.8|
  |3.0|1.0|3.0|   0.5|
  |2.0|2.0|2.0|   2.0|
  |1.0|1.0|1.0|   0.8|
  |1.0|1.0|1.0|   0.5|
  |1.0|1.0|3.0|   0.8|
  |1.0|1.0|3.0|   0.5|
  +---+---+---+------+

Note that the envelope can be anything among the Shape3D: point, shell, sphere, box. Depending on the windowtype, you would express windowcoord as:

  • point: windowcoord = List(x, y, z)
  • sphere: windowcoord = List(x, y, z, R)
  • shell: windowcoord = List(x, y, z, Rin, Rout)
  • box: windowcoord = List(x1, y1, z1, x2, y2, z2, x3, y3, z3)

Use (x, y, z) for cartesian or (r, theta, phi) for spherical. Note that box only accepts cartesian coordinates.

Cross-match

To come