Column functions
Jolan Rensen edited this page Aug 2, 2022 · 3 revisions
As in the Scala API for Columns, many of the operator functions have been ported over to Kotlin.
For example:
ds.select( col("colA") + 5 )
// datasets can also be invoked to get a column
ds.select( ds("colA") / ds("colB") )
dataset.where( col("colA") `===` 6 )
// or alternatively
dataset.where( col("colA") eq 6 )

In short, all supported operators are:
- ==
  - same as equals()
- !=
  - same as !equals()
- eq / `===`
  - in Scala: ===
  - in Java: equalTo()
- neq / `=!=`
  - in Scala: =!=
  - in Java: notEqual()
- -col(...)
  - same in Scala
  - in Java: negate(col())
- !col(...)
  - same in Scala
  - in Java: not(col())
- gt
  - in Scala: >
  - same in Java but also infix
- lt
  - in Scala: <
  - same in Java but also infix
- geq
  - in Scala: >=
  - same in Java but also infix
- leq
  - in Scala: <=
  - same in Java but also infix
- or
  - in Scala: ||
  - same in Java but also infix
  - `||` is unfortunately an illegal function name on Windows
- and / `&&`
  - in Scala: &&
  - in Java: and()
- +
  - same in Scala
  - in Java: plus()
- -
  - same in Scala
  - in Java: minus()
- *
  - same in Scala
  - in Java: multiply()
- /
  - same in Scala
  - in Java: divide()
- %
  - same in Scala
  - in Java: mod()
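
To make the difference between == and eq / `===` concrete, here is a minimal plain-Kotlin sketch of how such operators can be exposed. It is an illustration only: Expr, c() and render() are hypothetical stand-ins invented for this example and are not Spark's Column or the library's actual implementation. The point is that == keeps Kotlin's normal equals() semantics and returns a Boolean, while the infix and backticked functions build a new expression.

```kotlin
// Toy stand-in for a column expression (hypothetical, NOT Spark's Column).
data class Expr(val sql: String)

// Hypothetical helper: creates a toy column reference by name.
fun c(name: String) = Expr(name)

// Hypothetical helper: renders either a nested Expr or a plain literal.
fun render(v: Any): String = if (v is Expr) v.sql else v.toString()

// Named infix functions build expressions instead of comparing objects.
infix fun Expr.eq(other: Any) = Expr("($sql = ${render(other)})")
infix fun Expr.gt(other: Any) = Expr("($sql > ${render(other)})")
infix fun Expr.and(other: Expr) = Expr("($sql AND ${other.sql})")

// Backticks allow a Scala-looking name such as === as a Kotlin function name.
infix fun Expr.`===`(other: Any) = this eq other

// Real Kotlin operators map onto fixed function names.
operator fun Expr.plus(other: Any) = Expr("($sql + ${render(other)})")
operator fun Expr.unaryMinus() = Expr("(-$sql)")
operator fun Expr.not() = Expr("(NOT $sql)")

fun main() {
    // == is plain equals(): it yields a Boolean, not an expression.
    val same: Boolean = c("colA") == c("colA")
    println(same)                                  // true

    println((c("colA") + 5).sql)                   // (colA + 5)
    println((c("colA") eq 6).sql)                  // (colA = 6)
    println((c("colA") `===` 6).sql)               // (colA = 6)
    println((-c("colA")).sql)                      // (-colA)
    println(((c("colA") gt 2) and !(c("colB") eq 0)).sql)
    // ((colA > 2) AND (NOT (colB = 0)))
}
```

Named infix functions such as gt and and all share one precedence level, below the arithmetic operators, so parenthesizing combined conditions as in the last line keeps the intent explicit.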
Secondly, there are some quality of life additions as well:
In Kotlin, ranges are often used to express inclusive or exclusive bounds. So, instead of between(a, b) you can now write:
dataset.where( col("colA") inRangeOf 0..2 )

Also, for columns containing map- or array-like types, instead of getItem() we have:
dataset.where( col("colB")[0] geq 5 )

Finally, thanks to Kotlin reflection, we can provide a type-safe and refactor-safe way to create TypedColumns and, with those, a new Dataset from pieces of another using the select() function:
val dataset: Dataset<YourClass> = ...
val newDataset: Dataset<Tuple2<TypeA, TypeB>> = dataset.select(col(YourClass::colA), col(YourClass::colB))
// Alternatively, for instance when working with a Dataset<Row>
val typedDataset: Dataset<Tuple2<String, Int>> = otherDataset.select(col<_, String>("a"), col<_, Int>("b"))
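
As a side note on the range and index helpers described earlier, the following plain-Kotlin sketch shows how a Kotlin range can map onto an inclusive between(a, b) and how the [] operator can map onto getItem(). It is a rough illustration under stated assumptions: Expr and c() are hypothetical stand-ins invented here, not Spark's Column, and the rendered strings only mimic SQL.

```kotlin
// Toy stand-in for a column expression (hypothetical, NOT Spark's Column).
data class Expr(val sql: String)

// Hypothetical helper: creates a toy column reference by name.
fun c(name: String) = Expr(name)

// Mirrors between(a, b): a ClosedRange carries two inclusive bounds.
infix fun Expr.inRangeOf(range: ClosedRange<*>) =
    Expr("($sql BETWEEN ${range.start} AND ${range.endInclusive})")

// Mirrors getItem(key): the index operator works for positions and map keys.
operator fun Expr.get(key: Any) = Expr("${sql}[${key}]")

infix fun Expr.geq(other: Any) = Expr("($sql >= $other)")

fun main() {
    // 0..2 includes both ends.
    println((c("colA") inRangeOf 0..2).sql)          // (colA BETWEEN 0 AND 2)

    // 0 until 3 excludes 3, so it is the same closed range 0..2.
    // Parentheses are needed because inRangeOf and until are both infix.
    println((c("colA") inRangeOf (0 until 3)).sql)   // (colA BETWEEN 0 AND 2)

    println((c("colB")[0] geq 5).sql)                // (colB[0] >= 5)
    println((c("colC")["key"] geq 5).sql)            // (colC[key] >= 5)
}
```

The .. operator binds more tightly than a named infix function, which is why inRangeOf 0..2 needs no parentheses while the until variant does.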