[WIP] Expose spatial partitioning from SpatialRDD #1751
This also would benefit from a SpatialPartitioner that removes duplicates (perhaps by wrapping a SpatialPartitioner, consuming the result of placeObject and deterministically choosing one of the results), since most of the time having duplicates when partitioning is not really desired.
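To make the wrapping idea concrete, here is a minimal, dependency-free sketch. Everything in it is hypothetical: `Placer` is a stand-in for the `placeObject` contract only (the diff in this PR shows the real method returning `Iterator<Tuple2<Integer, T>>`; `Map.Entry` is used here so the sketch runs without Spark), `IntervalGrid` is a toy 1-D grid, and "keep the smallest partition id" is just one possible deterministic rule, not something Sedona prescribes.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class DedupSketch {
    /** Hypothetical stand-in for placeObject: one (partitionId, object) pair per overlapping partition. */
    interface Placer<T> {
        Iterator<Map.Entry<Integer, T>> placeObject(T obj);
    }

    /** Toy 1-D grid (assumption): partition i covers [i*width, (i+1)*width); objects are {min, max} intervals. */
    static final class IntervalGrid implements Placer<double[]> {
        private final double width;

        IntervalGrid(double width) {
            this.width = width;
        }

        @Override
        public Iterator<Map.Entry<Integer, double[]>> placeObject(double[] interval) {
            List<Map.Entry<Integer, double[]>> out = new ArrayList<>();
            int first = (int) Math.floor(interval[0] / width);
            int last = (int) Math.floor(interval[1] / width);
            // An interval spanning several cells is emitted once per cell: this is the duplication.
            for (int id = first; id <= last; id++) {
                out.add(new SimpleImmutableEntry<>(id, interval));
            }
            return out.iterator();
        }
    }

    /** Wraps another placer and keeps only the pair with the smallest partition id. */
    static final class DedupPlacer<T> implements Placer<T> {
        private final Placer<T> delegate;

        DedupPlacer(Placer<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public Iterator<Map.Entry<Integer, T>> placeObject(T obj) {
            Iterator<Map.Entry<Integer, T>> it = delegate.placeObject(obj);
            Map.Entry<Integer, T> best = null;
            while (it.hasNext()) {
                Map.Entry<Integer, T> candidate = it.next();
                if (best == null || candidate.getKey() < best.getKey()) {
                    best = candidate;
                }
            }
            if (best == null) {
                return Collections.<Map.Entry<Integer, T>>emptyIterator();
            }
            return Collections.singletonList(best).iterator();
        }
    }

    /** Helper: collects the partition ids a placer assigns to one object. */
    static <T> List<Integer> ids(Placer<T> placer, T obj) {
        List<Integer> result = new ArrayList<>();
        placer.placeObject(obj).forEachRemaining(e -> result.add(e.getKey()));
        return result;
    }
}
```

With a grid width of 10, the interval {8, 23} is placed into partitions 0, 1 and 2 by `IntervalGrid`, while `new DedupPlacer<>(grid)` places it into partition 0 only. A real wrapper would extend SpatialPartitioner and delegate the rest of its behavior unchanged; which result to keep (smallest id, partition containing the centroid, and so on) is a design choice left open here.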
public JavaPairRDD<Integer, T> spatialPartitioningWithIds(GridType gridType, int numPartitions)
    throws Exception {
  calc_partitioner(gridType, numPartitions);
  return spatialPartitioningWithIds(partitioner);
}

public JavaPairRDD<Integer, T> spatialPartitioningWithIds(final SpatialPartitioner partitioner) {
  this.partitioner = partitioner;
  return this.rawSpatialRDD
      .flatMapToPair(
          new PairFlatMapFunction<T, Integer, T>() {
            @Override
            public Iterator<Tuple2<Integer, T>> call(T spatialObject) throws Exception {
              return partitioner.placeObject(spatialObject);
            }
          })
      .partitionBy(partitioner);
}

This is probably not needed (the changes in the Adapter to preserve the partitioning of spatialPartitionedRDD into the output data frame should eliminate the need to keep any identifier alongside the partition).
  val stringRow = extractUserData(geom)
  castRowToSchema(stringRow = stringRow, schema = schema)
})
val rdd = spatialRDD.rawSpatialRDD.rdd.mapPartitions(
(But moved to a different overload since most of the time this will introduce duplicates)
- val rdd = spatialRDD.rawSpatialRDD.rdd.mapPartitions(
+ val rdd = spatialRDD.spatialPartitionedRDD.rdd.mapPartitions(
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-695] my subject. Closes #1268.
What changes were proposed in this PR?
This PR exposes spatial partitioning information from the SpatialRDD API. Sedona is exceptionally good at this and the spatial community would love to have access to this information!
There are two pieces of information that would be helpful:
There are a few ideas in this PR. The boundaries seem straightforward, but I'm too new to the RDD API to know what the options are for returning these things.
How was this patch tested?
Working on it!
Did this PR include necessary documentation updates?
vX.Y.Z format.