-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function to Optionally get site.info
if not present
#3324
base: develop
Are you sure you want to change the base?
Add function to Optionally get site.info
if not present
#3324
Conversation
In do_conversion.R Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
@meetagrawal09 can you cross check if corresponding changes in |
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
I think you're taking the wrong approach to task 2 here: Yes, "if siteID is not provided, construct a unique identifier from lat/lon", but it needs to be constructed without using the DB. As we discussed in Slack, this could be as simple as something like |
Co-authored-by: Chris Black <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posting my notes from our live review -- they're a bit cryptic, but hopefully enough to remind you of the key points from the conversation.
Basically I think the entire get.new.site function is trying to handle too many cases, and instead when the db isn't available we should simply paste together lat and lon to use as a siteid.
|
||
if (is.null(site$id) | is.na(site$id)) { | ||
if ((!is.null(site$lat) && !is.null(site$lon)) | | ||
(!is.na(site$lat) && !is.na(site$lon)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird little gotcha here -- the !is.null will fire when lat and lon are NA, and the !is.na will return a missing value when lat and lon are null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want &&
instead of |
base/db/R/get.new.site.R
Outdated
lat = site$lat, | ||
lon = site$lon | ||
) | ||
str_ns <- paste0(new.site$lat, "-", new.site$lon) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to use a string other than "-" to concatenate here, to avoid confusion between separator characters and negative lon values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced current logic with a "_" sign
} | ||
} | ||
|
||
site.info <- list(new.site = new.site, str_ns = str_ns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q in live review: why this structure? A: to match what's expected at met.process line 165
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detailed Explanation : Current structure is followed to match data flow of what information is later on utilised in met.process
for other function calls
base/db/R/get.new.site.R
Outdated
generate_new_siteID <- function() { | ||
# Generate a random number. Assuming higher order integers to increase randomness in IDs | ||
random_id <- sample(10000:99999999, 1) | ||
return(as.numeric(random_id)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want IDs to be deterministic -- don't need this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you suggest a method to perform so? I am unable to determine one right now :) !!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One approach I suggest is as follow : Hashing to generate a unique id using lat and lon of the site. This would ensure greater precision upto 8 decimal points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented hasing in #3423 thus:
mutate(id = digest::digest(geometry, algo = "xxhash64"))
where 'geometry' could be replaced by paste0(lat, lon)
or, if we wanted to be consistent and convert it to an sf object (I'm not certain I see the point) but could do
data.frame(lat=100, lon = -90) |>
sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) |>
digest::digest(algo="xxhash64")
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon) | ||
|
||
# extract new.site and str_ns from site.info | ||
new.site <- site.info$new.site |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hypothesis to confirm before acting: No downstream object uses new.site$id, can remove it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new.site$id
is later on passed on to download.raw.met.module
for further conversions in convert_input
function. I guess for that instance we will have to keep it rather than duplicating the code.
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From previous comments on #3348, I couldn't resist removing unnecessary changes. I will post a discussion regarding this PR and gather insights from other 'members & contributors' too.
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
##' @title Get New Site Info | ||
##' @param site a dataframe with site information including id, lat, lon, and time_zone. | ||
##' @param con Database connection object. | ||
##' @param latlon Optional global latlon object which will store updated lat/lon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following why latlon exists as an option here, as information is never extracted from it in any of the cases below within this function, and none of the calls you implemented to this function pass in anything other than NULL.
lat = latlon$lat, | ||
lon = latlon$lon) | ||
str_ns <- paste0(new.site$id %/% 1e+09, "-", new.site$id %% 1e+09) | ||
latlon <- NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is latlon defined to be null here? Why not just delete this line and the , latlon = latlon
in the function call as you already set the default for latlon to NULL.
lon = latlon$lon) | ||
latlon <- NULL | ||
if(is.null(site$lat) | is.null(site$lon)) { | ||
site.info <- PEcAn.DB::get.new.site(site, con=con, latlon = latlon) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same point as earlier about latlon not needing to be defined here.
id = as.numeric(site$id), | ||
lat = site$lat, | ||
lon = site$lon | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following why this if/else exists. Doesn't get.new.site already support this else
case? And if not, shouldn't it?
@@ -30,6 +30,7 @@ soil_process <- function(settings, input, dbfiles, overwrite = FALSE,run.local=T | |||
# set up bety connection | |||
con <- PEcAn.DB::db.open(dbparms$bety) | |||
on.exit(PEcAn.DB::db.close(con), add = TRUE) | |||
|
|||
# get site info | |||
latlon <- PEcAn.DB::query.site(site$id, con = con)[c("lat", "lon")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not following why this function only has whitespace change as it seems like the code chunk here is doing the same thing as what you were trying to support with your new function, and also would benifit from having a database-independent option.
Description
This PR aims to refactor site.id and improve str_ns. The end goal is to make the the System independent of DB. Currently, I'm refactoring to create a siteID if not present already. I'll add test cases to check this util function too.
Work still in pending
Some comments from @mdietze :
Motivation and Context
May Fix a Subtask of #3307
Review Time Estimate
Types of changes
Checklist: