Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define R API functions in separate script #103

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

bmeluch
Copy link
Contributor

@bmeluch bmeluch commented Dec 10, 2024

This PR pulls the R API function definitions out of notebooks and defines them all in a single R script, so we aren't accidentally defining the same function with different tweaks in different notebooks.

It modifies two existing notebooks and adds a script utility_functions.R

(Eventually this may become an R package, but R scripts seem to play nicer than R packages when embedded in a repo.)

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Does your PR link to an issue?
  • Have you described the changes this PR will make?

Notebook Fix Submissions:

  • Does your PR include links to the updated notebook (in the branch) for review using nbviewer, Colab, and reviewnb? These three are the preferred ways to review changes and additions to notebooks during review.

@bmeluch bmeluch linked an issue Dec 10, 2024 that may be closed by this pull request
3 tasks
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@bmeluch bmeluch marked this pull request as ready for review January 14, 2025 18:59
@bmeluch bmeluch changed the title Try putting R functions in separate script Define R API functions in separate script Jan 14, 2025
# results as a single dataframe (can be nested). It uses the next_page_token
# key in each page of results to retrieve the following page.

get_next_results <- function(collection, filter_text, max_page_size, fields) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider renaming this function to something like get_all_results because that's actually what it's doing (not just getting the next ones).

# size, and a list of the fields to be retrieved.
# It returns the metadata as a dataframe.

get_first_page_results <- function(collection, filter, max_page_size, fields) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding roxygen annotations for each function that describes the parameters. https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html . This helps with auto completion and is generally the best practice for documenting functions in R.

get_results_by_id <- function(collection, match_id_field, id_list, fields, max_id = 50) {
# collection: the name of the collection to query
# match_id_field: the field in the new collection to match to the id_list
# id_list: a list of ids to filter on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these actually R lists or are they vectors? I think vectors?

output[[i]] = get_next_results(
collection = collection,
filter = filter,
max_page_size = max_id*3, #assumes that there are no more than 3 records per query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this *3 and associated comment, they aren't true. Consider adding the max_page_size parameter to this function.

@@ -14,85 +14,21 @@
"cell_type": "code",
Copy link
Collaborator

@kheal kheal Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text for part 2 is rendering oddly - see above.


Reply via ReviewNB

@@ -14,85 +14,21 @@
"cell_type": "code",
Copy link
Collaborator

@kheal kheal Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment at re: Step 2


Reply via ReviewNB

@@ -14,85 +14,21 @@
"cell_type": "code",
Copy link
Collaborator

@kheal kheal Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, we reference "omics processing" here - please change to data generation


Reply via ReviewNB

@@ -14,85 +14,21 @@
"cell_type": "code",
Copy link
Collaborator

@kheal kheal Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text rendering oddly in step 7


Reply via ReviewNB

@@ -14,85 +14,21 @@
"cell_type": "code",
Copy link
Collaborator

@kheal kheal Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text rendering oddly in previous chunk


Reply via ReviewNB

@kheal
Copy link
Collaborator

kheal commented Jan 15, 2025

@bmeluch - the sourcing for the functions doesn't work in google colab.

Here's a link to the NOM notebook in this branch in google colab - it'd be good to have this in the PR description because this is something we should be testing during each new notebook. When I try to run the first cell, it errors out with the new sourcing. I'm pretty sure @samobermiller had figured out a workaround for the python functions to make this work, hopefully the same will work for R?

NOM notebook in colab:
https://colab.research.google.com/github/microbiomedata/nmdc_notebooks/blob/90-modularize-r-api-wrapper-functions/NOM_visualizations/R/NOM_R_notebook.ipynb

Taxonomy notebook in colab:
https://colab.research.google.com/github/microbiomedata/nmdc_notebooks/blob/90-modularize-r-api-wrapper-functions/taxonomic_dist_by_soil_layer/R/taxonomic_dist_soil_layer_R.ipynb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Modularize R API wrapper functions
2 participants