June 21, 2024 Improvements #3

Open
cjerzak opened this issue Jun 21, 2024 · 2 comments
Comments

@cjerzak
Owner

cjerzak commented Jun 21, 2024

  1. Build a Hugging Face model and add supplementary functions to the package as needed.
  2. Consider revising the AverageMatches process.
@cjerzak
Owner Author

cjerzak commented Jul 3, 2024

Revision idea for AverageMatches: sample 100 rows of x and compare them with all of y; then sample 100 rows of y and compare them with all of x.
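
A minimal sketch of that sampling idea, in case it helps the discussion. It assumes that AverageMatches estimates the average number of rows within a distance threshold and that the distance is Euclidean; neither is confirmed in this thread. The name `sampled_average_matches` and its arguments `thresh` and `n_sample` are hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package: estimates the average number
# of matches per row from two random subsamples instead of all row pairs.
sampled_average_matches <- function(x, y, thresh, n_sample = 100) {
  # For each row of `a`, count the rows of `b` within `thresh`.
  # Euclidean distance is an assumption made for this sketch.
  count_matches <- function(a, b) {
    # Squared distances via ||a||^2 + ||b||^2 - 2 * a.b
    d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
    # pmax() guards against tiny negative values from rounding error
    rowSums(sqrt(pmax(d2, 0)) <= thresh)
  }

  # Sample up to n_sample row indices from each side
  ix <- sample(nrow(x), min(n_sample, nrow(x)))
  iy <- sample(nrow(y), min(n_sample, nrow(y)))

  c(
    x_vs_y = mean(count_matches(x[ix, , drop = FALSE], y)),  # sampled x vs all y
    y_vs_x = mean(count_matches(y[iy, , drop = FALSE], x))   # sampled y vs all x
  )
}

# Usage on toy data: y is a slightly perturbed copy of x
set.seed(1)
x <- matrix(rnorm(500 * 20), 500, 20)
y <- x + matrix(rnorm(500 * 20, 0, 1e-4), 500, 20)
res <- sampled_average_matches(x, y, thresh = 0.5)
print(res)
```

Each direction costs roughly 100 times the number of rows on the other side, rather than the full product of the two row counts. Reporting both directions separately also makes it visible when x and y have very different match densities.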

@beniaminogreen
Collaborator

Hi Connor, here's a quick test function based on the example we talked over today. It takes a threshold-picking function as an argument (in this case, GetCalibratedDistThresh) and checks whether that function returns an appropriate threshold (below 0.5) when matching a dataset to a copy of itself perturbed by very small Gaussian noise (sd = 0.0001). Happy to revise this if we need a less stringent test or want the function to report more debug information.

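# Checks that a threshold-picking function returns a small threshold when y is
# a near-identical copy of x (each row of x has exactly one true match in y).
test_calibrate_threshold <- function(threshold_picking_function, n = 1000, p = 250) {
    x <- matrix(rnorm(n * p), n, p)
    # y is x plus Gaussian noise with sd = 1e-4
    y <- x + matrix(rnorm(n * p, 0, 1e-4), n, p)

    threshold <- threshold_picking_function(x = x, y = y, AveMatchNumberPerAlias = 1)
    # True matches are ~1e-4 * sqrt(p) apart, so the threshold should be well below 0.5
    stopifnot(threshold < .5)
}

# Example
test_calibrate_threshold(GetCalibratedDistThresh)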
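
If more debug information turns out to be useful, one possible variant is sketched below. It prints the returned threshold next to the range of true-pair distances and returns both invisibly. The names `test_calibrate_threshold_verbose`, `max_thresh`, and `toy_threshold` are hypothetical. In particular, `toy_threshold` is only a stand-in used to exercise the harness (it assumes Euclidean distance) and is not the package's GetCalibratedDistThresh.

```r
# Hypothetical stand-in for a threshold picker; NOT GetCalibratedDistThresh.
# Returns the k-th smallest pairwise distance, where k is chosen so that rows
# of x have about AveMatchNumberPerAlias matches in y on average.
toy_threshold <- function(x, y, AveMatchNumberPerAlias = 1) {
  d2 <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * tcrossprod(x, y)
  d <- sqrt(pmax(d2, 0))  # Euclidean distances (an assumption of this sketch)
  k <- max(1, min(length(d), round(AveMatchNumberPerAlias * nrow(x))))
  sort(as.vector(d))[k]
}

# Variant of the test that reports what it saw before asserting.
test_calibrate_threshold_verbose <- function(threshold_picking_function,
                                             n = 1000, p = 250,
                                             max_thresh = 0.5) {
  x <- matrix(rnorm(n * p), n, p)
  y <- x + matrix(rnorm(n * p, 0, 1e-4), n, p)

  threshold <- threshold_picking_function(x = x, y = y,
                                          AveMatchNumberPerAlias = 1)

  # Distance from each row of x to its perturbed copy in y, for reference
  true_pair_dist <- sqrt(rowSums((x - y)^2))
  message(sprintf("threshold = %.4g; true-pair distances in [%.4g, %.4g]",
                  threshold, min(true_pair_dist), max(true_pair_dist)))

  if (!(threshold < max_thresh)) {
    stop(sprintf("threshold %.4g is not below %.4g", threshold, max_thresh))
  }
  invisible(list(threshold = threshold, true_pair_dist = true_pair_dist))
}

# Usage with the stand-in picker on a smaller problem
set.seed(42)
out <- test_calibrate_threshold_verbose(toy_threshold, n = 200, p = 50)
```

Returning the values invisibly keeps the call quiet at the console while still letting a caller inspect how far the threshold sits from the true-pair distances.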

Best,
Ben
