integer overflow #6

crahal · 2024-08-24T21:43:24Z

Whenever my 'x' has more than ~2750 organisations, I get this error (with all the different models):

Error in if (machine == "localhost") "localhost" else getClusterOption("master",  : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In nrow(x) * nrow(y) : NAs produced by integer overflow

Again on Windows, R 3.4.

cjerzak · 2024-08-24T23:40:21Z

What's the dimensionality of 'y' in this case?

crahal · 2024-08-25T10:46:01Z

~700k or so

cjerzak · 2024-08-25T13:36:02Z

There's an expand.grid of 1:2750 against 1:700k, and this is likely causing the overflow. I'll ponder a workaround and run some tests on this case. (So far, we've only tested merges of dimensionality ~100k.) More soon.
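For reference, a minimal base-R sketch of the overflow in the warning above. The row counts are illustrative, not taken from the actual data; the figures quoted in the thread are approximate, and at exactly 700,000 rows in 'y' the product first exceeds the 32-bit integer limit once 'x' has 3,068 rows. The resulting NA presumably propagates into later code until it reaches an `if()`, which would explain the "missing value where TRUE/FALSE needed" error.

```r
# Illustrative row counts (assumed, not the reporter's exact data).
n_x <- 3100L     # integer, as nrow(x) would return
n_y <- 700000L   # integer, as nrow(y) would return

# Largest value an R integer can hold.
.Machine$integer.max                       # 2147483647

# integer * integer overflows to NA (R also emits a warning,
# suppressed here so the sketch runs quietly).
pairs_int <- suppressWarnings(n_x * n_y)
is.na(pairs_int)                           # TRUE

# Converting one operand to double first avoids the overflow.
pairs_dbl <- as.numeric(n_x) * n_y         # about 2.17e9, no overflow

# Largest number of x rows whose product with n_y still fits in an integer.
max_x <- floor(.Machine$integer.max / n_y)
max_x                                      # 3067
```

Whether switching to doubles is enough inside the package depends on how the expanded grid is stored; the full set of pairs may still be too large to hold in memory.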

crahal · 2024-08-25T14:39:50Z

How detrimental to linkage performance would it be to iterate through chunks of 1k 'x' at a time? Is any of the training holistic, or are all of the linkages one-shot?

cjerzak · 2024-08-26T15:27:51Z

Linkages are one-shot, so iterating through chunks in the way described should give the same results. One qualification: the acceptable match threshold might be set dynamically from the input data. To disable that, one can set AveMatchNumberPerAlias = NULL and set MaxDist = c for some floating-point constant c.

In general, it's hard to know what that c should be, but looking at a histogram of distances for matched and non-matched pairs, if available, can help.
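A rough sketch of the chunked approach discussed above, in base R. `link_in_chunks` and `toy_link` are hypothetical names introduced here for illustration; `toy_link` is an exact-match stand-in, not the LinkOrgs matcher. In practice `link_fn` would wrap the real linkage call, with AveMatchNumberPerAlias = NULL and a fixed MaxDist so that every chunk uses the same threshold.

```r
# Hypothetical helper: link x to y in row chunks so that
# nrow(chunk) * nrow(y) stays well below the integer limit.
link_in_chunks <- function(x, y, link_fn, chunk_size = 1000L) {
  stopifnot(nrow(x) > 0)
  starts <- seq(1L, nrow(x), by = chunk_size)
  pieces <- lapply(starts, function(s) {
    rows <- s:min(s + chunk_size - 1L, nrow(x))
    link_fn(x[rows, , drop = FALSE], y)
  })
  # Stack the per-chunk results into one data frame.
  do.call(rbind, pieces)
}

# Toy stand-in matcher: exact join on a shared column (NOT LinkOrgs).
toy_link <- function(x_chunk, y) merge(x_chunk, y, by = "org")

# Small made-up inputs, for illustration only.
x <- data.frame(org = paste0("org", 1:2500), x_id = 1:2500)
y <- data.frame(org = paste0("org", seq(2, 5000, by = 2)), y_id = 1:2500)

chunked <- link_in_chunks(x, y, toy_link, chunk_size = 1000L)
full    <- toy_link(x, y)

# With a one-shot matcher, chunking should recover the same matches.
nrow(chunked) == nrow(full)
setequal(chunked$x_id, full$x_id)
```

Row order can differ between the chunked and the single-pass result, so the comparison is on the set of matched rows rather than on position.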

You might also want to check out ZoomerJoin for a big matching task like this: it's specifically designed for very large merge tasks and computes matches approximately using locality-sensitive hashing. Ben (of ZoomerJoin) and I are in the process of adding ZoomerJoin capabilities to LinkOrgs, but in the meantime it wouldn't be too hard to output the machine-learned representations of the organizational aliases, which could then be fed into, e.g., ZoomerJoin.
