switch logistic classifier to random forest as default classifier #1031

fgregg · 2022-06-02T03:36:07Z

would close #990

coveralls · 2022-06-02T03:54:25Z

Coverage remained the same at 64.947% when pulling 574aa84 on rf into 6fc8724 on main.

fgregg · 2022-06-02T04:10:09Z

github-actions · 2022-06-02T04:38:20Z

All benchmarks (diff):

	before	after	ratio	benchmark
	528M	532M	1.01	canonical.Canonical.peakmem_run
!	17.4±0.3s	failed	n/a	canonical.Canonical.time_run
-	0.919	0.767	0.83	canonical.Canonical.track_precision
+	0.902	1.0	1.11	canonical.Canonical.track_recall
	228M	232M	1.02	canonical_gazetteer.Gazetteer.peakmem_run(None)
!	15.7±0.2s	failed	n/a	canonical_gazetteer.Gazetteer.time_run(None)
	0.982	0.991	1.01	canonical_gazetteer.Gazetteer.track_precision(None)
	0.982	0.982	1.00	canonical_gazetteer.Gazetteer.track_recall(None)
	228M	232M	1.02	canonical_matching.Matching.peakmem_run({'threshold': 0.5, 'constraint': 'many-to-one'})
	228M	232M	1.02	canonical_matching.Matching.peakmem_run({'threshold': 0.5})
!	14.3±0.07s	failed	n/a	canonical_matching.Matching.time_run({'threshold': 0.5, 'constraint': 'many-to-one'})
!	14.5±0.05s	failed	n/a	canonical_matching.Matching.time_run({'threshold': 0.5})
	0.981	0.982	1.00	canonical_matching.Matching.track_precision({'threshold': 0.5, 'constraint': 'many-to-one'})
	0.99	1.0	1.01	canonical_matching.Matching.track_precision({'threshold': 0.5})
	0.911	0.991	1.09	canonical_matching.Matching.track_recall({'threshold': 0.5, 'constraint': 'many-to-one'})
	0.911	1.0	1.10	canonical_matching.Matching.track_recall({'threshold': 0.5})

(logs)

fgregg · 2022-06-02T04:43:03Z

@benchmark

github-actions · 2022-06-02T05:08:17Z

All benchmarks (diff):

	before	after	ratio	benchmark
	527M	532M	1.01	canonical.Canonical.peakmem_run
!	16.9±0.2s	failed	n/a	canonical.Canonical.time_run
-	0.919	0.824	0.90	canonical.Canonical.track_precision
+	0.902	1.0	1.11	canonical.Canonical.track_recall
	228M	232M	1.02	canonical_gazetteer.Gazetteer.peakmem_run(None)
!	16.0±0.03s	failed	n/a	canonical_gazetteer.Gazetteer.time_run(None)
	0.982	0.991	1.01	canonical_gazetteer.Gazetteer.track_precision(None)
	0.982	1.0	1.02	canonical_gazetteer.Gazetteer.track_recall(None)
	227M	235M	1.03	canonical_matching.Matching.peakmem_run({'threshold': 0.5, 'constraint': 'many-to-one'})
	227M	235M	1.03	canonical_matching.Matching.peakmem_run({'threshold': 0.5})
	14.5±0.05s	29.5±0.3s	~2.03	canonical_matching.Matching.time_run({'threshold': 0.5, 'constraint': 'many-to-one'})
	14.1±0.1s	28.9±0.2s	~2.05	canonical_matching.Matching.time_run({'threshold': 0.5})
	0.99	0.991	1.00	canonical_matching.Matching.track_precision({'threshold': 0.5, 'constraint': 'many-to-one'})
	0.99	1.0	1.01	canonical_matching.Matching.track_precision({'threshold': 0.5})
	0.911	0.991	1.09	canonical_matching.Matching.track_recall({'threshold': 0.5, 'constraint': 'many-to-one'})
	0.911	1.0	1.10	canonical_matching.Matching.track_recall({'threshold': 0.5})

(logs)

fgregg · 2022-06-02T10:29:58Z

@benchmark

github-actions · 2022-06-02T10:50:51Z

All benchmarks (diff):

	before	after	ratio	benchmark
	528M	532M	1.01	canonical.Canonical.peakmem_run
+	15.6±0.03s	24.4±0s	1.56	canonical.Canonical.time_run
	0.87	0.83	0.95	canonical.Canonical.track_precision
	0.955	1.0	1.05	canonical.Canonical.track_recall
	227M	232M	1.02	canonical_gazetteer.Gazetteer.peakmem_run(None)
+	13.2±0.01s	23.7±0s	1.79	canonical_gazetteer.Gazetteer.time_run(None)
	0.982	1.0	1.02	canonical_gazetteer.Gazetteer.track_precision(None)
	0.982	1.0	1.02	canonical_gazetteer.Gazetteer.track_recall(None)
	227M	234M	1.03	canonical_matching.Matching.peakmem_run({'threshold': 0.5, 'constraint': 'many-to-one'})
	227M	234M	1.03	canonical_matching.Matching.peakmem_run({'threshold': 0.5})
	11.7±0.08s	22.8±0.2s	~1.94	canonical_matching.Matching.time_run({'threshold': 0.5, 'constraint': 'many-to-one'})
	11.3±0.3s	23.0±0.2s	~2.03	canonical_matching.Matching.time_run({'threshold': 0.5})
	0.99	1.0	1.01	canonical_matching.Matching.track_precision({'threshold': 0.5, 'constraint': 'many-to-one'})
	0.99	1.0	1.01	canonical_matching.Matching.track_precision({'threshold': 0.5})
	0.911	1.0	1.10	canonical_matching.Matching.track_recall({'threshold': 0.5, 'constraint': 'many-to-one'})
	0.911	1.0	1.10	canonical_matching.Matching.track_recall({'threshold': 0.5})

(logs)

fgregg · 2022-06-02T15:42:15Z

@NickCrews, there's a lot of noise in the recall/precision differences. Should we increase the repetitions?

NickCrews · 2022-06-02T21:12:43Z

You know you can run these benchmarks locally, right? That might help with any sort of debugging that you need to do. Look at the workflow to see what to do, I can help if needed.

Seems easy enough to try increasing the reps and seeing if the metrics become more stable. Play around locally and see how many you need to do to get stability? I think this is done by playing with https://asv.readthedocs.io/en/stable/benchmarks.html#benchmark-attributes

It seems inherent in the fact that we are using non-deterministic algorithms that we are seeing this variation, so I don't see it as a problem with our testing methodology. It could be considered a problem with the actual implementation though: If we get such variation between runs, then should dedupe actually do a bunch of trials and then choose the settings from the best run? This smells of overfitting to me?

Somewhat related, but a nice-to-have would be if we could pass in random_state into all the classes and functions to make them deterministic, like sklearn etc all do. Doing this would sort of make things better for this problem: It would make two different benchmark runs more comparable. BUT, if we happen to choose a random_seed that isn't representative, then these benchmarks won't be very helpful for predicting real life (maybe I'm thinking about this wrong).

Anyway, I think yes increasing reps for benchmarks would be the best bet.

fgregg added 2 commits June 1, 2022 23:35

random forest

6fab426

merge conflict

ff4587c

Merge branch 'main' into rf

c84a0c8

dedupeio deleted a comment from github-actions bot Jun 2, 2022

fewer elements in grid search

574aa84

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

switch logistic classifier to random forest as default classifier #1031

switch logistic classifier to random forest as default classifier #1031

fgregg commented Jun 2, 2022

coveralls commented Jun 2, 2022 •

edited

Loading

fgregg commented Jun 2, 2022

github-actions bot commented Jun 2, 2022

fgregg commented Jun 2, 2022

github-actions bot commented Jun 2, 2022

fgregg commented Jun 2, 2022

github-actions bot commented Jun 2, 2022

fgregg commented Jun 2, 2022

NickCrews commented Jun 2, 2022

switch logistic classifier to random forest as default classifier #1031

Are you sure you want to change the base?

switch logistic classifier to random forest as default classifier #1031

Conversation

fgregg commented Jun 2, 2022

coveralls commented Jun 2, 2022 • edited Loading

fgregg commented Jun 2, 2022

github-actions bot commented Jun 2, 2022

All benchmarks (diff):

fgregg commented Jun 2, 2022

github-actions bot commented Jun 2, 2022

All benchmarks (diff):

fgregg commented Jun 2, 2022

github-actions bot commented Jun 2, 2022

All benchmarks (diff):

fgregg commented Jun 2, 2022

NickCrews commented Jun 2, 2022

coveralls commented Jun 2, 2022 •

edited

Loading