Skip to content
This repository has been archived by the owner on Apr 22, 2020. It is now read-only.

Change the default value of similarityCutoff parameter #730

Open
wants to merge 2 commits into
base: 3.4
Choose a base branch
from

Conversation

tomasonjo
Copy link
Collaborator

Not sure if there is a deeper reason for similarityCutoff to have to be greater than 0.0 to allow writeback. Fixes #729

@tomasonjo tomasonjo changed the title Fix the write boolean for similarity index Fix the write boolean for jaccard similarity index Oct 2, 2018
@tomasonjo
Copy link
Collaborator Author

I checked other similarities and they all have similarityCutoff to have to be greater than 0.0, but the default is -1, so they don't writeback by default.

We either have to change the default value of similarityCutoff or remove this condition when considering if we should writeback or not on others aswell.

I would just remove the condition.

@mneedham
Copy link
Contributor

mneedham commented Oct 3, 2018

The extra condition is deliberate as we thought it doesn't make sense to write a SIMILAR relationship between two nodes if they aren't actually similar in any way.

I think we should change the default to be -1 for all of them, but then update the documentation so that it's clearer what you have to do to write relationships.

I'm not really opposed to what you're suggesting either though so I guess @jexp can be the tie breaker!

@tomasonjo
Copy link
Collaborator Author

tomasonjo commented Oct 3, 2018

That would make sense, but the rule should be imposed one layer down. As is right now no relationships get written back even if they have similarity greater than 0 if we don't define similarityCutoff. So that rels with similarity greater than 0 still get written back.

And this only plays role in the context of cosine similarity as it is the only one that can have negative values, where jaccard and euclidian are always greater than or equal to 0 anyway.

@jexp
Copy link
Member

jexp commented Oct 3, 2018

I'm not sure 100% sure what you mean.

We should never write relationships for dissimilar pairs as it just explodes with n^2.
I want the defaults to be safe, i.e. if you don't specify something it should not hurt you.

The rule is different depending on the type of similarity.
So we probably only have to fix the default for cosine or make it by similarity != default.

@tomasonjo
Copy link
Collaborator Author

tomasonjo commented Oct 3, 2018

What I would suggest then is the following:

We got four similarities with possible values:

  • cosine: -1 to 1
  • jaccard: 0 to 1
  • overlap: 0 to 1
  • euclidian: 0 to ∞

If we keep it simple and just use one default similarityCutoff I would use 0.5 as cosine can still blow up with 0.0 still and we cover jaccard and overlap as they can blow up too.

On the other hand it is very hard to set an arbitrate default number for euclidian, so maybe we just stick with 0.5.

We can still include the boolean condition for cosine writeback to be greater than 0.0 and never allow user to writeback dissimilar pairs, but need to include it in the documentation then.

WDYT?

@tomasonjo
Copy link
Collaborator Author

Much cleaner option might be setting default topK parameter at 20 and not set any default similarityCutoff. This handles the default version very nicely IMO

@neo4j-oss-build
Copy link
Collaborator

neo4j-oss-build commented Oct 5, 2018 via email

@tomasonjo tomasonjo changed the title Fix the write boolean for jaccard similarity index Change the default value of similarityCutoff parameter Oct 7, 2018
@tomasonjo
Copy link
Collaborator Author

I changed the default similarityCutoff to 0.1. This implies to all except for euclidian distance where I left the default value -1.0.

@mneedham
Copy link
Contributor

mneedham commented Mar 25, 2019

I think what we'll do is throw an error if you try and set write: true with a similarity of <= 0.0. I'll make a PR to do that and cc you on it

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants