Document requirements for custom dissimilarity functions #99

Open
sharma-ji opened this issue Jan 11, 2019 · 2 comments

sharma-ji commented Jan 11, 2019

I am trying to implement a Hamming distance for categorical data, but I am getting an error:

    C:\Users\Mukul.Sharma\AppData\Local\Continuum\Anaconda3\lib\site-packages\kmodes\kmodes.py in init_huang(X, n_clusters, dissim)
         39 # so set centroid to closest point in X.
         40 for ik in range(n_clusters):
    ---> 41     ndx = np.argsort(dissim(X, centroids[ik]))
         42     # We want the centroid to be unique, if possible.
         43     while np.all(X[ndx[0]] == centroids, axis=1).any() and ndx.shape[0] > 1:

My Hamming distance function is:

    def hamming_distance(s1, s2):
        """Return the Hamming distance between equal-length sequences"""
        if len(s1) != len(s2):
            raise ValueError("Undefined for sequences of unequal length")
        return sum(el1 != el2 for el1, el2 in zip(s1, s2))

I am having issues with SciPy's Hamming distance (scipy.spatial.distance.hamming) as well. There the error says:

    ValueError: Input vector should be 1-D

Can you please help me?
Could you also give me an idea of how to write a custom distance metric, for example by explaining the internal workings of this algorithm (K-prototypes)?

nicodv (Owner) commented Jan 11, 2019

Have a look here for how the other dissimilarity functions work: https://github.com/nicodv/kmodes/blob/master/kmodes/util/tests/test_dissim.py

It looks like you need to adapt your function to accept a 2D array as its first argument (the traceback shows it being called as dissim(X, centroids[ik])), whereas right now it assumes 1D vectors.
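As a rough illustration of that adaptation, here is a minimal sketch of a Hamming-style dissimilarity that takes a 2D array of points and a single centroid, and returns one value per row. The name `hamming_dissim`, the sample data, and the ignored `**_` keyword arguments are assumptions made for this example; the only thing taken from the thread is the call shape `dissim(X, centroids[ik])` seen in the traceback.

```python
import numpy as np


def hamming_dissim(a, b, **_):
    """Count mismatching attributes between each row of `a` and `b`.

    a: 2D array of shape (n_points, n_attributes).
    b: 1D array of shape (n_attributes,), or a 2D array that broadcasts
       against `a`.
    Extra keyword arguments are accepted and ignored (an assumption, in
    case the caller passes additional context).
    """
    a = np.atleast_2d(a)
    b = np.asarray(b)
    if a.shape[-1] != b.shape[-1]:
        raise ValueError("Undefined for sequences of unequal length")
    # Element-wise comparison broadcasts `b` over the rows of `a`;
    # summing over axis 1 gives one mismatch count per point.
    return np.sum(a != b, axis=1)


# Hypothetical categorical data: three points, three attributes.
X = np.array([["a", "x", "p"],
              ["a", "y", "q"],
              ["b", "y", "q"]])
centroid = np.array(["a", "y", "q"])

distances = hamming_dissim(X, centroid)
print(distances)              # one distance per row of X
print(np.argsort(distances))  # same call pattern as in the traceback
```

Because the result is a 1D array with one entry per row of `X`, `np.argsort` can rank the points by closeness to the centroid, which is what line 41 of the traceback does. In recent kmodes releases a function like this is, as far as I know, passed in through the `cat_dissim` argument; please check that against the version you have installed.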

I should document this better somewhere, so I'm dedicating this ticket to that.

nicodv changed the title from "New custom distance (dissimilarity function) for categorical data." to "Document requirements for custom dissimilarity functions" on Jan 11, 2019
nicodv added the "easy" label on Jan 11, 2019

ghost commented Apr 3, 2019

Just want to mention that the Hamming distance is generally the same as the "overlap" measure computed in matching_dissim().
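To make that equivalence concrete, the sketch below applies the 1D `hamming_distance` from the original post row by row and compares it with a vectorised per-row mismatch count. `overlap_count` is a hypothetical stand-in written for this example, assuming the "overlap" measure is the number of mismatching attributes per row; it is not imported from kmodes, and the data is made up.

```python
import numpy as np


def hamming_distance(s1, s2):
    """1D Hamming distance, as posted at the top of this thread."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(el1 != el2 for el1, el2 in zip(s1, s2))


def overlap_count(a, b):
    """Hypothetical stand-in for the "overlap" measure: mismatches per row."""
    return np.sum(a != b, axis=1)


# Hypothetical integer-encoded categorical data.
X = np.array([[1, 0, 2],
              [1, 1, 2],
              [0, 1, 0]])
centroid = np.array([1, 1, 2])

# Apply the 1D function to each row, then compare with the vectorised form.
looped = [int(hamming_distance(row, centroid)) for row in X]
vectorised = overlap_count(X, centroid).tolist()

print(looped)
print(vectorised)
print(looped == vectorised)
```

One caveat: `scipy.spatial.distance.hamming` returns the proportion of mismatching components rather than the count, so its value for a pair of 1D vectors is the count above divided by the number of attributes.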
