I am trying to implement hamming distance for categorical data, but I am getting an error
C:\Users\Mukul.Sharma\AppData\Local\Continuum\Anaconda3\lib\site-packages\kmodes\kmodes.py in init_huang(X, n_clusters, dissim)
     39     # so set centroid to closest point in X.
     40     for ik in range(n_clusters):
---> 41         ndx = np.argsort(dissim(X, centroids[ik]))
     42         # We want the centroid to be unique, if possible.
     43         while np.all(X[ndx[0]] == centroids, axis=1).any() and ndx.shape[0] > 1:

my hamming distance is:

    def hamming_distance(s1, s2):
        """Return the Hamming distance between equal-length sequences"""
        if len(s1) != len(s2):
            raise ValueError("Undefined for sequences of unequal length")
        return sum(el1 != el2 for el1, el2 in zip(s1, s2))
I am having issues with scipy hamming as well (scipy.spatial.distance.hamming)
Here the error says
ValueError: Input vector should be 1-D
Can you please help me?
Also, could you give me an idea of how to write my own custom distance metric, for example by explaining the internal workings of this algorithm (K-prototypes)?
Looks like you need to adapt your function to accept a 2D array as its first argument (the traceback shows it being called as dissim(X, centroids[ik])), whereas right now it assumes two 1D vectors.
I should document this better somewhere, so dedicating this ticket to that.
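For illustration, here is a minimal sketch of a dissimilarity function with the shape the traceback implies: the first argument is a 2D array (one row per point), the second is a single 1D point, and the result is one distance per row. The name `hamming_dissim` and the example data are made up for this sketch, and the `**_` catch-all for extra keyword arguments is an assumption about what the library may pass; check it against the kmodes version you have installed.

```python
import numpy as np


def hamming_dissim(a, b, **_):
    """Count mismatching attributes between each row of `a` and the point `b`.

    a: 2D array of shape (n_points, n_attributes)
    b: 1D array of shape (n_attributes,)
    Returns a 1D array of length n_points.
    `**_` swallows any extra keyword arguments (an assumption, see above).
    """
    a = np.atleast_2d(a)
    b = np.asarray(b)
    if a.shape[1] != b.shape[-1]:
        raise ValueError("Undefined for sequences of unequal length")
    # Broadcasting compares `b` against every row of `a` at once.
    return np.sum(a != b, axis=1)


# Hypothetical categorical data: three points with three attributes each.
X = np.array([
    ["red", "small", "round"],
    ["red", "large", "round"],
    ["blue", "large", "square"],
])
centroid = np.array(["red", "small", "square"])

distances = hamming_dissim(X, centroid)
print(distances)  # one mismatch count per row of X
```

Because the function returns a 1D array with one entry per row, `np.argsort(dissim(X, centroids[ik]))` from the traceback receives the kind of input it expects.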
nicodv changed the title from "New custom distance (dissimilarity function) for categorical data." to "Document requirements for custom dissimilarity functions" on Jan 11, 2019
Just wanna mention that the Hamming distance is generally the same as the "overlap" measure computed in matching_dissim().
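To make that equivalence concrete, the sketch below applies the 1D `hamming_distance` from the question row by row and compares the result with a vectorized mismatch count. The wrapper `rowwise_hamming` and the integer-coded data are hypothetical, and the vectorized expression only stands in for the overlap measure described above; it is not copied from the library. Note also that `scipy.spatial.distance.hamming` returns the proportion of mismatching components rather than the count, and only accepts 1D inputs.

```python
import numpy as np


def hamming_distance(s1, s2):
    """Return the Hamming distance between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(el1 != el2 for el1, el2 in zip(s1, s2))


def rowwise_hamming(a, b, **_):
    """Hypothetical wrapper: apply the 1D function to every row of `a`."""
    return np.array([hamming_distance(row, b) for row in np.atleast_2d(a)])


# Made-up integer-coded categorical data.
X = np.array([
    [0, 1, 2],
    [0, 1, 1],
    [3, 1, 1],
])
point = np.array([0, 1, 1])

looped = rowwise_hamming(X, point)        # 1D function, one row at a time
vectorized = np.sum(X != point, axis=1)   # mismatch ("overlap") count
proportion = vectorized / X.shape[1]      # fraction of mismatching attributes

print(looped, vectorized, proportion)
```

The row-by-row wrapper is slower than the vectorized form on large data, but it shows that a 1D metric can be adapted without rewriting it.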