Why dim=2 here? I guess the softmax is taken over all capsules in the layer above, so shouldn't that be dim=0? #21
Comments
I had the same question. The logits have shape [10, 100, 1152, 1, 16], and since the softmax is taken over dim=2, it operates over the number of primary capsules (1152). My reading is this: every digit capsule (10 in total) selects which of the 1152 primary capsules to accept for its decision. In other words, for each digit capsule, the 1152 lower-level capsules compete to have their output accepted. Hence the softmax appears to be on dim=2.
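For a quick check of what dim=2 normalizes over, here is a minimal sketch (the tensor shape is taken from the comment above; the zero initialization and variable names are only illustrative):

```python
import torch
import torch.nn.functional as F

# logits shape: [num_digit_caps=10, batch=100, num_primary_caps=1152, 1, out_dim=16]
logits = torch.zeros(10, 100, 1152, 1, 16)

# dim=2 normalizes across the 1152 primary capsules, so for each digit
# capsule the coupling weights over its input capsules sum to 1.
probs = F.softmax(logits, dim=2)
print(probs.sum(dim=2).allclose(torch.ones(10, 100, 1, 16)))  # True
```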
I have the same question. I have read other CapsNet implementations (TensorFlow and PyTorch), and I think the softmax over the logits [10, 100, 1152, 1, 16] should be applied along dim 0, i.e. probs = softmax(logits, dim=0), as the original paper presents.
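By contrast, a minimal sketch of the dim=0 reading (same illustrative shape as above):

```python
import torch
import torch.nn.functional as F

logits = torch.zeros(10, 100, 1152, 1, 16)

# dim=0 normalizes across the 10 digit capsules, matching the paper's
# c_ij = softmax_j(b_ij): each primary capsule i distributes its
# coupling coefficients over all parent (digit) capsules j.
probs = F.softmax(logits, dim=0)
print(probs.sum(dim=0).allclose(torch.ones(100, 1152, 1, 16)))  # True
```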
I think the explanations for the softmax along dim=0 versus dim=2 are roughly:
1. dim=0 (as in the paper): each of the 1152 primary capsules distributes its coupling coefficients over the 10 digit capsules, so the weights for a given primary capsule sum to 1.
2. dim=2 (as in this implementation): each of the 10 digit capsules weights the 1152 primary capsules, so the weights over the primary capsules sum to 1.
Since 1 and 2 give more or less the same performance, I am not sure how to reason about it.
@InnovArul Thanks for your reply! In my opinion, if we want to follow the original paper, we should set dim to 0. And, as you say, with dim set to 2 the model achieves performance similar to the original. I think this may be because routing the weights over the PrimaryCaps or over the DigitCaps has a roughly equivalent effect: both ways can achieve the capsule transformation.
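To see why either choice still yields a usable weighting, here is a rough sketch of the routing loop (a simplified, hypothetical version, not this repository's exact code; names, shapes, and the reduced batch size are illustrative). Whatever dimension the softmax normalizes, the weighted sum that forms each digit capsule's output is taken over the primary-capsule axis (dim=2):

```python
import torch
import torch.nn.functional as F

def squash(x, dim=-1, eps=1e-8):
    # Non-linearity that keeps the vector's orientation and shrinks its length into [0, 1).
    norm_sq = (x ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * x / (norm_sq.sqrt() + eps)

def route(priors, num_iterations=3, softmax_dim=2):
    # priors (u_hat): [num_digit_caps, batch, num_primary_caps, 1, out_dim]
    logits = torch.zeros_like(priors)
    for i in range(num_iterations):
        probs = F.softmax(logits, dim=softmax_dim)   # coupling coefficients
        # Weighted sum over the primary capsules (dim=2), then squash.
        outputs = squash((probs * priors).sum(dim=2, keepdim=True))
        if i != num_iterations - 1:
            # Agreement between predictions and the current output raises the logits.
            logits = logits + (priors * outputs).sum(dim=-1, keepdim=True)
    return outputs

# Hypothetical priors from the transformation matrices; batch reduced for illustration.
priors = torch.randn(10, 8, 1152, 1, 16)
out_dim2 = route(priors, softmax_dim=2)
out_dim0 = route(priors, softmax_dim=0)
print(out_dim2.shape, out_dim0.shape)  # torch.Size([10, 8, 1, 1, 16]) for both
```

With softmax_dim=2 the 1152 weights for each digit capsule sum to 1; with softmax_dim=0 each primary capsule's 10 weights sum to 1. The subsequent sum over dim=2 is the same in both cases, which may be why the two settings end up performing similarly.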
Referenced code: capsule-networks/capsule_network.py, line 69 (commit 1a4edd2)