Could you explain why you use L2 distance rather than the cosine similarity proposed in the paper?
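For context, here is a quick check of how the two metrics relate. It is a minimal sketch in plain Python, and the toy vectors are arbitrary values, not taken from the code or the paper. Cosine similarity ignores vector magnitude, while L2 distance does not. If the embeddings are L2-normalized, ||a - b||^2 = 2 - 2 cos(a, b), so the two are guaranteed to rank neighbours identically. Whether the implementation normalizes its vectors is an assumption I could not confirm.

```python
import math


def l2_normalize(v):
    # Scale v to unit Euclidean length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def squared_l2(a, b):
    # Squared Euclidean distance between a and b.
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors (arbitrary, for illustration only).
a = [3.0, 4.0]
b = [1.0, 2.0]
b_scaled = [10.0 * x for x in b]  # same direction as b, 10x the magnitude

# Scaling b changes the L2 distance but leaves cosine similarity unchanged.
print("L2^2(a, b)        =", squared_l2(a, b))
print("L2^2(a, 10b)      =", squared_l2(a, b_scaled))
print("cos(a, b)         =", cosine_similarity(a, b))
print("cos(a, 10b)       =", cosine_similarity(a, b_scaled))

# On unit-normalized vectors, squared L2 distance equals 2 - 2 * cosine.
an, bn = l2_normalize(a), l2_normalize(b)
print("L2^2(a_n, b_n)    =", squared_l2(an, bn))
print("2 - 2 * cos(a, b) =", 2 - 2 * cosine_similarity(a, b))
```

So if the vectors are not normalized before the distance is computed, the L2 version could behave differently from the cosine similarity described in the paper.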