Questions regarding Panther's usage #2

Open
DonaldTsang opened this issue Feb 11, 2020 · 4 comments

Comments

@DonaldTsang

  1. What is the output file of Panther? We know the input is a simple CSV.
  2. How many iterations does Panther use? This is not specified in the paper.
  3. Would increasing the iteration count of Panther make it more accurate?
  4. Is it possible to use Panther as a way of clustering user/vertex/node roles?
@yuikns
Owner

yuikns commented Jul 30, 2020

Apologies for the delayed response.

  1. The output is written to $current_folder/result/<data, e.g. sampe>_T_D_<epsilon * 1000000 (to avoid a period)>.pathsim. It is a record file. Please see the sample output below.
  2. We perform R random walks, which may be the "number of iterations" mentioned in the issue. R is a calculated value, not a user-chosen one.
  3. We provide theoretical proofs for the error bound and confidence of the proposed algorithm. In general, the path similarity can be viewed as a probability measure defined over all paths. Thus we can adopt results from Vapnik-Chervonenkis (VC) learning theory to analyze the proposed sampling-based algorithm, and from that analysis we obtain the sample size. Increasing the iteration count beyond it may not gain much accuracy, but it would significantly hurt performance.
  4. Panther tries to find the D most similar nodes for each node based on its ego network. That could be a good way to provide some key information for clustering. In fact, this was part of our work on AMiner's author clustering.
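
As a rough illustration of the sampling idea in points 2 and 3, the sketch below estimates pairwise similarity as the fraction of sampled random paths that contain both nodes. It is not Panther's actual code: the function name `sample_path_similarity`, its parameters, the uniform choice of start node and next hop, and the treatment of edges as undirected and unweighted are all assumptions made for the example. In Panther the number of paths R is derived from the error bound and confidence level; here `num_paths` is simply passed in.

```python
import random
from collections import defaultdict
from itertools import combinations


def sample_path_similarity(edges, num_paths, path_length, seed=0):
    """Estimate similarity as the share of sampled paths containing both nodes.

    Hypothetical sketch: edges are treated as undirected and unweighted,
    and both the start node and each next hop are chosen uniformly.
    """
    rng = random.Random(seed)

    # Build an undirected adjacency list.
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    nodes = sorted(neighbors)

    counts = defaultdict(int)
    for _ in range(num_paths):
        node = rng.choice(nodes)
        visited = {node}
        # Walk path_length steps from the start node.
        for _ in range(path_length):
            node = rng.choice(neighbors[node])
            visited.add(node)
        # Every pair of distinct nodes on this path co-occurs once.
        for u, v in combinations(sorted(visited), 2):
            counts[(u, v)] += 1

    return {pair: c / num_paths for pair, c in counts.items()}


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
    sims = sample_path_similarity(toy_edges, num_paths=1000, path_length=3)
    for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

Pairs are stored as `(smaller_id, larger_id)`, and nodes that never share a sampled path simply have no entry.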

Sample output:

0:0.000001629734520 505748:0.000000992012317 1:0.000000708580226 157140:0.000000637722204 264947:0.000000637722204 60303:0.000000566864181 956126:0.000000212574068 1098494:0.000000141716045 498405:0.000000141716045 43204:0.000000070858023 106492:0.000000070858023 265288:0.000000070858023 120664:0.000000070858023 129150:0.000000070858023 98430:0.000000070858023 36690:0.000000070858023 98429:0.000000070858023 206687:0.000000070858023 178984:0.000000070858023 81329:0.000000070858023 228866:0.000000070858023 258156:0.000000070858023 32979:0.000000070858023 175271:0.000000070858023 271976:0.000000070858023 338035:0.000000070858023 367442:0.000000070858023 375027:0.000000070858023 447600:0.000000070858023 456807:0.000000070858023 769223:0.000000070858023 172276:0.000000070858023 593231:0.000000070858023 572214:0.000000070858023 157139:0.000000070858023 863611:0.000000070858023 919730:0.000000070858023 45094:0.000000070858023 1034115:0.000000070858023 1075850:0.000000070858023 26354:0.000000070858023 1139662:0.000000070858023 1488646:0.000000070858023 1539641:0.000000070858023 26353:0.000000070858023

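The record above is the first line of the file, so it belongs to node id 0. Each entry is a node_id:similarity pair, and the D most similar nodes are listed in descending order of similarity: 0, 505748, 1, 157140, ...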
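
For anyone who wants to load this file, here is a minimal parsing sketch. It assumes only what is described above: one record per line, the zero-based line index is the source node id, and each record is a space-separated list of id:score pairs. The function names `parse_pathsim_line` and `read_pathsim` are made up for this example and are not part of Panther.

```python
def parse_pathsim_line(line):
    """Parse one record into a list of (node_id, similarity) tuples."""
    pairs = []
    for token in line.split():
        node_id, _, score = token.partition(":")
        pairs.append((int(node_id), float(score)))
    return pairs


def read_pathsim(path):
    """Map each source node id (assumed to be the line index) to its record."""
    with open(path) as fh:
        return {i: parse_pathsim_line(line) for i, line in enumerate(fh)}


if __name__ == "__main__":
    record = "0:0.000001629734520 505748:0.000000992012317 1:0.000000708580226"
    for node_id, score in parse_pathsim_line(record):
        print(node_id, score)
```

Keeping the pairs as a list preserves the order in which they appear in the file.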

@DonaldTsang
Author

Would it be possible to provide pointers to Role Detection, Role Clustering, or Role Discovery? I would like to focus more on that front, since that is what RoleSim is for.

@yuikns
Owner

yuikns commented Jul 31, 2020

I am afraid this is not the problem that Panther was designed to solve.

@DonaldTsang
Author

If that is the case then do you know of any RoleSim replacements or alternatives? Would Vertex Similarity be similar to Link Prediction?
