Topic Change Detection #134

Open
vdpappu opened this issue Oct 31, 2019 · 2 comments

vdpappu commented Oct 31, 2019

Let's discuss the potential next steps for improving topic change detection and update the activities here.

reaganrewop commented Oct 31, 2019

The ideal goal of topic change detection is to slice the meeting into multiple partitions, where each partition carries enough information to stand on its own as a discussion.

The following needs to be addressed to achieve this:

  • Cosine similarity as the sole edge weight.
  • A mixture of topics in a single segment (not an ideal case).
  • What are the factors for grouping segments? (Currently it is the order in which the segments were spoken.)
  • Pruning of the edges: how do we prune the irrelevant edges?
  • Filler sentences in a segment causing overlapping groups.

Building on our current implementation, I made a few additions to try to fix the last issue.

  • Handling spillover sentences was important because they caused many overlapping groups. I currently handle this by checking for duplicate segments across the communities; when one is found, I remove it from a community if the majority of that segment's sentences are placed in a different community.

Doing this increased the accuracy of the communities substantially, and overlapping groups are no longer formed.
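
The spillover handling described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code on staging: the function name `resolve_spillover`, the `(segment_id, sentence_id)` data layout, and the tie-breaking rule (lowest community index wins) are all hypothetical.

```python
from collections import Counter, defaultdict


def resolve_spillover(communities):
    """Keep each segment only in the community that holds most of its sentences.

    communities: list of communities, each a list of (segment_id, sentence_id)
    pairs. This layout is an assumption made for the sketch.
    """
    # Count how many sentences of each segment landed in each community.
    counts = defaultdict(Counter)
    for idx, community in enumerate(communities):
        for segment_id, _ in community:
            counts[segment_id][idx] += 1

    # Pick one "home" community per segment: the one with the most sentences.
    # Ties go to the lowest community index (an arbitrary choice).
    home = {
        segment_id: min(per_comm, key=lambda idx: (-per_comm[idx], idx))
        for segment_id, per_comm in counts.items()
    }

    # Drop the spilled-over sentences from every community that is not the home.
    return [
        [(seg, sent) for seg, sent in community if home[seg] == idx]
        for idx, community in enumerate(communities)
    ]


communities = [
    [("s1", 0), ("s1", 1), ("s2", 0)],
    [("s1", 2), ("s2", 1), ("s2", 2)],
]
cleaned = resolve_spillover(communities)
```

Here segment `s1` stays in the first community and `s2` in the second, so no segment appears in two groups. Note that the minority sentences are removed rather than moved, which matches the description above.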

reaganrewop commented Nov 19, 2019

To improve the current communities approach (the one on staging), or more precisely to understand what works best for communities, I went through a few papers and methods to see how effective it can be. Based on that, I made a few changes to the algorithm.

  • Instead of a fully connected network, we connect two sentences only if they come from the same segment or from the next one. This helps reduce cosine similarity noise.

  • Normalization of the graph now works a bit differently: we compute a local normalization score for each node, and for edges that receive a score from both endpoints (the overlapping edge values) we average the scores.

  • The community approach relies on self-loops, so those are added as well.

  • Based on this paper, https://arxiv.org/pdf/0812.1770.pdf, we add an additional resolution parameter t, which helps control the stability of the network.
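
To make the first three changes concrete, here is a minimal sketch of how such a graph could be built. It is not the actual implementation. Several details are assumptions: the function names, the input layout (a list of `(segment_index, embedding)` pairs in spoken order), clipping negative similarities to zero, and reading "local normalization" as dividing an edge weight by the total weight at each endpoint. The resolution parameter t is not covered here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(sentences):
    """Return {(i, j): weight} with i <= j for a list of (segment_index, embedding).

    Sentences are linked only within a segment or across adjacent segments.
    """
    raw = {}
    for i, (seg_i, vec_i) in enumerate(sentences):
        # Starting j at i includes the self-loop (i, i).
        for j in range(i, len(sentences)):
            seg_j, vec_j = sentences[j]
            if abs(seg_i - seg_j) <= 1:
                # Assumption: negative similarities are clipped to zero.
                raw[(i, j)] = max(0.0, cosine(vec_i, vec_j))

    # Total weight at each node, counting a self-loop once.
    total = [0.0] * len(sentences)
    for (i, j), w in raw.items():
        total[i] += w
        if i != j:
            total[j] += w

    # Each edge gets one locally normalized value per endpoint; average them.
    graph = {}
    for (i, j), w in raw.items():
        local_i = w / total[i] if total[i] else 0.0
        local_j = w / total[j] if total[j] else 0.0
        graph[(i, j)] = (local_i + local_j) / 2
    return graph


sentences = [(0, [1, 0]), (0, [1, 0]), (1, [0, 1]), (2, [1, 0])]
graph = build_graph(sentences)
```

In this toy input, sentences 0 and 3 have identical embeddings but sit two segments apart, so no edge connects them. That is the kind of cosine similarity noise the restricted connectivity is meant to remove.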

Based on the validation set, the accuracy increased from 47 percent to 79 percent.
