Adding suffix tree #323
base: main
Codecov Report
@@ Coverage Diff @@
## master #323 +/- ##
=============================================
- Coverage 98.550% 98.369% -0.181%
=============================================
Files 25 26 +1
Lines 3243 3435 +192
=============================================
+ Hits 3196 3379 +183
- Misses 47 56 +9
pydatastructs/trees/suffix_tree.py
    'SuffixTree'
]

class Suffix_Node():
This class should be in https://github.com/codezonediitj/pydatastructs/blob/master/pydatastructs/utils/misc_util.py and should inherit the Node class.
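A minimal sketch of the shape this suggestion implies, assuming a simple `Node` base. The `Node` stub below is a hypothetical stand-in so the example runs on its own; the real `pydatastructs.utils.misc_util.Node` may define different attributes. `SuffixNode`, `suffix_index`, and `is_leaf` are illustrative names; only `transition_links` appears in the PR excerpt.

```python
class Node:
    """Hypothetical stand-in for pydatastructs.utils.misc_util.Node.

    The real base class may differ; this stub only lets the sketch run.
    """

    __slots__ = ['key', 'data']

    def __init__(self, key=None, data=None):
        self.key = key
        self.data = data


class SuffixNode(Node):
    """Suffix-tree node that reuses the generic Node base (sketch).

    transition_links maps the next character to a child node;
    suffix_index is the start of the suffix ending at a leaf
    (None for internal nodes). Both are assumptions for illustration.
    """

    __slots__ = ['transition_links', 'suffix_index']

    def __init__(self, suffix_index=None):
        super().__init__()
        self.transition_links = {}
        self.suffix_index = suffix_index

    def is_leaf(self):
        # A node without outgoing transitions is a leaf.
        return not self.transition_links
```

Keeping the suffix-specific fields in the subclass leaves the shared `Node` interface untouched for the other data structures that use it.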
pydatastructs/trees/suffix_tree.py
        else:
            return {x for n in self.transition_links.values() for x in n._get_leaves()}

class SuffixTree():
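The excerpt above collects leaf values over `transition_links` with a recursive set comprehension. As a rough illustration of how such a method can back `find` and `find_all`, here is a sketch built on an uncompressed suffix trie (O(n^2) construction, not Ukkonen's linear-time suffix tree), assuming a `$` terminator that never occurs in the input. The class names are hypothetical and this is not the PR's code.

```python
class SuffixTrieNode:
    """Node of an uncompressed suffix trie (illustrative sketch)."""

    def __init__(self):
        # Maps the next character to the child node reached through it.
        self.transition_links = {}
        # Start position of the suffix that ends here; set only on leaves.
        self.suffix_index = None

    def _get_leaves(self):
        # A leaf contributes its own suffix start; an internal node
        # gathers the leaves of its whole subtree.
        if not self.transition_links:
            return {self.suffix_index}
        return {x for n in self.transition_links.values() for x in n._get_leaves()}


class NaiveSuffixTrie:
    """O(n^2) suffix trie; far slower to build than a real suffix tree."""

    TERMINATOR = '$'  # assumed absent from the input text

    def __init__(self, text):
        self.root = SuffixTrieNode()
        text = text + self.TERMINATOR
        # Insert every suffix; the unique terminator gives each its own leaf.
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.transition_links.setdefault(ch, SuffixTrieNode())
            node.suffix_index = start

    def _walk(self, pattern):
        # Follow the pattern from the root; None means it does not occur.
        node = self.root
        for ch in pattern:
            node = node.transition_links.get(ch)
            if node is None:
                return None
        return node

    def find_all(self, pattern):
        # Every leaf below the matched node is one occurrence.
        node = self._walk(pattern)
        return set() if node is None else node._get_leaves()

    def find(self, pattern):
        occurrences = self.find_all(pattern)
        return min(occurrences) if occurrences else -1
```

For example, `NaiveSuffixTrie("HelloworldHe").find_all("He")` walks `H`, `e` and returns the leaf indices below that node, `{0, 10}`.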
This class should be in a file, suffix_tree.py, in https://github.com/codezonediitj/pydatastructs/tree/master/pydatastructs/strings
from pydatastructs import SuffixTree

def test_suffixtree():
    s = SuffixTree("HelloworldHe")
    assert s.find("Hel") == 0
    assert s.find_all("He") == {0, 10}
    assert s.find("Win") == -1
Please provide references for the tests to verify the results. You may use examples from university lecture notes, Wikipedia, books, etc.
Still awaited.
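This does not replace the reference the reviewer asked for, but the expected values in the test can be cross-checked independently of the suffix tree with a naive suffix array: once all suffixes are sorted, the ones beginning with the pattern form a contiguous block, so two binary searches find every occurrence. The function names are hypothetical and the construction is O(n^2 log n), which is fine for checking small test strings.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    """Start positions of all suffixes, sorted lexicographically (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text, pattern):
    """All start positions of pattern, using the sorted-suffix property."""
    sa = suffix_array(text)
    # Truncating sorted suffixes to len(pattern) keeps them sorted, so the
    # suffixes that start with pattern sit in one contiguous range.
    prefixes = [text[i:i + len(pattern)] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return set(sa[lo:hi])


def find(text, pattern):
    """Lowest occurrence, or -1 if the pattern is absent (like str.find)."""
    found = occurrences(text, pattern)
    return min(found) if found else -1
```

Applied to the test string, `occurrences("HelloworldHe", "He")` gives `{0, 10}`, matching the value asserted for `find_all`.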
There are several uncovered lines in https://codecov.io/gh/codezonediitj/pydatastructs/pull/323/diff (see red). Use the steps in https://github.com/codezonediitj/pydatastructs#testing to check the coverage of your patch locally, and add tests accordingly so that your code is fully covered.
Thank you for letting me know; I hadn't noticed these.
I have added some tests to cover that area. I couldn't get coverage to the maximum, but I gave it my best.
@czgdp1807 Are any more corrections needed?
Please add documentation for public methods (the ones which don't start with
No issue found that
@czgdp1807 They are requesting the leaderboard update for SWoC; today is the last date.
Done.
I am a bit busy these days. There might be a delay in reviews.
Yeah, sure, take your time.
Thank you 😉
pydatastructs/strings/suffix_tree.py
            i += 1
        return i

    def lcs(self, stringIdxs = -1):
Please avoid using short forms. Use the full name, largest_common_substring.
I believe we had some methods related to the longest common substring added to the algorithms in this module, the ones with that backtracking approach. How is this method different from that one?
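For comparison, here is the textbook dynamic-programming formulation of the longest common substring of two strings. We are assuming, not confirming, that the existing method in the algorithms module is of this kind; the function name is hypothetical and the sketch only serves as a baseline.

```python
def longest_common_substring_dp(a, b):
    """Longest common substring of a and b by dynamic programming.

    curr[j] is the length of the longest common suffix of a[:i] and b[:j]
    (one rolling row of the usual table). The answer is the largest cell,
    recovered by slicing a back from the position where it was found.
    Runs in O(len(a) * len(b)) time and O(len(b)) extra space.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-1], b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

This compares two strings at a time; a suffix-tree approach instead builds one structure over all inputs and can then answer queries over any subset of them.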
Backtracking just searches all the strings in the given list by going in the reverse direction, from child to root, and finds the longest among them. This method, by contrast, gives users the freedom to search the strings from the index they want, and it reduces comparison time by discarding everything that is not a subset of the longest sequence.
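One way to get the "pick which strings to compare" behaviour described above is a generalized suffix structure in which every node records which input strings pass through it. The sketch below uses an uncompressed generalized suffix trie, not the PR's implementation; `string_indices` is a hypothetical parameter that mirrors the role of `stringIdxs`, and construction is O(sum of n_i^2), which is only reasonable for illustration.

```python
class _GeneralizedNode:
    def __init__(self):
        self.children = {}
        # Indices of the input strings that have a suffix through this node.
        self.string_ids = set()


def longest_common_substring(strings, string_indices=None):
    """Longest substring shared by the selected strings (sketch).

    string_indices selects which strings must contain the answer;
    by default, all of them.
    """
    if string_indices is None:
        string_indices = range(len(strings))
    wanted = set(string_indices)
    root = _GeneralizedNode()
    for sid, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, _GeneralizedNode())
                node.string_ids.add(sid)
    # Depth-first search that only descends into nodes shared by every
    # selected string. A descendant's id set is a subset of its parent's,
    # so pruning a node also prunes its whole subtree.
    best = ''
    stack = [(root, '')]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for ch, child in node.children.items():
            if wanted <= child.string_ids:
                stack.append((child, path + ch))
    return best
```

Building the structure once and then querying different index subsets is where this design pays off compared with re-running a pairwise comparison.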
You can continue with your other PRs. I will finalize all of these on Saturday next week.
Yeah 👍
Yeah, it was literally wow! I have studied it and got a good idea of consistency. Thank you.
Shall I resolve the conflicts here?
Preferably, resolve the conflicts locally. When you update your
No worries, I'm using Atom, which will also do the same. I had the same experience in my previous PR; this time I will try to do it perfectly.
Have you implemented this algorithm?
Note: this PR will not be counted towards SWoC, but it will be counted for GSSoC. This needs some more work, both at the design level and at the implementation level.
Yeah, sure then.
I'm back. Did you find anything that should be added to make the code better?
Cool, happy to work with you again.
Well, IMHO, the code has a lot of room for improvement. It should be as simple as possible and easy to understand. I would suggest that, instead of jumping to code, we pick some examples, solve them via fake APIs, and keep refining those APIs until a consensus is reached. Then we can proceed with the coding. This is an important data structure and should be added only after a lot of discussion. We are in no hurry. The fault is mine too; I shouldn't have approved the previous designs for this DS. Let's pick an example and reproduce it programmatically here.
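A possible starting point for the "fake API" exercise: write the proposed interface down, back it with a deliberately dumb implementation, and pin the agreed examples as assertions. All names here are hypothetical; only `find` and `find_all` come from the test in this PR, and the brute-force fake is meant to be swapped for a real suffix tree later without changing the examples.

```python
from typing import Protocol, Set


class SuffixTreeAPI(Protocol):
    """Proposed public surface, written before any real implementation."""

    def find(self, pattern: str) -> int: ...

    def find_all(self, pattern: str) -> Set[int]: ...


class FakeSuffixTree:
    """Brute-force stand-in that satisfies SuffixTreeAPI.

    It exists only so that examples can be agreed on and run first.
    """

    def __init__(self, text: str) -> None:
        self.text = text

    def find(self, pattern: str) -> int:
        return self.text.find(pattern)

    def find_all(self, pattern: str) -> Set[int]:
        # Check every start position, so overlapping matches are included.
        return {i for i in range(len(self.text) - len(pattern) + 1)
                if self.text.startswith(pattern, i)}


def check_examples(make_tree) -> None:
    """Agreed-upon examples, runnable against any implementation."""
    tree: SuffixTreeAPI = make_tree("HelloworldHe")
    assert tree.find("Hel") == 0
    assert tree.find_all("He") == {0, 10}
    assert tree.find("Win") == -1
    assert make_tree("banana").find_all("ana") == {1, 3}


check_examples(FakeSuffixTree)
```

Refining the API then means editing `SuffixTreeAPI` and `check_examples` together until the discussion settles.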
Yeah, as you say.
Hey @czgdp1807, I think this is the best suffix tree structure I could obtain from the examples. On the properties side, though, it is still lacking. Should I work on that here, or should I make a new PR for suffix tree properties?
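For the properties work, a sketch that checks standard structural invariants of the suffix tree of `text + '$'`: there is one leaf per suffix (n + 1 leaves), every internal node branches at least twice, and sibling edges start with distinct characters. Since a tree with n + 1 leaves and at-least-binary branching has at most n internal nodes including the root, at most n - 1 are non-root. The tree here is built naively by compressing a suffix trie (O(n^2)); the function names and the dict-of-dicts representation are assumptions for illustration only.

```python
def build_compressed_suffix_tree(text, terminator='$'):
    """Naive suffix tree: a suffix trie with unary paths merged.

    Nodes are dicts mapping an edge label (substring) to a child dict;
    leaves are empty dicts. The terminator is assumed absent from text.
    """
    text += terminator
    trie = {}
    for start in range(len(text)):
        node = trie
        for ch in text[start:]:
            node = node.setdefault(ch, {})

    def compress(node):
        compressed = {}
        for label, child in node.items():
            # Follow single-child chains, concatenating their labels.
            while len(child) == 1:
                next_label, next_child = next(iter(child.items()))
                label += next_label
                child = next_child
            compressed[label] = compress(child)
        return compressed

    return compress(trie)


def check_suffix_tree_properties(text):
    """Assert the invariants; return (leaves, internal non-root nodes)."""
    tree = build_compressed_suffix_tree(text)
    n = len(text)
    leaves, internal = 0, 0
    stack = list(tree.values())  # start just below the root
    while stack:
        node = stack.pop()
        if not node:
            leaves += 1
            continue
        internal += 1
        # Each internal node branches at least twice ...
        assert len(node) >= 2
        # ... and its outgoing edges start with distinct characters.
        assert len({label[0] for label in node}) == len(node)
        stack.extend(node.values())
    # One leaf per suffix of text + '$'.
    assert leaves == n + 1
    assert internal <= max(n - 1, 0)
    return leaves, internal
```

For `"banana"` the internal non-root nodes correspond to `a`, `ana`, and `na`, so the check returns `(7, 3)`.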
We have two options. One is to keep spending time on the suffix tree, which is a hard data structure; this will delay our first release. The other is to put this on hold and add it to the next version. I think we should go for the second one, since once people start using our project, contributions will scale automatically. So, let us set it aside for now. Let's keep this PR open and add it to P.S. We should aim to complete the first release by the first half of the year, that is, by August 2021. Let me know what you think.
I have updated the list of features to be added to
That's a really nice idea! I will work on other issues; I felt the same way, since this is time-consuming.
Cool, gonna learn something new about graphs.
Here is the updated list: https://github.com/codezonediitj/pydatastructs/wiki/Planned-Features-for-v0.0.1. Not much remains to be done for the first release. We will postpone the harder parts to the second half of the year; otherwise we will never be able to release the package. Only three graph algorithms and KMP remain to be implemented. All of these are easy to medium in difficulty and can be done within a month. After that, we will be able to focus on building the docs and uploading them to readthedocs.io.
Suffix Tree
References to other Issues or PRs or Relevant literature
"Fixes #290". See
#290
Brief description of what is fixed or changed
Adds a suffix tree, with reference to the PyPI project.
Other comments
The code works fine for me. This PR counts toward SWoC; if you find any bugs, please ping me.