Speedup duplicate rule detection #1068

Closed

Antiarchitect
Contributor

@Antiarchitect Antiarchitect commented Aug 22, 2022

It seems I've simplified and sped up the duplicate_rule? function. This may be related to #1053.

@Antiarchitect Antiarchitect requested a review from a team as a code owner August 22, 2022 15:50
@Antiarchitect Antiarchitect changed the title from "Simplify duplicate rule detection" to "Speedup duplicate rule detection" Aug 22, 2022
@chelnak
Contributor

chelnak commented Aug 22, 2022

Hey @Antiarchitect, thanks for this.

Out of interest, have you tested against the scenario mentioned in the issue?

@Antiarchitect Antiarchitect force-pushed the speedup-find-duplicates branch from 6bbe2f7 to b057000 August 22, 2022 15:55
@Antiarchitect
Contributor Author

I will test it in a day or two. We're facing the same issue when the number of rules is in the hundreds.

@Antiarchitect
Contributor Author

Antiarchitect commented Aug 22, 2022

I will think about another approach: storing a hash for the rules, with the rule name as the key and a boolean as the value. This would reduce the cost of each duplicate check from O(n) in the worst case to O(1) on average.
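A minimal sketch of that idea, assuming duplicates can be identified by rule name alone (the real provider may compare more than the name). `Rule` and `RuleNameIndex` are hypothetical names used only for illustration and are not part of the module:

```ruby
# Hypothetical stand-in for a firewall rule; only the name matters here.
Rule = Struct.new(:name)

# Hypothetical index mapping rule name => bool (true when the name occurs more than once).
class RuleNameIndex
  def initialize(rules)
    @duplicate = {}
    # One O(n) pass: the first sighting stores false, any later sighting stores true.
    rules.each { |rule| @duplicate[rule.name] = @duplicate.key?(rule.name) }
  end

  # Hash lookup, O(1) on average; unknown names are not duplicates.
  def duplicate?(name)
    @duplicate.fetch(name, false)
  end
end

rules = [
  Rule.new('000 accept all icmp'),
  Rule.new('001 accept ssh'),
  Rule.new('001 accept ssh')
]
index = RuleNameIndex.new(rules)
index.duplicate?('001 accept ssh')      # => true
index.duplicate?('000 accept all icmp') # => false
```

The index costs one pass to build, so it only pays off if it is built once and reused across all checks rather than rebuilt on every call.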

@Antiarchitect
Contributor Author

Antiarchitect commented Aug 23, 2022

We've tested duplicate_rule?: it is called on every exists? just to emit warnings for each rule. The overall complexity is therefore O(n^2), because each call invokes self.class.instances, which parses the iptables-save output every time. We have about 3000 rules on the machine, and the whole process takes a very long time. For the test, I just replaced duplicate_rule? with

# Test-only stub: never report a duplicate, so the rule list is not re-parsed.
def duplicate_rule?(rule)
  false
end

and performance gets back to normal. Please do something. It seems we need a global hash with an O(1) duplicate check, or some way of memoizing the self.instances result internally.
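For illustration, the memoization option could look roughly like the sketch below. `FakeProvider`, `parse_rules`, `parse_count`, and `reset_instances!` are hypothetical names, and `parse_rules` only simulates the expensive iptables-save parse. This is not the module's actual code:

```ruby
# Hypothetical provider stand-in, used only to show the caching pattern.
class FakeProvider
  class << self
    attr_reader :parse_count

    # Simulates the expensive listing step and counts how often it runs.
    def parse_rules
      @parse_count = (@parse_count || 0) + 1
      ['000 accept all icmp', '001 accept ssh', '001 accept ssh']
    end

    # Memoized: the rule list is parsed once and reused until reset.
    def instances
      @instances ||= parse_rules
    end

    # Must be called whenever rules change, otherwise the cache goes stale.
    def reset_instances!
      @instances = nil
    end
  end

  # Still an O(n) scan per check, but without re-parsing on every call.
  def duplicate_rule?(name)
    self.class.instances.count(name) > 1
  end
end

provider = FakeProvider.new
3000.times { provider.duplicate_rule?('001 accept ssh') }
FakeProvider.parse_count # => 1
```

Memoization alone removes the repeated parsing but keeps a linear scan per rule. The hard part in a real provider is invalidating the cache whenever a rule is created, changed, or deleted.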

@chelnak
Contributor

chelnak commented Aug 23, 2022

@Antiarchitect We appreciate you going into such detail with this.

From memory, this was a tricky one. Due to the requirements around the change, O(1) may be hard to achieve, but I'm certain we can make it faster!

I'll bring it up with the team and get some eyes on it.

@stefanlasiewski

Thanks for looking @chelnak. This is an important fix for us as well.

@github-actions

Hello! 👋

This pull request has been open for a while and has had no recent activity. We've labelled it with attention-needed so that we can get a clear view of which PRs need our attention.

If you are waiting on a response from us we will try and address your comments on a future Community Day.

Alternatively, if it is no longer relevant to you please close the PR with a comment.

Please note that if a pull request receives no update for 7 after it has been labelled, it will be closed. We are always happy to re-open pull requests if they have been closed in error.

@LukasAud
Contributor

Hi @Antiarchitect, sorry for the long delay in feedback. To keep better track of this issue and avoid cluttering our PR page, we would like this topic to be moved to our "Issues" page and, if possible, linked to our ongoing discussion about the Firewall module re-architecture project.

@LukasAud
Contributor

Closing stale PR. Recommended action has already been stated above.

@LukasAud LukasAud closed this Feb 17, 2023
@Antiarchitect Antiarchitect deleted the speedup-find-duplicates branch February 17, 2023 11:48