Updated key rotation to be more random #2504
Conversation
I'm not sure there is an issue with GitHub API keys being assigned randomly. I say this because the current implementation is as simple as it gets: we store a list of keys and assign a random key to every request we make. So unless `random.choice` isn't working, I don't think this is the issue.
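For reference, a minimal sketch of the current scheme as described above (the names are illustrative, not the actual Augur code):

```python
import random

api_keys = ["key1", "key2", "key3"]  # the stored list of keys

def key_for_request() -> str:
    # Pick an independent random key for every request.
    return random.choice(api_keys)
```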
```python
# This will get a random index from the list and remove it,
# decreasing the likelihood of everyone using the same key all the time
# redis.rpop(self.redis_list_key)
redis.spop(self.redis_list_key)
```
This won't do anything, because this isn't how we retrieve the API keys from Redis.
`spop` does a randomized retrieval, while `rpop` pops in list order. The problem with relying on order is that we never know, and don't want to try to predict, how big each job is before we run it. Randomizing balances out our key usage counts. I verified this by making the change, running it, and seeing a huge gain in collection rate.
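To illustrate the distinction, a minimal redis-py sketch (the key names are illustrative). One caveat worth noting: `RPOP` operates on a Redis list, while `SPOP` operates on a Redis set, so the data structure stored under the key has to match the command used:

```python
import redis

r = redis.Redis()

# Ordered retrieval: RPOP pops from the tail of a list, so with a fixed
# list every worker sees the same sequence.
r.rpush("api_keys_list", "key1", "key2", "key3")
print(r.rpop("api_keys_list"))  # always b"key3" first

# Randomized retrieval: SPOP removes and returns a random member of a set.
r.sadd("api_keys_set", "key1", "key2", "key3")
print(r.spop("api_keys_set"))   # a different key on different runs
```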
```diff
@@ -16,6 +17,7 @@ def __init__(self, session: DatabaseSession, logger):

     # gets the github api keys from the database via the GithubApiKeyHandler
     github_api_keys = GithubApiKeyHandler(session).keys
+    github_api_keys = random.sample(github_api_keys, len(github_api_keys))
```
I believe the goal here is to randomize the order. If so, `random.shuffle` should be used, since `random.sample` is meant to get a random subset of a list. `random.sample` will still randomly order the list, but it makes the intent less clear.
In short, we get more durable randomness from `random.sample` than from `random.shuffle`. `random.shuffle` reorders the original list, but only in memory; it doesn't reorder what is stored in Redis. `random.sample` does a better job of getting us novel keys from the list every time. I tested both, and I was reusing the same key about 3x as often with `shuffle` as with `sample`.
From https://blog.enterprisedna.co/python-shuffle-list/:

> It's essential to note that the shuffle() function returns None and modifies the original list or array. Therefore, it's unsuitable for cases where you must maintain the original list order.
>
> To return a new list containing elements from the original list without modifying it, you can use the sample() function from the random library:
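To make the difference concrete, here is a minimal sketch (mine, not from the blog; the key names are illustrative):

```python
import random

keys = ["key1", "key2", "key3"]

# random.shuffle reorders the list in place and returns None.
random.shuffle(keys)

# random.sample(keys, len(keys)) returns a NEW randomly ordered list,
# leaving the original untouched.
reordered = random.sample(keys, len(keys))
print(reordered)
```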
```python
        self.logger.debug(f'valid keys AFTER shuffle: {valid_keys}')
    except Exception as e:
        self.logger.debug(f'{e}')
        valid_keys = valid_now
```
This only gets executed the first time the GithubApiKeyHandler is created. After that, the keys will be stored in Redis, so this function will return early in the `if redis_keys:` block.
That's OK. What was happening is we'd always get the keys in table order, so by randomizing how Redis is first populated we don't start with the same list every time. Starting with the same list makes the first hour after any restart go slower because it runs out of keys (at least going from the previous, non-randomized configuration to the changes I made).
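A minimal sketch of the flow being described, assuming redis-py and illustrative names (the real logic lives in GithubApiKeyHandler):

```python
import random
import redis

r = redis.Redis()
REDIS_LIST_KEY = "github_api_keys"  # illustrative key name

def load_keys_from_database() -> list[str]:
    # Stand-in for the database query; returns keys in table order.
    return ["key1", "key2", "key3"]

def get_api_keys() -> list[str]:
    cached = r.lrange(REDIS_LIST_KEY, 0, -1)
    if cached:
        # Every call after the first returns here, so the randomization
        # below only affects how Redis is initially populated.
        return [k.decode() for k in cached]
    keys = load_keys_from_database()
    keys = random.sample(keys, len(keys))  # randomize before caching
    r.rpush(REDIS_LIST_KEY, *keys)
    return keys
```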
```diff
 where = [WorkerOauth.access_token != self.config_key, WorkerOauth.platform == 'github']

-return [key_tuple[0] for key_tuple in self.session.query(select).filter(*where).all()]
+return [key_tuple[0] for key_tuple in self.session.query(select).filter(*where).order_by(func.random()).all()]
```
I don't think this is needed, as it will just randomize the order of the keys that are cached in Redis the first time the GithubApiKeyHandler is created. So the keys in Redis will simply be stored in a different order.
That's OK. Same rationale as above: we'd always get the keys in table order, so by randomizing how Redis is first populated we don't start with the same list every time. Starting with the same list makes the first hour after any restart go slower because it runs out of keys (at least going from the previous, non-randomized configuration to the changes I made).
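For reference, a self-contained sketch of `ORDER BY RANDOM()` via SQLAlchemy's `func.random()`, using a hypothetical stand-in table (the real query targets WorkerOauth). Note that `func.random()` maps to `RANDOM()` on PostgreSQL and SQLite, while MySQL would need `func.rand()`:

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class ApiKey(Base):
    # Hypothetical stand-in for the WorkerOauth table.
    __tablename__ = "api_keys"
    id = Column(Integer, primary_key=True)
    access_token = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([ApiKey(access_token=f"key{i}") for i in range(5)])
    session.commit()
    # Rows come back in a random order on every call.
    rows = session.query(ApiKey.access_token).order_by(func.random()).all()
    print([row[0] for row in rows])
```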
I was surprised by how much more effectively introducing randomization in all three places, rather than in any one of them, extended or eliminated periods where the API rate limit was exceeded. I tried each of them one at a time, and the triple shuffle hits API key stops less than half as often as any single or combined shuffle. NOTE: This PR now also includes a fix to the repo_info task. Previously we were pulling the `main` branch; the new code gets the default branch (see the description below).
I noted this in my specific comments, but I went through several iterations trying to get all the data fully collected. The three different shuffles occur at different points in execution and keep key reuse at such a low rate that, with a sufficient number of keys, we hardly ever run out on a pretty large database. Individually or in pairs, we reused the same key more often. So, given how unpredictable job size is, and that the triple approach runs with far fewer API key stalls, I think randomization is close to an optimal way to do this; it worked really well on a large collection.
@sgoggins I made an additional observation on the key rotation issue in Discord here: https://discord.com/channels/839539671671504907/915622840798158859/1146192017420996738
Adding unique keys to enable concurrent refreshes and end blocking during refresh; changed all but one to do a concurrent refresh. Still looking for what may be unique. Got rid of some Libyear-related views that aren't quite what we want.
Description

Previously the repo_info task was pulling the `main` branch; the new code gets the default branch. I have verified that commits were getting stored in the metadata as NULL in ~25% of cases, and that 5-10% of repos in any Augur collection have a default branch other than `main`. Using the default branch always gets commit_count data correctly: no more NULL values, and no more counting the "not default branch".
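For illustration, a minimal sketch of fetching a repository's default branch via the GitHub GraphQL API's `defaultBranchRef` field (the field name appears in this PR's commit log; the helper function and its error handling here are my own illustration, not the code in this PR):

```python
import requests

QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef { name }
  }
}
"""

def get_default_branch(owner: str, name: str, token: str) -> str:
    # Ask GitHub for the repo's actual default branch instead of assuming "main".
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY, "variables": {"owner": owner, "name": name}},
        headers={"Authorization": f"token {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["repository"]["defaultBranchRef"]["name"]
```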