
Updated key rotation to be more random #2504

Merged: 48 commits, Sep 1, 2023
Commits (48 total; the diff below shows changes from 29 commits)
b240008
stress test
sgoggins Aug 27, 2023
58cecd4
test
sgoggins Aug 27, 2023
570c09a
tweaking
sgoggins Aug 27, 2023
e6bf1f1
keys not rotating fast enough
sgoggins Aug 27, 2023
dc00428
optimizing core
sgoggins Aug 28, 2023
487f7f9
keys freeze
sgoggins Aug 28, 2023
0c040f7
test
sgoggins Aug 28, 2023
30f777c
try
sgoggins Aug 28, 2023
6d4b07e
more conservative test.
sgoggins Aug 28, 2023
05cb234
try this!
sgoggins Aug 28, 2023
c53fa88
a little more.
sgoggins Aug 28, 2023
85711df
interesting
sgoggins Aug 28, 2023
0a91df7
failed experiment
sgoggins Aug 28, 2023
57bd526
try this.
sgoggins Aug 28, 2023
4347415
perhaps
sgoggins Aug 28, 2023
48ea444
import
sgoggins Aug 28, 2023
23da65b
lighten the logs
sgoggins Aug 28, 2023
453bf8b
spop
sgoggins Aug 28, 2023
024a389
random shuffle
sgoggins Aug 28, 2023
060d462
finding the error of my ways.
sgoggins Aug 28, 2023
833221b
more checking
sgoggins Aug 28, 2023
18d1c46
sample!
sgoggins Aug 28, 2023
437733f
changed output to debug
sgoggins Aug 28, 2023
0651965
Renormalized task type share
sgoggins Aug 28, 2023
6a6214c
consistency in the maxes.
sgoggins Aug 28, 2023
19949b8
Another randomization to prevent keylock.
sgoggins Aug 28, 2023
3388c76
Merge pull request #2506 from chaoss/dev
sgoggins Aug 28, 2023
be0dd4f
Dial it back on mortal hardware.
sgoggins Aug 28, 2023
537e469
Dialing it down.
sgoggins Aug 28, 2023
c438717
Fixing logic in the repo_info commit_count query to refer to the defa…
sgoggins Aug 30, 2023
3dd65ff
fixed insert logic on repo_info for commit_count to refer to new vari…
sgoggins Aug 30, 2023
cc2d62e
view fixing.
sgoggins Aug 31, 2023
9619b6e
materialized view maintenance.
sgoggins Aug 31, 2023
aa59483
updated materialized view refresh
sgoggins Sep 1, 2023
8364aee
updated schema refresh to include the final materialized view
sgoggins Sep 1, 2023
2bf0cd9
updating explorer_contributor_actions concurrently now that it has a …
sgoggins Sep 1, 2023
4e7475e
config now defaults the refresh for materialized views back to one da…
sgoggins Sep 1, 2023
69e0b42
included an update to the refresh_materialized_views_interval_in_days…
sgoggins Sep 1, 2023
d7778ae
syntax fix in sql
sgoggins Sep 1, 2023
fc69726
updated methods names in alembic script
sgoggins Sep 1, 2023
f9ec47c
had an extra space in the indents in one of the changed db alembic sc…
sgoggins Sep 1, 2023
087ff33
fix'
sgoggins Sep 1, 2023
fd2ef2f
fix'
sgoggins Sep 1, 2023
dcc875e
fix'
sgoggins Sep 1, 2023
bb56462
fix'
sgoggins Sep 1, 2023
e200b28
checking
sgoggins Sep 1, 2023
24a4863
retest
sgoggins Sep 1, 2023
98ba73e
reducing randomization and logging after tests
sgoggins Sep 1, 2023
6 changes: 3 additions & 3 deletions augur/application/cli/backend.py
@@ -170,21 +170,21 @@ def determine_worker_processes(ratio,maximum):
sleep_time += 6

#60% of estimate, Maximum value of 45
core_num_processes = determine_worker_processes(.6, 80)
core_num_processes = determine_worker_processes(.6, 45)
logger.info(f"Starting core worker processes with concurrency={core_num_processes}")
core_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={core_num_processes} -n core:{uuid.uuid4().hex}@%h"
process_list.append(subprocess.Popen(core_worker.split(" ")))
sleep_time += 6

#20% of estimate, Maximum value of 25
secondary_num_processes = determine_worker_processes(.2, 26)
secondary_num_processes = determine_worker_processes(.2, 25)
logger.info(f"Starting secondary worker processes with concurrency={secondary_num_processes}")
secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={secondary_num_processes} -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
process_list.append(subprocess.Popen(secondary_worker.split(" ")))
sleep_time += 6

#15% of estimate, Maximum value of 20
facade_num_processes = determine_worker_processes(.2, 40)
facade_num_processes = determine_worker_processes(.2, 20)
logger.info(f"Starting facade worker processes with concurrency={facade_num_processes}")
facade_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={facade_num_processes} -n facade:{uuid.uuid4().hex}@%h -Q facade"

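The ratio-and-cap comments in this hunk describe the allocation scheme: each worker pool gets a share of an estimated process budget, clamped to a hard maximum. Below is a minimal, hypothetical sketch of that idea only; the real determine_worker_processes in backend.py is not shown in this hunk, and the estimated_total value here is invented for illustration.

# Hypothetical sketch of the ratio-and-cap scheme, not the actual implementation.
def determine_worker_processes(ratio, maximum, estimated_total=75):
    """Return a process count: ratio share of an assumed budget, capped at maximum."""
    return max(1, min(round(estimated_total * ratio), maximum))

# With the post-change caps from this diff:
print(determine_worker_processes(0.6, 45))  # core      -> 45 (hits the cap)
print(determine_worker_processes(0.2, 25))  # secondary -> 15
print(determine_worker_processes(0.2, 20))  # facade    -> 15
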
16 changes: 15 additions & 1 deletion augur/tasks/github/issues/tasks.py
@@ -5,6 +5,7 @@

from sqlalchemy.exc import IntegrityError

from augur.tasks.github.util.github_api_key_handler import GithubApiKeyHandler

from augur.tasks.init.celery_app import celery_app as celery
from augur.tasks.init.celery_app import AugurCoreRepoCollectionTask
@@ -29,16 +30,29 @@ def collect_issues(repo_git : str) -> int:

augur_db = manifest.augur_db

logger.info(f'this is the manifest.key_auth value: {str(manifest.key_auth)}')

try:

query = augur_db.session.query(Repo).filter(Repo.repo_git == repo_git)
repo_obj = execute_session_query(query, 'one')
repo_id = repo_obj.repo_id

#try this
# the_key = manifest.key_auth
# try:
# randomon = GithubApiKeyHandler(augur_db.session)
# the_key = randomon.get_random_key()
# logger.info(f'The Random Key {the_key}')
# except Exception as e:
# logger.info(f'error: {e}')
# the_key = manifest.key_auth
# pass

owner, repo = get_owner_repo(repo_git)

issue_data = retrieve_all_issue_data(repo_git, logger, manifest.key_auth)

#issue_data = retrieve_all_issue_data(repo_git, logger, the_key)

if issue_data:
total_issues = len(issue_data)
19 changes: 17 additions & 2 deletions augur/tasks/github/util/github_api_key_handler.py
@@ -7,6 +7,7 @@
from augur.tasks.util.redis_list import RedisList
from augur.application.db.session import DatabaseSession
from augur.application.config import AugurConfig
from sqlalchemy import func


class NoValidKeysError(Exception):
@@ -39,7 +40,7 @@ def __init__(self, session: DatabaseSession):

self.keys = self.get_api_keys()

# self.logger.debug(f"Retrieved {len(self.keys)} github api keys for use")
self.logger.info(f"Retrieved {len(self.keys)} github api keys for use")

def get_random_key(self):
"""Retrieves a random key from the list of keys
@@ -71,9 +72,11 @@ def get_api_keys_from_database(self) -> List[str]:
from augur.application.db.models import WorkerOauth

select = WorkerOauth.access_token
# randomizing the order at db time
#select.order_by(func.random())
where = [WorkerOauth.access_token != self.config_key, WorkerOauth.platform == 'github']

return [key_tuple[0] for key_tuple in self.session.query(select).filter(*where).all()]
return [key_tuple[0] for key_tuple in self.session.query(select).filter(*where).order_by(func.random()).all()]
Contributor:
I don't think this is needed. It only randomizes the order of the keys that get cached in Redis the first time the GithubApiKeyHandler is created, so the only effect is that the keys are stored in Redis in a different order.

Member Author:
That's ok. What was happening is that we always fetched the keys in table order, so every restart started from the same list, which made the first hour after any restart go slower because it ran out of keys (at least going from the previous, non-randomized configuration to the changes I made). By randomizing how Redis is first populated we no longer start with the same list.
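For illustration, here is a minimal, hypothetical sketch of the cache-on-first-use flow being discussed; the function name load_keys and the Redis key name are invented, and this is not the actual GithubApiKeyHandler.

import redis
from sqlalchemy import func
from augur.application.db.models import WorkerOauth

def load_keys(session, cache_key="github_api_keys"):
    r = redis.Redis(decode_responses=True)

    cached = r.lrange(cache_key, 0, -1)
    if cached:
        return cached  # later calls never hit the database

    # First population: without order_by(func.random()) every restart would
    # begin with the keys in table order, i.e. the same list every time.
    rows = (
        session.query(WorkerOauth.access_token)
        .filter(WorkerOauth.platform == "github")
        .order_by(func.random())
        .all()
    )
    keys = [row[0] for row in rows]
    if keys:
        r.rpush(cache_key, *keys)
    return keys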



def get_api_keys(self) -> List[str]:
@@ -130,6 +133,18 @@
if not valid_keys:
raise NoValidKeysError("No valid github api keys found in the config or worker oauth table")


# shuffling the keys so not all processes get the same keys in the same order
valid_now = valid_keys
try:
self.logger.debug(f'valid keys before shuffle: {valid_keys}')
valid_keys = random.sample(valid_keys, len(valid_keys))
self.logger.debug(f'valid keys AFTER shuffle: {valid_keys}')
except Exception as e:
self.logger.debug(f'{e}')
valid_keys = valid_now
Contributor:
This only gets executed the first time the GithubApiKeyHandler is created; after that the keys are stored in Redis, so this function returns early in the if redis_keys block.

Member Author:
That's ok. What was happening is that we always fetched the keys in table order, so every restart started from the same list, which made the first hour after any restart go slower because it ran out of keys (at least going from the previous, non-randomized configuration to the changes I made). By randomizing how Redis is first populated we no longer start with the same list.

pass

return valid_keys

def is_bad_api_key(self, client: httpx.Client, oauth_key: str) -> bool:
2 changes: 2 additions & 0 deletions augur/tasks/github/util/github_random_key_auth.py
@@ -3,6 +3,7 @@
from augur.tasks.util.random_key_auth import RandomKeyAuth
from augur.tasks.github.util.github_api_key_handler import GithubApiKeyHandler
from augur.application.db.session import DatabaseSession
import random


class GithubRandomKeyAuth(RandomKeyAuth):
@@ -16,6 +17,7 @@ def __init__(self, session: DatabaseSession, logger):

# gets the github api keys from the database via the GithubApiKeyHandler
github_api_keys = GithubApiKeyHandler(session).keys
github_api_keys = random.sample(github_api_keys, len(github_api_keys))

Contributor:
I believe the goal here is to randomize the order. If so, random.shuffle should be used, since random.sample is meant for drawing a random subset of a list. Sampling the full length does randomly order the list, but it makes the intent less clear.

Member Author:
In short, we get more durable randomness from random.sample than from random.shuffle. random.shuffle reorders the original list, but only in memory; it doesn't reorder what is stored in Redis. random.sample does a "better" job of getting us novel keys from the list every time. I tested both, and with shuffle I was reusing the same key about 3x as often as with sample.

From Here: https://blog.enterprisedna.co/python-shuffle-list/

It’s essential to note that the shuffle() function returns None and modifies the original list or array. Therefore, it’s unsuitable for cases where you must maintain the original list order.

To return a new list containing elements from the original list without modifying it, you can use the sample() function from the random library:
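
A minimal sketch of the shuffle/sample difference being cited (illustrative only; the key names are invented):

import random

keys = ["key-a", "key-b", "key-c", "key-d"]

reordered = random.sample(keys, len(keys))  # new list in random order
print(keys)        # original order preserved
print(reordered)   # e.g. ['key-c', 'key-a', 'key-d', 'key-b']

result = random.shuffle(keys)  # reorders in place
print(result)      # None
print(keys)        # the original list is now shuffled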

if not github_api_keys:
print("Failed to find github api keys. This is usually because your key has expired")
6 changes: 3 additions & 3 deletions augur/tasks/start_tasks.py
@@ -261,16 +261,16 @@ def augur_collection_monitor():
enabled_phase_names = get_enabled_phase_names_from_config(session.logger, session)

if primary_repo_collect_phase.__name__ in enabled_phase_names:
start_primary_collection(session, max_repo=40)
start_primary_collection(session, max_repo=30)

if secondary_repo_collect_phase.__name__ in enabled_phase_names:
start_secondary_collection(session, max_repo=10)

if facade_phase.__name__ in enabled_phase_names:
start_facade_collection(session, max_repo=30)
start_facade_collection(session, max_repo=20)

if machine_learning_phase.__name__ in enabled_phase_names:
start_ml_collection(session,max_repo=5)
start_ml_collection(session,max_repo=1)

# have a pipe of 180

1 change: 1 addition & 0 deletions augur/tasks/util/random_key_auth.py
@@ -43,6 +43,7 @@ def auth_flow(self, request: Request) -> Generator[Request, Response, None]:

# set the headers of the request with the new key
request.headers[self.header_name] = key_string
#self.logger.info(f"List of Keys: {self.list_of_keys}")

else:
self.logger.error(f"There are no valid keys to make a request with: {self.list_of_keys}")
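As a rough illustration of the auth_flow pattern in the hunk above, here is a minimal, hypothetical httpx auth class that picks a random key per request. This is not Augur's RandomKeyAuth; the class name, header format, and example tokens are invented.

import random
from typing import Generator

import httpx

class RotatingKeyAuth(httpx.Auth):
    def __init__(self, keys, header_name="Authorization"):
        self.list_of_keys = list(keys)
        self.header_name = header_name

    def auth_flow(self, request: httpx.Request) -> Generator[httpx.Request, httpx.Response, None]:
        if self.list_of_keys:
            key_string = random.choice(self.list_of_keys)
            # set the headers of the request with the chosen key
            request.headers[self.header_name] = f"token {key_string}"
        yield request

# usage sketch:
# client = httpx.Client(auth=RotatingKeyAuth(["ghp_example_1", "ghp_example_2"]))
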
6 changes: 4 additions & 2 deletions augur/tasks/util/redis_list.py
@@ -168,8 +168,10 @@ def pop(self, index: int = None):
"""

if index is None:

redis.rpop(self.redis_list_key)
# This will get a random index from the list and remove it,
# decreasing the likelihood of everyone using the same key all the time
#redis.rpop(self.redis_list_key)
redis.spop(self.redis_list_key)

Contributor:
This won't do anything, because this isn't how we are getting the API keys from Redis.

Member Author:
spop does a randomized retrieval, while rpop pops in order. The issue with relying on order is that we never know, and don't want to try to predict, how big each job is before we run it. Randomized retrieval balances out our key usage. I proved this by making the change, running it, and seeing a HUGE gain in collection rate.
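
For reference, a minimal redis-py sketch of the RPOP/SPOP distinction (not Augur's RedisList wrapper; it assumes a local Redis instance and uses invented key names). Note that RPOP operates on lists and pops from the tail in a fixed order, while SPOP operates on sets and removes a random member.

import redis

r = redis.Redis(decode_responses=True)

# Ordered structure: a list always yields elements in the same sequence.
r.delete("keys:list")
r.rpush("keys:list", "key-a", "key-b", "key-c")
print(r.rpop("keys:list"))   # always "key-c" first

# Unordered structure: a set yields a random member each time.
r.delete("keys:set")
r.sadd("keys:set", "key-a", "key-b", "key-c")
print(r.spop("keys:set"))    # any of the three, chosen at random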

else:
# calls __delitem__