Skip to content
This repository has been archived by the owner on Aug 16, 2023. It is now read-only.

Make fetching and filtering more performant #3

Open
danisyellis opened this issue Jun 11, 2019 · 4 comments
Open

Make fetching and filtering more performant #3

danisyellis opened this issue Jun 11, 2019 · 4 comments
Labels
wontfix This will not be worked on

Comments

@danisyellis
Copy link
Collaborator

danisyellis commented Jun 11, 2019

(NOTE: I made this ticket a while ago, but performance/ amount of time Starfish takes to run has never been a problem for us at Indeed. I think a company would have to be checking an extremely large number of employees for this to matter. So I'm going to put this in the backlog. If someone wants to work on it, great, but I don't think it's a particularly useful change at the moment.)

Currently We:

  1. ping the github API for a person's events
  2. Look at the first page and keep only the events that are event types we care about
  3. Do this for every page of event history that Github has (they hold up to 300 events at a time, 10 per page)
  4. Now, we look through that array of events to see if one is in the correct time period, and stop looking when we find one.

However, there's no reason to look through all 30 pages of a person's events if an event on the first page meets both criteria

So, refactor the code to check for

  1. event type
  2. if it happened in the time range
    BEFORE fetching the next page. If those are both true, log the contributor's alternate id and move on to the next person.
@danisyellis danisyellis self-assigned this Oct 28, 2019
@jeffwilcox
Copy link

Also consider storing the e-tag returned for a person, so you can not retrieve results if there's been no change.

See also - conditional requests: https://developer.github.com/v3/#conditional-requests

@danisyellis
Copy link
Collaborator Author

That's a good idea Jeff. I think I'll make a new issue for that to make it more visible.
Do you have a recommendation between using e-tags or the Last-Modified header?

I also saw in one of your other comments that you're using octokit, and I'm curious what advantage using octokit gives you over plain api calls?

@jeffwilcox
Copy link

We chose the e-tag and never looked back, so I don't have any useful guidance!

I want to say that Octokit has given us better long-term support, but we've still had to deal with a number of breaking changes over the years. So I'm going to put it in the "trying to help our GitHub friends build and bake this great library" as my reason.

There are also some plug-ins already created on retry policies, abuse limiters, etc., that may save time for some.

@danisyellis
Copy link
Collaborator Author

Thanks!

@danisyellis danisyellis removed their assignment Jan 8, 2021
@danisyellis danisyellis added good first issue Good for newcomers up-for-grabs This issue is ready to be worked on, and unassigned labels Jan 8, 2021
@danisyellis danisyellis removed good first issue Good for newcomers up-for-grabs This issue is ready to be worked on, and unassigned labels Jan 22, 2021
@danisyellis danisyellis added the wontfix This will not be worked on label Sep 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

2 participants