```python
next_since = since
params = {'orderBy': 'uploadTime', 'length': page_size, 'since': since}
r = requests.get(source_images_endpoint, headers=source_api_auth_headers, params=params)
if r.status_code != 200:
    raise Exception('Failed to fetch images: ' + str(r.json()))

images = r.json()["data"]
todo = r.json()['total']

# Check that we are still making forward progress
if todo == last_todo:
    raise Exception('Pagination is stuck in a block of items with the same uploadTime. Increasing page_size may help')
last_todo = todo

# For each image on the page
for image in images:
    found[image["data"]["id"]] = 1
    process_image(image)
    next_since = image["data"]["uploadTime"]
    todo -= 1

# The since filter is non-inclusive. If there are more items with the exact uploadTime as the last item on a given page, we could miss them.
# Roll back the since parameter by 1 millisecond to correct for this, at the risk of entering an infinite loop and some duplicate work.
# The Grid does not seem to support a sub-ordering by id, which would help to break these ties.
corrected = parse(next_since) - datetime.timedelta(milliseconds=1)
since = corrected.strftime("%Y-%m-%dT%H:%M:%S.%f%z")
print(str(todo) + " remaining")
```
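To make the boundary behaviour concrete, here is a toy simulation of the loop above against an in-memory stand-in for the images endpoint. It is a sketch under assumptions that the Grid API docs would need to confirm: `since` is exclusive, results come back oldest first, `total` counts everything after `since`, and the surrounding loop runs while `todo > 0`. `fake_search`, `paginate_with_since` and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical sample data: images "b" and "c" share an uploadTime.
t0 = datetime(2024, 1, 1)
IMAGES = [
    {"id": "a", "uploadTime": t0},
    {"id": "b", "uploadTime": t0 + timedelta(seconds=1)},
    {"id": "c", "uploadTime": t0 + timedelta(seconds=1)},
    {"id": "d", "uploadTime": t0 + timedelta(seconds=2)},
    {"id": "e", "uploadTime": t0 + timedelta(seconds=3)},
]


def fake_search(since, length):
    """Hypothetical endpoint stand-in (assumed semantics): items with
    uploadTime strictly after `since`, oldest first, plus the match count."""
    matching = sorted(
        (image for image in IMAGES if image["uploadTime"] > since),
        key=lambda image: image["uploadTime"],
    )
    return matching[:length], len(matching)


def paginate_with_since(page_size):
    """Mimics the loop above: exclusive `since`, progress check, 1 ms rollback."""
    since = datetime.min
    last_total = None
    remaining = None
    fetched = []
    while remaining is None or remaining > 0:
        page, total = fake_search(since, page_size)
        if not page:
            break
        # Same total as last time means the window did not move forward.
        if total == last_total:
            raise RuntimeError("Pagination is stuck in a block of items with the same uploadTime")
        last_total = total
        fetched.extend(image["id"] for image in page)
        remaining = total - len(page)
        # Roll back 1 ms so ties with the last item are fetched again.
        since = page[-1]["uploadTime"] - timedelta(milliseconds=1)
    return fetched


print(paginate_with_since(3))  # ['a', 'b', 'c', 'b', 'c', 'd', 'd', 'e']
```

With `page_size=3`, images `b`, `c` and `d` are each processed twice. With `page_size=2`, the tied pair fills a whole page, so the rollback re-requests the same window and the progress check raises instead.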
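Using `since` for this script could prove problematic: there can be natural bursts of images with identical `uploadTime`s, and there's scope for duplicates in the response at the page boundary. I think this is what the `offset` query param is for! If you kept the results ordered oldest to newest (the `since` logic above suggests `orderBy=uploadTime` already does this, whereas `-uploadTime` would most likely give newest to oldest), you could safely and predictably use `offset` for pagination, because new uploads arriving during the migration would land after your current position rather than shifting pages you've already read. That means you wouldn't need the progress check or the single-millisecond subtraction.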
grid-tools/migrate.py, lines 32 to 58 in ea88fc2
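A minimal sketch of the offset-based loop suggested above. `paginate_with_offset`, `fake_fetch_page` and the sample data are hypothetical. A real `fetch_page` would presumably call `source_images_endpoint` with `length` and `offset` params plus whatever `orderBy` value yields oldest-to-newest results, assuming the endpoint accepts `offset` as described and still returns the `data` and `total` fields the script already relies on.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Page = Tuple[List[Dict], int]  # (items on this page, total matching items)


def paginate_with_offset(fetch_page: Callable[[int, int], Page], page_size: int) -> List[str]:
    """Walk every result by advancing a numeric offset instead of a timestamp."""
    fetched = []
    offset = 0
    while True:
        page, total = fetch_page(offset, page_size)
        fetched.extend(image["id"] for image in page)
        offset += len(page)
        # Stop on an empty page (guards against a shrinking total) or once all items are seen.
        if not page or offset >= total:
            return fetched


# Hypothetical in-memory stand-in for the images endpoint, ordered oldest to newest.
t0 = datetime(2024, 1, 1)
IMAGES = [
    {"id": "a", "uploadTime": t0},
    {"id": "b", "uploadTime": t0 + timedelta(seconds=1)},
    {"id": "c", "uploadTime": t0 + timedelta(seconds=1)},  # same uploadTime as "b"
    {"id": "d", "uploadTime": t0 + timedelta(seconds=2)},
    {"id": "e", "uploadTime": t0 + timedelta(seconds=3)},
]


def fake_fetch_page(offset: int, length: int) -> Page:
    ordered = sorted(IMAGES, key=lambda image: image["uploadTime"])
    return ordered[offset:offset + length], len(ordered)


print(paginate_with_offset(fake_fetch_page, page_size=2))  # ['a', 'b', 'c', 'd', 'e']
```

Tied `uploadTime`s stop mattering because the cursor counts positions rather than timestamps. The sketch does quietly assume two things: that the backend returns tied items in the same relative order on every request (the fake does, because Python's `sorted` is stable), and that deep offsets are allowed. If the endpoint passes `offset` straight through to Elasticsearch, the default `index.max_result_window` of 10,000 would cap how far it can page.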