probably best to use offset rather than since for paging in migration #1

@twrichards

grid-tools/migrate.py

Lines 32 to 58 in ea88fc2

next_since = since
params = {'orderBy': 'uploadTime', 'length': page_size, 'since': since}
r = requests.get(source_images_endpoint, headers=source_api_auth_headers, params=params)
if r.status_code != 200:
    raise Exception('Failed to fetch images: ' + str(r.json()))
images = r.json()["data"]
todo = r.json()['total']
# Check that we are still making forward progress
if todo == last_todo:
    raise Exception('Pagination is stuck in a block of items with the same uploadTime. Increasing page_size may help')
last_todo = todo
# Foreach image on page
for image in images:
    found[image["data"]["id"]] = 1
    process_image(image)
    next_since = image["data"]["uploadTime"]
    todo -= 1
# The since filter is non inclusive. If there are more items with the exact uploadTime as the last item on a given we could miss them
# Roll back the since parameter by 1 milli to correct for this at the risk of entering an infinite loop and some duplicate work
# The Grid does not seem to support a subordering by id which would help to break these ties.
corrected = parse(next_since) - datetime.timedelta(milliseconds=1)
since = corrected.strftime( "%Y-%m-%dT%H:%M:%S.%f%z")
print(str(todo) + " remaining")

I think using since for this script could prove problematic: uploadTimes can naturally be identical in bursts, and there's scope for duplicates in the response at the page boundary. I think this is what the offset query param is for! If you used orderBy=-uploadTime (note the negation, i.e. oldest to newest), you could safely and predictably use offset for pagination, meaning you wouldn't need the checks on progress or the single-millisecond subtraction.
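For illustration, a minimal sketch of what offset-based paging might look like. It assumes the images endpoint accepts offset alongside length and returns data and total as in the snippet above. The names iter_images_by_offset, fetch_page and make_fake_fetch are hypothetical and not part of migrate.py. The orderBy value is taken from the suggestion above, and its sort direction should be checked against the API before relying on it. In the real script, fetch_page would wrap the existing requests.get call and return r.json(); here an in-memory stand-in is used so the sketch runs on its own.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = Dict[str, Any]


def iter_images_by_offset(
    fetch_page: Callable[[Dict[str, Any]], Page],
    page_size: int = 100,
    order_by: str = "-uploadTime",  # assumption: direction as described in this issue
) -> Iterator[Dict[str, Any]]:
    """Yield every image once, advancing by offset instead of a since timestamp."""
    offset = 0
    while True:
        params = {"orderBy": order_by, "length": page_size, "offset": offset}
        page = fetch_page(params)
        images = page["data"]
        if not images:
            break  # an empty page means there is nothing left to read
        for image in images:
            yield image
        # Advance by the number of items actually returned, so identical
        # uploadTime values cannot cause a skipped or repeated item.
        offset += len(images)
        if offset >= page["total"]:
            break


def make_fake_fetch(items: List[Dict[str, Any]]) -> Callable[[Dict[str, Any]], Page]:
    """Hypothetical in-memory stand-in for the images endpoint."""
    def fetch_page(params: Dict[str, Any]) -> Page:
        start = params["offset"]
        chunk = items[start:start + params["length"]]
        return {"data": chunk, "total": len(items)}
    return fetch_page


# Five images, four of which share the same uploadTime and straddle page boundaries.
fake_images = [
    {"data": {"id": f"img{i}", "uploadTime": "2020-01-01T00:00:00.000Z"}}
    for i in range(4)
]
fake_images.append({"data": {"id": "img4", "uploadTime": "2020-01-02T00:00:00.000Z"}})

seen_ids = [
    image["data"]["id"]
    for image in iter_images_by_offset(make_fake_fetch(fake_images), page_size=2)
]
```

With this shape the loop needs neither the last_todo progress check nor the one-millisecond rollback, because position in the result set, not the timestamp, decides where the next page starts. It still relies on the ordering staying stable while the migration runs.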
