Apify RQ client list_head method returns truncated unique keys and URLs for long URLs #630

@vdusek

Description

When calling ApifyRequestQueueSingleClient._list_head(), the response contains truncated unique_key and url values for long URLs (longer than 128 characters, see https://github.com/apify/apify-core/blob/develop/src/api/src/lib/request_queues/request_queue.ts#L53). This causes problems because we compute the request_id locally from the unique_key, so a truncated key produces a different, invalid ID. It happens here: https://github.com/apify/apify-sdk-python/blob/v3.0.1/src/apify/storage_clients/_apify/_request_queue_single_client.py#L223:L263.

Reproduction

import asyncio

from apify import Actor, Request

URL = 'https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?path=/eosm-public-offer&officeLabels=%7B%7D&page=1&pageSize=100000&sortColumn=zdatzvsm&sortOrder=-1'


async def main() -> None:
    async with Actor:
        request = Request.from_url(
            URL,
            use_extended_unique_key=True,
            always_enqueue=True,
        )
        print('request:', request)

        rq = await Actor.open_request_queue(force_cloud=True)

        processed_request = await rq.add_request(request)
        print('processed_request:', processed_request)

        request_obtained = await rq.fetch_next_request()
        print('request_obtained:', request_obtained)


if __name__ == '__main__':
    asyncio.run(main())

Execution and logs

$ uv run python debug_unique_key.py 
[apify] INFO  Initializing Actor ({"apify_sdk_version": "3.0.2", "apify_client_version": "2.1.0", "crawlee_version": "1.0.2", "python_version": "3.13.0", "os": "linux"})
request: unique_key='GET|e3b0c442|e3b0c442|https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?officelabels=%7b%7d&page=1&pagesize=100000&path=/eosm-public-offer&sortcolumn=zdatzvsm&sortorder=-1_rUB4KC98Eq8hmmS2w' url='https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?path=/eosm-public-offer&officeLabels=%7B%7D&page=1&pageSize=100000&sortColumn=zdatzvsm&sortOrder=-1' method='GET' headers=HttpHeaders(root={}) payload=None user_data=UserData(crawlee_data=None, label=None) retry_count=0 no_retry=False loaded_url=None handled_at=None
processed_request: id='LGpjlKaziZMS9qP' unique_key='GET|e3b0c442|e3b0c442|https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?officelabels=%7b%7d&page=1&pagesize=100000&path=/eosm-public-offer&sortcolumn=zdatzvsm&sortorder=-1_rUB4KC98Eq8hmmS2w' was_already_present=False was_already_handled=False
[apify.storage_clients._apify._request_queue_single_client] WARN  Could not fetch request data for unique_key=`GET|e3b0c442|e3b0c442|https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?officelabels=%7b%7d&page=1&pagesize=100000&path=/e [truncated]` (id=`DQqPIA6KnkfOwVt`)
[apify] INFO  Exiting Actor ({"exit_code": 0})

Root cause

  • The Apify Platform truncates unique_key and url fields in the list_head API response for values longer than 128 characters.
  • Since the SDK uses unique_key to recompute the request ID locally, truncation leads to mismatched or invalid IDs, causing failed lookups.
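
To illustrate why truncation breaks the lookup, here is a minimal sketch of a hash-based ID derivation. It assumes the ID is the first 15 characters of a base64-encoded SHA-256 digest of the unique key; `derive_request_id` is a hypothetical stand-in, not the SDK's actual helper, and the example URL is made up. The 128-character limit is the one quoted above.

```python
import re
from base64 import b64encode
from hashlib import sha256

MAX_FIELD_LENGTH = 128  # truncation limit quoted in this issue


def derive_request_id(unique_key: str, length: int = 15) -> str:
    """Hypothetical ID derivation: hash the key, base64-encode, strip '+', '/', '='."""
    digest = sha256(unique_key.encode('utf-8')).digest()
    encoded = b64encode(digest).decode('utf-8')
    return re.sub(r'[+/=]', '', encoded)[:length]


# A made-up unique key that is well over 128 characters long.
full_key = 'GET|e3b0c442|e3b0c442|https://example.com/search?' + '&'.join(
    f'param{i}=value{i}' for i in range(20)
)
# What a truncating list-head response would hand back.
truncated_key = full_key[:MAX_FIELD_LENGTH]

full_id = derive_request_id(full_key)
truncated_id = derive_request_id(truncated_key)

print('id from full key:     ', full_id)
print('id from truncated key:', truncated_id)
print('ids match:', full_id == truncated_id)
```

Any hash-based derivation behaves this way: once the input is cut off, the resulting ID no longer refers to the stored request, which matches the two different IDs visible in the logs above.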

Potential solution
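
One possible direction, sketched below under assumptions: stop re-deriving the ID from the (possibly truncated) unique_key and use the ID carried by each head item to fetch the full request instead. This is only a simulation with an in-memory store. `StoredRequest`, `list_head`, and `fetch_full_requests` are hypothetical names, and the assumption that each list_head item includes the platform-assigned `id` is not confirmed in this issue.

```python
from dataclasses import dataclass

MAX_FIELD_LENGTH = 128  # truncation limit quoted in this issue


@dataclass(frozen=True)
class StoredRequest:
    id: str
    unique_key: str
    url: str


def list_head(store: dict[str, StoredRequest]) -> list[dict[str, str]]:
    """Simulate a head listing that truncates long unique keys and URLs."""
    return [
        {
            'id': request.id,
            'uniqueKey': request.unique_key[:MAX_FIELD_LENGTH],
            'url': request.url[:MAX_FIELD_LENGTH],
        }
        for request in store.values()
    ]


def fetch_full_requests(
    store: dict[str, StoredRequest], head_items: list[dict[str, str]]
) -> list[StoredRequest]:
    """Look up each request by the ID from the head item, never by a recomputed one."""
    return [store[item['id']] for item in head_items]


long_url = 'https://example.com/search?' + '&'.join(f'param{i}=value{i}' for i in range(20))
stored = StoredRequest(id='abc123', unique_key=f'GET|{long_url}', url=long_url)
store = {stored.id: stored}

head = list_head(store)
full = fetch_full_requests(store, head)
print('truncated key length:', len(head[0]['uniqueKey']))
print('full key recovered:', full[0].unique_key == stored.unique_key)
```

The trade-off is one extra lookup per head item, but the truncated fields are then only ever used for display and never for identity.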

Labels

bug: Something isn't working.
t-tooling: Issues with this label are in the ownership of the tooling team.
