Description
When calling ApifyRequestQueueSingleClient._list_head(), the response contains truncated values of unique_key and url for long URLs (longer than 128 characters, see https://github.com/apify/apify-core/blob/develop/src/api/src/lib/request_queues/request_queue.ts#L53). This causes issues because we compute the request_id locally from this unique_key, so we end up generating a different/invalid ID when the key is truncated. It happens here: https://github.com/apify/apify-sdk-python/blob/v3.0.1/src/apify/storage_clients/_apify/_request_queue_single_client.py#L223:L263.
Reproduction
import asyncio

from apify import Actor, Request

URL = 'https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?path=/eosm-public-offer&officeLabels=%7B%7D&page=1&pageSize=100000&sortColumn=zdatzvsm&sortOrder=-1'


async def main() -> None:
    async with Actor:
        request = Request.from_url(
            URL,
            use_extended_unique_key=True,
            always_enqueue=True,
        )
        print('request:', request)

        rq = await Actor.open_request_queue(force_cloud=True)

        processed_request = await rq.add_request(request)
        print('processed_request:', processed_request)

        request_obtained = await rq.fetch_next_request()
        print('request_obtained:', request_obtained)


if __name__ == '__main__':
    asyncio.run(main())
Execution and logs
$ uv run python debug_unique_key.py
[apify] INFO Initializing Actor ({"apify_sdk_version": "3.0.2", "apify_client_version": "2.1.0", "crawlee_version": "1.0.2", "python_version": "3.13.0", "os": "linux"})
request: unique_key='GET|e3b0c442|e3b0c442|https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?officelabels=%7b%7d&page=1&pagesize=100000&path=/eosm-public-offer&sortcolumn=zdatzvsm&sortorder=-1_rUB4KC98Eq8hmmS2w' url='https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?path=/eosm-public-offer&officeLabels=%7B%7D&page=1&pageSize=100000&sortColumn=zdatzvsm&sortOrder=-1' method='GET' headers=HttpHeaders(root={}) payload=None user_data=UserData(crawlee_data=None, label=None) retry_count=0 no_retry=False loaded_url=None handled_at=None
processed_request: id='LGpjlKaziZMS9qP' unique_key='GET|e3b0c442|e3b0c442|https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?officelabels=%7b%7d&page=1&pagesize=100000&path=/eosm-public-offer&sortcolumn=zdatzvsm&sortorder=-1_rUB4KC98Eq8hmmS2w' was_already_present=False was_already_handled=False
[apify.storage_clients._apify._request_queue_single_client] WARN Could not fetch request data for unique_key=`GET|e3b0c442|e3b0c442|https://portal.isoss.gov.cz/irj/portal/anonymous/mvrest?officelabels=%7b%7d&page=1&pagesize=100000&path=/e [truncated]` (id=`DQqPIA6KnkfOwVt`)
[apify] INFO Exiting Actor ({"exit_code": 0})
Root cause
- The Apify Platform truncates the unique_key and url fields in the list_head API response for values longer than 128 characters.
- Since the SDK uses unique_key to recompute the request ID locally, truncation leads to mismatched or invalid IDs, causing failed lookups (see the sketch below).
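To make the second point concrete, here is a minimal sketch assuming a deterministic, hash-based derivation of the request ID from unique_key. The fake_request_id helper, the example key, and the exact truncation format are illustrative stand-ins, not the SDK's real implementation:

# Illustrative only: shows why any deterministic ID derivation breaks once
# its input key is truncated. Not the SDK's actual derivation function.
import hashlib


def fake_request_id(unique_key: str) -> str:
    # Hypothetical derivation: hash the unique_key and keep a short prefix.
    return hashlib.sha256(unique_key.encode('utf-8')).hexdigest()[:15]


full_key = 'GET|e3b0c442|e3b0c442|https://example.com/api?' + 'x' * 120  # > 128 chars
truncated_key = full_key[:128] + ' [truncated]'  # roughly what list_head returns

print(fake_request_id(full_key))       # the ID the request was stored under
print(fake_request_id(truncated_key))  # a different ID -> the local lookup fails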
Potential solution
- Hotfix - Detect the [truncated] suffix in unique_key, and if found, make an additional get_request(id) call to fetch the full record (sketched below). As part of SDK + Scrapy integration: RequestQueue.fetch_next_request crashes with Pydantic ValidationError (#627).
- Proper fix - Refactor the SDK caching logic to use request_id instead of unique_key, eliminating the dependency on possibly truncated values.
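A minimal sketch of the hotfix direction, under stated assumptions: rq_client is an Apify request queue API client exposing a get_request(request_id) coroutine, item is one entry from the list_head response, and the uniqueKey/id field names plus the resolve_head_item helper are hypothetical illustrations, not existing SDK code:

async def resolve_head_item(rq_client, item: dict) -> dict:
    # If the platform truncated the unique_key, it cannot be used to recompute
    # the request ID locally, so re-fetch the full record by its server-side ID.
    unique_key = item.get('uniqueKey', '')
    if unique_key.endswith('[truncated]'):
        full_request = await rq_client.get_request(item['id'])
        if full_request is not None:
            return full_request
    # Otherwise (or if the fallback returns nothing), keep the head item as-is.
    return item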