Feat/workers#1

Merged
gabrielgz0 merged 4 commits into main from feat/workers on May 13, 2026

@gabrielgz0
Owner

This pull request introduces a new, configurable prefetch and concurrency mechanism for paginated resource iteration, enabling significant performance improvements when retrieving large datasets. The API is extended to support a prefetch parameter, allowing users to select between sequential, simple prefetch, or multi-worker concurrent fetching. Documentation and docstrings are updated accordingly, and resource classes are streamlined for clarity.

Prefetch and Concurrency for Pagination

  • Added a prefetch parameter to all list_all* methods in AtasResource, ContratosResource, and ContratacoesResource, allowing users to control the level of concurrency: sequential (0), simple prefetch (1, default), or N workers (N≥2).
  • Implemented three pagination strategies in BaseResource._list_all: sequential, prefetch (background fetch of the next page), and concurrent workers (multiple pages fetched in parallel with ordered delivery).
  • Updated the README with detailed explanations and usage examples for the new prefetch and concurrency options, including diagrams and recommendations.

Documentation and Code Cleanup

  • Simplified and clarified resource class docstrings, removing redundant endpoint lists and harmonizing argument descriptions for consistency.
  • Improved docstrings for public methods to reflect the new prefetch parameter and its behavior.

Internal Refactoring

  • Added an internal _STOP sentinel for worker coordination and refactored the code to use asyncio more robustly for background and concurrent fetching.

These changes speed up bulk data collection and scraping workloads by overlapping page requests, while keeping a simple, backward-compatible API for end users.

BaseResource._list_all accepts prefetch=N (default 1).
Each newly received page immediately triggers the download of the
next one in the background via asyncio.ensure_future, so the API
latency overlaps with the consumer's processing.
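
The background prefetch described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `fetch_page`, `fake_fetch`, `iter_with_prefetch`, and `collect_prefetch` are hypothetical names, and an empty page is assumed to mark the end of the data.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

# Hypothetical fetcher type: returns the items of a page, or [] past the last page.
FetchPage = Callable[[int], Awaitable[List[int]]]


async def iter_with_prefetch(fetch_page: FetchPage) -> AsyncIterator[int]:
    """Yield items page by page while the next page downloads in the background."""
    page = 1
    task = asyncio.ensure_future(fetch_page(page))
    try:
        while True:
            items = await task
            if not items:  # assumption: an empty page means "no more data"
                break
            page += 1
            # Start the next download BEFORE handing items to the consumer,
            # so network latency overlaps with the consumer's processing.
            task = asyncio.ensure_future(fetch_page(page))
            for item in items:
                yield item
    finally:
        # If the consumer stops early, do not leave a pending download behind.
        if not task.done():
            task.cancel()


async def fake_fetch(page: int) -> List[int]:
    """Stand-in for an HTTP call (hypothetical data, three pages)."""
    await asyncio.sleep(0.01)
    return {1: [1, 2], 2: [3, 4], 3: [5]}.get(page, [])


async def collect_prefetch() -> List[int]:
    return [item async for item in iter_with_prefetch(fake_fetch)]
```

Calling `asyncio.run(collect_prefetch())` drains all three fake pages in order. At most one request is in flight ahead of the consumer, which is what keeps this strategy cheap compared to the multi-worker one.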

- _list_all with prefetch in BaseResource
- Parameter exposed on all list_all* methods (contratos, contratacoes, atas)
- README updated in the Paginacao (pagination) section with a diagram and a prefetch=0 example

The bug: remaining was decremented in the worker's finally block BEFORE
_STOP was put on the queue. The consumer saw remaining=0 and
left the while loop, ignoring items still in the queue.

Replaces remaining with a count of received _STOP sentinels: each worker
puts _STOP on the queue when it finishes, and the consumer counts up to num_workers.
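
A minimal sketch of the sentinel-counting pattern, assuming an unbounded asyncio.Queue. The `worker` and `consume` functions and their payloads are hypothetical and only illustrate why counting _STOP is safe: each sentinel is enqueued after all of that worker's items, so by the time the consumer has seen num_workers sentinels it has already dequeued everything.

```python
import asyncio
from typing import List

_STOP = object()  # unique sentinel; compared by identity, never equal to real data


async def worker(queue: asyncio.Queue, items: List[int]) -> None:
    try:
        for item in items:
            await asyncio.sleep(0)  # yield control so workers interleave
            queue.put_nowait(item)
    finally:
        # The sentinel is the last thing this worker enqueues, so it always
        # sits behind every item the worker produced.
        queue.put_nowait(_STOP)


async def consume(num_workers: int = 3) -> List[int]:
    queue: asyncio.Queue = asyncio.Queue()
    tasks = [
        asyncio.ensure_future(worker(queue, [i * 10 + k for k in range(3)]))
        for i in range(num_workers)
    ]
    received: List[int] = []
    stops = 0
    # Count sentinels taken from the queue instead of reading a shared counter:
    # the loop cannot exit while another worker's items are still queued.
    while stops < num_workers:
        msg = await queue.get()
        if msg is _STOP:
            stops += 1
        else:
            received.append(msg)
    await asyncio.gather(*tasks)
    return received
```

Items from different workers may arrive interleaved here; this sketch deliberately leaves ordering out to isolate the termination logic.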

prefetch now:
- 0: sequential
- 1: simple preload (1 page of lookahead)
- >=2: N concurrent workers with stride and an ordered buffer
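
One way to picture "N workers with stride and an ordered buffer" is the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `fake_fetch`, `stride_worker`, `list_all_workers`, and `TOTAL_PAGES` are hypothetical, pages are assumed to be contiguous, and an empty page is assumed to signal the end. Worker i fetches pages i+1, i+1+N, i+1+2N, and so on; the consumer parks out-of-order pages in a dict and releases them strictly in page order.

```python
import asyncio
from typing import Dict, List

_STOP = object()
TOTAL_PAGES = 5  # hypothetical dataset size


async def fake_fetch(page: int) -> List[int]:
    """Stand-in for an HTTP call; later pages respond faster to force reordering."""
    if page > TOTAL_PAGES:
        return []
    await asyncio.sleep(0.001 * (TOTAL_PAGES - page))
    return [page * 10, page * 10 + 1]


async def stride_worker(start: int, stride: int, queue: asyncio.Queue) -> None:
    page = start
    try:
        while True:
            items = await fake_fetch(page)
            if not items:  # assumption: an empty page means this worker is done
                break
            queue.put_nowait((page, items))
            page += stride  # jump over the pages owned by the other workers
    finally:
        queue.put_nowait(_STOP)


async def list_all_workers(num_workers: int) -> List[int]:
    queue: asyncio.Queue = asyncio.Queue()
    tasks = [
        asyncio.ensure_future(stride_worker(i + 1, num_workers, queue))
        for i in range(num_workers)
    ]
    buffer: Dict[int, List[int]] = {}  # pages that arrived ahead of their turn
    next_page = 1
    stops = 0
    out: List[int] = []
    while stops < num_workers:
        msg = await queue.get()
        if msg is _STOP:
            stops += 1
            continue
        page, items = msg
        buffer[page] = items
        # Release every page that is now contiguous with what was delivered.
        while next_page in buffer:
            out.extend(buffer.pop(next_page))
            next_page += 1
    await asyncio.gather(*tasks)
    return out
```

With this layout no two workers ever request the same page, and the consumer sees the same order as the sequential strategy. The queue in this sketch is unbounded, so it applies no backpressure; a real implementation might cap how far workers can run ahead.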

Adds test_list_all_with_workers (prefetch=2) and
test_list_all_sequential (prefetch=0) to cover the
3 strategies of the _list_all router.

Coverage of base.py: 46% → 90%
Total coverage: 90.49%

README now explains the 3 strategies:
- prefetch=0: sequential
- prefetch=1: background preload (default)
- prefetch=N: N concurrent workers with stride, plus a diagram

Resource docstrings updated to:
Concurrency level: 0=seq, 1=prefetch, N=N workers

gabrielgz0 merged commit e5d706d into main on May 13, 2026
3 checks passed