Feat/pass countinference to serverless getweights #1373

Open · wants to merge 11 commits into main

Conversation

bigbitbus (Contributor)

Description

Pass through the `countinference` and `service_secret` parameters to the Roboflow API backend.
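A minimal sketch of the pass-through described above. The parameter names `countinference` and `service_secret` come from this PR; the function name, base URL, and serialization details below are illustrative assumptions, not the repo's actual API client code.

```python
# Hypothetical helper illustrating optional parameter pass-through.
# Only the parameter names are from the PR; everything else is assumed.
from typing import Optional

API_BASE_URL = "https://api.roboflow.com"  # assumed base URL

def build_getweights_params(
    api_key: str,
    countinference: Optional[bool] = None,
    service_secret: Optional[str] = None,
) -> dict:
    """Build query parameters, forwarding the optional values only when set."""
    params = {"api_key": api_key}
    if countinference is not None:
        # forwarded so the backend can decide whether to count this inference
        params["countinference"] = str(countinference).lower()
    if service_secret is not None:
        params["service_secret"] = service_secret
    return params
```

The point of the `is not None` checks is that callers who do not supply the new parameters see no change in the request, keeping the feature non-breaking.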

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or an example of how you tested the change.

Testing in staging (ongoing)

Any specific deployment considerations?

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • Docs updated? What were the changes:

@bigbitbus bigbitbus force-pushed the feat/pass-countinference-to-serverless-getweights branch from 51dcd37 to 9ab0396 Compare June 18, 2025 19:52
codeflash-ai bot added a commit that referenced this pull request Jun 24, 2025
…(`feat/pass-countinference-to-serverless-getweights`)

Here's an optimized rewrite of your program, addressing profiling hot spots and general efficiency improvements.

**Optimization Summary:**

1. **Avoid Redundant Method Calls:** 
   - Minimize repeated lookups and calculations.
   - Cache computations/results when possible within function scope.
2. **Lazy Imports:** 
   - Move GC and optional torch imports where needed (they are only used upon eviction).
3. **Deque Optimizations:** 
   - In `WithFixedSizeCache.add_model`, avoid repeated `self._key_queue.remove(queue_id)` by checking position or maintaining a set for fast checks (no need, since only called if known present, and block is rare). Still, code can be reduced for clarity.
4. **Reduce logging** in the hot add path (except in DEBUG mode); logging was a major time sink during profiling.
5. **Batch Removals:** 
   - Accumulate models to remove and do a single `gc.collect()` call after, instead of per-iteration. 
6. **Data structure** choices are left unchanged (deque is still best for explicit ordering here).
7. **General Logic**: Use local variables for lookups on attributes used multiple times (minor, but helps).
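The eviction-related points above (lazy `gc` import, batched removals, a single `gc.collect()` after the batch) can be sketched as follows. The class mirrors the PR's `WithFixedSizeCache` in spirit, but the bodies are illustrative, not the repo's actual code:

```python
# Hedged sketch of a fixed-size model cache with batched eviction.
# Names mirror the PR's WithFixedSizeCache; implementation is assumed.
from collections import deque

class FixedSizeCacheSketch:
    def __init__(self, max_size: int = 8):
        self._models = {}
        self._key_queue = deque()  # explicit LRU ordering, oldest on the left
        self._max_size = max_size

    def add_model(self, queue_id: str, model) -> None:
        if queue_id in self._models:
            # refresh recency instead of re-adding the model
            self._key_queue.remove(queue_id)
            self._key_queue.append(queue_id)
            return
        evicted = []
        # accumulate evictions first instead of collecting per iteration
        while len(self._key_queue) >= self._max_size:
            oldest = self._key_queue.popleft()
            evicted.append(self._models.pop(oldest))
        self._key_queue.append(queue_id)
        self._models[queue_id] = model
        if evicted:
            import gc  # lazy import: only paid for when something was evicted
            del evicted
            gc.collect()  # single collection for the whole batch
```

The design choice worth noting is that `gc.collect()` runs once per `add_model` call at most, regardless of how many models were evicted, which is the batching optimization described in point 5.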

---




**Key Runtime Optimizations:**
- Only call `gc.collect()` after all removals in a batch, not after every single model eviction.
- Reduced logging in hot code paths (this was responsible for noticeable time in profiling).
- Use local variables when repeatedly accessing class attributes.
- Use direct inlining for `_resolve_queue_id` for this use case.
- Defensive handling if queue/model state falls out of sync—never throws unnecessarily.

**Performance Note:**
If you profile again after these changes, most of the time will now be in actual model loading and removal. That is, this code will not be a noticeable bottleneck anymore in the workflow. If LRU cache size is much larger, consider further data structure optimizations such as a dict for constant-time eviction and presence checking, but for N ~ 8 this is not needed.
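For the larger-cache case mentioned above, a dict-based structure gives constant-time eviction and presence checks. A hedged sketch using `OrderedDict` (illustrative only; not the PR's code, and as the note says, unnecessary at N ~ 8):

```python
# Assumed alternative for large caches: OrderedDict preserves insertion
# order and supports O(1) recency refresh and O(1) oldest-entry eviction.
from collections import OrderedDict

class DictLRUCacheSketch:
    def __init__(self, max_size: int):
        self._models = OrderedDict()
        self._max_size = max_size

    def add_model(self, queue_id: str, model) -> None:
        if queue_id in self._models:
            self._models.move_to_end(queue_id)  # O(1) recency refresh
            return
        while len(self._models) >= self._max_size:
            self._models.popitem(last=False)  # O(1) eviction of the oldest
        self._models[queue_id] = model
```

Compared with the deque version, this trades the explicit queue for a single ordered mapping, removing the O(n) `deque.remove` on the refresh path.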

codeflash-ai bot commented Jun 24, 2025

⚡️ Codeflash found optimizations for this PR

📄 50% (0.50x) speedup for WithFixedSizeCache.add_model in inference/core/managers/decorators/fixed_size_cache.py

⏱️ Runtime: 1.08 seconds → 722 milliseconds (best of 12 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch feat/pass-countinference-to-serverless-getweights).
