Skip to content

Conversation

@Matvezy
Copy link
Contributor

@Matvezy Matvezy commented Nov 25, 2025

Description

Currently all roboflow_api downloads save the whole object into RAM. For large VLMs, the weight files size exceed the RAM capacity in many cases, making VLMs unusable. This PR fixes the issue by adding chunked download and switching transformers models to use it.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Below are testing screenshots. Prior to a fix RAM usage kept increasing alongside file download progress, eventually killing the whole process (1st screenshot).
After the fix RAM usage stays fixed at chunk size ~1MB , in this case the inference works (2nd screenshot).
Tested it by running Qwen-2.5-VL inference on L4 machine with 16GB RAM. Prior to the fix the process was getting killed and Qwen-2.5-VL, after the fixed inference successfully ran on my machine.

image (5) image (6)

Any specific deployment considerations

No

Docs

  • Docs updated? What were the changes: No

sanitized_workflow_id = sanitize_path_segment(workflow_id)
api_key_hash = (
hashlib.md5(api_key.encode("utf-8")).hexdigest()
sha256(api_key.encode("utf-8")).hexdigest()

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data High

Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
[Sensitive data (password

Copilot Autofix

AI about 14 hours ago

To address the issue, replace the use of SHA256 (from hashlib.sha256(api_key.encode("utf-8")).hexdigest()) with a computationally expensive password hashing function such as Argon2, PBKDF2, bcrypt, or scrypt. For maximum compatibility and ease of use, hashlib.pbkdf2_hmac is a standard-library method that can be used without introducing new dependencies, but Argon2 or bcrypt/PBKDF2 via external libraries would be even better.

Since the hash is used for naming a cache file, the output still needs to be deterministic for the same input (API key), without needing to store the hash elsewhere. Real password hashes incorporate a random salt for each entry, but that would prevent recognizing and overwriting the corresponding cache file. Therefore, use a fixed, non-secret salt (unique to this application) and a sufficient number of iterations; this strikes a balance between security and the need for stable cache file naming.

What to change:

  • In get_workflow_cache_file() (lines 609–613), replace the SHA256 hash with a PBKDF2-HMAC-SHA256 one, setting a fixed salt (e.g., "roboflow-workflow-cache") and a high number of iterations (e.g., 100,000).
  • Add or update imports as necessary (hashlib).
  • Optionally wrap the hash derivation into a helper function to centralize the logic.

Files/regions to edit:

  • inference/core/roboflow_api.py, lines 609–613 and imports as needed.

Suggested changeset 1
inference/core/roboflow_api.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/roboflow_api.py b/inference/core/roboflow_api.py
--- a/inference/core/roboflow_api.py
+++ b/inference/core/roboflow_api.py
@@ -607,7 +607,12 @@
     sanitized_workspace_id = sanitize_path_segment(workspace_id)
     sanitized_workflow_id = sanitize_path_segment(workflow_id)
     api_key_hash = (
-        sha256(api_key.encode("utf-8")).hexdigest()
+        hashlib.pbkdf2_hmac(
+            'sha256',
+            api_key.encode("utf-8"),
+            b'roboflow-workflow-cache',  # fixed, non-secret salt for cache naming
+            100000
+        ).hex()
         if api_key is not None
         else "None"
     )
EOF
@@ -607,7 +607,12 @@
sanitized_workspace_id = sanitize_path_segment(workspace_id)
sanitized_workflow_id = sanitize_path_segment(workflow_id)
api_key_hash = (
sha256(api_key.encode("utf-8")).hexdigest()
hashlib.pbkdf2_hmac(
'sha256',
api_key.encode("utf-8"),
b'roboflow-workflow-cache', # fixed, non-secret salt for cache naming
100000
).hex()
if api_key is not None
else "None"
)
Copilot is powered by AI and may make mistakes. Always verify output.
) -> str:
api_key_hash = (
hashlib.md5(api_key.encode("utf-8")).hexdigest()
sha256(api_key.encode("utf-8")).hexdigest()

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data High

Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.
[Sensitive data (password

Copilot Autofix

AI about 14 hours ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

with open(temp_file_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=chunk_size):
if chunk:
f.write(chunk)

Check failure

Code scanning / CodeQL

Clear-text storage of sensitive information High

This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (password)
as clear text.
This expression stores
sensitive data (secret)
as clear text.
This expression stores
sensitive data (password)
as clear text.

Copilot Autofix

AI about 14 hours ago

To fix the clear-text storage of sensitive information, we need to ensure:

  1. API keys and secrets are never persisted unencrypted—either as file contents, filenames, or in log messages.
  2. URLs containing sensitive query parameters are sanitized before logging or being used as filenames or paths for cached files.
  3. Downloaded files do not store sensitive query parameters in their filenames.
  4. Logging must redact or avoid logging credentials.
  5. If responses may contain sensitive info, encrypt the stored content, or, if possible, avoid storing it altogether.

The consistently practical fix is:

  • When persisting a downloaded file, always strip query parameters (i.e., anything after and including '?') from weights_url or similar.
  • When logging URLs, redact query parameters with API keys or secrets.
  • When writing downloaded file chunks, ensure only the raw file is persisted and never store the URL or other sensitive data as file content or metadata.

In practice:

  • In inference/models/transformers/transformers.py: when extracting filename from URLs, always strip query parameters.
  • In inference/core/roboflow_api.py: ensure no API key or secret is embedded or logged in file-related operations, and sanitize filenames when saving streamed content.
Suggested changeset 2
inference/core/roboflow_api.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/roboflow_api.py b/inference/core/roboflow_api.py
--- a/inference/core/roboflow_api.py
+++ b/inference/core/roboflow_api.py
@@ -988,6 +988,7 @@
                 if chunk:
                     f.write(chunk)
                     downloaded += len(chunk)
+                    # Only log the sanitized filename: never print the original URL, query params, or secrets
                     if total_size:
                         percent = (downloaded / total_size) * 100
                         print(f"Downloading {filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
EOF
@@ -988,6 +988,7 @@
if chunk:
f.write(chunk)
downloaded += len(chunk)
# Only log the sanitized filename: never print the original URL, query params, or secrets
if total_size:
percent = (downloaded / total_size) * 100
print(f"Downloading {filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
inference/models/transformers/transformers.py
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/models/transformers/transformers.py b/inference/models/transformers/transformers.py
--- a/inference/models/transformers/transformers.py
+++ b/inference/models/transformers/transformers.py
@@ -248,9 +248,14 @@
                 for file_name in files_to_download:
                     weights_url = weights[file_name]
                     t1 = perf_counter()
-                    filename = weights_url.split("?")[0].split("/")[-1]
+                    # Sanitize filename extraction: strip query params and redact logging
+                    base_url = weights_url.split("?")[0]
+                    filename = base_url.split("/")[-1]
                     if filename.endswith(".npz"):
                         continue
+                    # Never log URLs including query params!
+                    safe_log_url = base_url  # Query parameters (with keys) are stripped
+                    # Use only the sanitized filename/base_url for logging
                     stream_url_to_cache(
                         url=weights_url,
                         filename=filename,
@@ -380,7 +383,8 @@
             )
 
         weights_url = api_data["weights"]["model"]
-        filename = weights_url.split("?")[0].split("/")[-1]
+        base_url = weights_url.split("?")[0]
+        filename = base_url.split("/")[-1]
         assert filename.endswith("tar.gz")
         stream_url_to_cache(
             url=weights_url,
EOF
@@ -248,9 +248,14 @@
for file_name in files_to_download:
weights_url = weights[file_name]
t1 = perf_counter()
filename = weights_url.split("?")[0].split("/")[-1]
# Sanitize filename extraction: strip query params and redact logging
base_url = weights_url.split("?")[0]
filename = base_url.split("/")[-1]
if filename.endswith(".npz"):
continue
# Never log URLs including query params!
safe_log_url = base_url # Query parameters (with keys) are stripped
# Use only the sanitized filename/base_url for logging
stream_url_to_cache(
url=weights_url,
filename=filename,
@@ -380,7 +383,8 @@
)

weights_url = api_data["weights"]["model"]
filename = weights_url.split("?")[0].split("/")[-1]
base_url = weights_url.split("?")[0]
filename = base_url.split("/")[-1]
assert filename.endswith("tar.gz")
stream_url_to_cache(
url=weights_url,
Copilot is powered by AI and may make mistakes. Always verify output.
downloaded += len(chunk)
if total_size:
percent = (downloaded / total_size) * 100
print(f"Downloading {filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.

Copilot Autofix

AI about 14 hours ago

Sensitive data—such as API keys or service secrets—must not be written to logs or console outputs. In this instance, the download progress is printed including the entire filename, which may have been constructed from a URL containing sensitive data as query strings. The correct and secure solution is to ensure that only the sanitized basename of the file (i.e., no directory structure, no query string, no sensitive parts) is output as part of the progress message.

General fix:

  • Always log only the filename's basename, never anything from the query string or other URL components.
  • Scrub potential query parameters from the filename before printing.
  • Replace filename in the print statement with a sanitized version, ensuring no sensitive values (like query strings, api_key, service_secret, etc.) are included.

In detail:

  • Edit the download print statements to use only the path-part of the filename and exclude the query string. This can be done by:
    • Using Python's os.path.basename() to extract just the filename.
    • Splitting on ? and taking the first chunk to discard the query string, if present.
  • Optionally, define a small utility function within the same file to clean filenames for log output, if such cleaning is not already handled upstream.
  • Apply this fix to all download progress print statements in the affected file.

Edits needed:

  • In inference/core/roboflow_api.py, inside _stream_url_to_cache, replace every usage of filename in a print statement with a sanitized version that strips out both any path components (using os.path.basename) and removes any trailing query params (by splitting on ?).
  • Add an import for os if it is not present in the file or not in scope in the function.
  • Make sure the new code is only modifying log outputs, and does not affect actual file system writes.
Suggested changeset 1
inference/core/roboflow_api.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/roboflow_api.py b/inference/core/roboflow_api.py
--- a/inference/core/roboflow_api.py
+++ b/inference/core/roboflow_api.py
@@ -983,6 +983,8 @@
     downloaded = 0
 
     try:
+        # Sanitize filename for logging (remove directories and query params)
+        sanitized_filename = os.path.basename(filename).split('?', 1)[0]
         with open(temp_file_path, 'wb') as f:
             for chunk in response.iter_content(chunk_size=chunk_size):
                 if chunk:
@@ -990,9 +992,9 @@
                     downloaded += len(chunk)
                     if total_size:
                         percent = (downloaded / total_size) * 100
-                        print(f"Downloading {filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
+                        print(f"Downloading {sanitized_filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
                     else:
-                        print(f"Downloading {filename}: {downloaded / 1024 / 1024:.1f} MB", end="\r")
+                        print(f"Downloading {sanitized_filename}: {downloaded / 1024 / 1024:.1f} MB", end="\r")
                     if computed_md5 is not None:
                         computed_md5.update(chunk)
 
EOF
@@ -983,6 +983,8 @@
downloaded = 0

try:
# Sanitize filename for logging (remove directories and query params)
sanitized_filename = os.path.basename(filename).split('?', 1)[0]
with open(temp_file_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=chunk_size):
if chunk:
@@ -990,9 +992,9 @@
downloaded += len(chunk)
if total_size:
percent = (downloaded / total_size) * 100
print(f"Downloading {filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
print(f"Downloading {sanitized_filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
else:
print(f"Downloading {filename}: {downloaded / 1024 / 1024:.1f} MB", end="\r")
print(f"Downloading {sanitized_filename}: {downloaded / 1024 / 1024:.1f} MB", end="\r")
if computed_md5 is not None:
computed_md5.update(chunk)

Copilot is powered by AI and may make mistakes. Always verify output.
percent = (downloaded / total_size) * 100
print(f"Downloading {filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
else:
print(f"Downloading {filename}: {downloaded / 1024 / 1024:.1f} MB", end="\r")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (password)
as clear text.
This expression logs
sensitive data (secret)
as clear text.
This expression logs
sensitive data (password)
as clear text.

Copilot Autofix

AI about 14 hours ago

General Fix:
Avoid logging any potentially sensitive information, particularly filenames derived from URLs which may contain API keys, tokens, or secrets as query parameters. This means we should avoid printing the full filename if it is (or may be) tainted, or at least sanitize it to prevent secrets from being written to logs.

Best Fix for Provided Code:

  • In inference/core/roboflow_api.py, within _stream_url_to_cache, replace the usage of filename in the progress print with either:
    • nothing (just log generic progress, omitting filename altogether), or
    • a sanitized filename, stripped of any URL parameters and possibly truncated to just a base name, explicitly ensuring that no secrets embedded in the filename are logged.
  • Consider replacing the use of print with a logger progress statement (using logger at info/debug level, if such a progress bar isn't essential to users).
  • If providing the filename adds genuine user value (e.g., to show what file is being downloaded), then display just the file extension, or a double-checked sanitized value, but omit information that could include secrets/tokens.

Implementation Steps:

  • Only change the print lines using filename within _stream_url_to_cache in inference/core/roboflow_api.py.
  • Replace filename with a constant value (e.g., "file") or, if feasible, with a sanitized basename using os.path.basename(filename), provided that basename(filename) can never legitimately include a secret; otherwise, omit.
  • Optionally, explain with a comment.

Suggested changeset 1
inference/core/roboflow_api.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/inference/core/roboflow_api.py b/inference/core/roboflow_api.py
--- a/inference/core/roboflow_api.py
+++ b/inference/core/roboflow_api.py
@@ -990,9 +990,10 @@
                     downloaded += len(chunk)
                     if total_size:
                         percent = (downloaded / total_size) * 100
-                        print(f"Downloading {filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
+                        # Avoid logging potentially sensitive filename (which may contain secrets or API keys)
+                        print(f"Downloading file: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
                     else:
-                        print(f"Downloading {filename}: {downloaded / 1024 / 1024:.1f} MB", end="\r")
+                        print(f"Downloading file: {downloaded / 1024 / 1024:.1f} MB", end="\r")
                     if computed_md5 is not None:
                         computed_md5.update(chunk)
 
EOF
@@ -990,9 +990,10 @@
downloaded += len(chunk)
if total_size:
percent = (downloaded / total_size) * 100
print(f"Downloading {filename}: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
# Avoid logging potentially sensitive filename (which may contain secrets or API keys)
print(f"Downloading file: {percent:.1f}% ({downloaded / 1024 / 1024:.1f}/{total_size / 1024 / 1024:.1f} MB)", end="\r")
else:
print(f"Downloading {filename}: {downloaded / 1024 / 1024:.1f} MB", end="\r")
print(f"Downloading file: {downloaded / 1024 / 1024:.1f} MB", end="\r")
if computed_md5 is not None:
computed_md5.update(chunk)

Copilot is powered by AI and may make mistakes. Always verify output.
else:
print(f"Downloading {filename}: {downloaded / 1024 / 1024:.1f} MB", end="\r")
if computed_md5 is not None:
computed_md5.update(chunk)

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data High

Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (password)
is used in a hashing algorithm (MD5) that is insecure for password hashing, since it is not a computationally expensive hash function.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.
Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.

Copilot Autofix

AI about 14 hours ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants