⚡ Optimize GitHub deployment via concurrent async HTTP requests #59
Changes from all commits
16c9224
0454411
52e5a26
5b2da4d
@@ -5,6 +5,9 @@ on:
     types:
       - opened
 
+permissions:
+  issues: write
+
 jobs:
   auto-assign:
     runs-on: ubuntu-latest
--- a/src/youtube_extension/backend/deployment_manager.py
+++ b/src/youtube_extension/backend/deployment_manager.py
@@ -19,7 +19,7 @@
 from pathlib import Path
 from typing import Any, Optional
 
-import requests
+import aiohttp
 
 from youtube_extension.backend.deploy import deploy_project as _adapter_deploy
 
@@ -493,43 +493,45 @@ async def _create_github_repository(self, repo_name: str, project_config: dict[s
             "Accept": "application/vnd.github.v3+json"
         }
 
-        # Get user info
-        user_response = requests.get("https://api.github.com/user", headers=headers)
-        if user_response.status_code != 200:
-            raise Exception(f"Failed to get GitHub user info: {user_response.text}")
-
-        user_data = user_response.json()
-        username = user_data["login"]
-
-        # Create repository
-        repo_data = {
-            "name": repo_name,
-            "description": f"Generated by UVAI from YouTube tutorial - {project_config.get('title', 'Unknown')}",
-            "private": False,
-            "auto_init": True,
-            "has_issues": True,
-            "has_projects": True,
-            "has_wiki": False
-        }
-
-        response = requests.post(
-            "https://api.github.com/user/repos",
-            headers=headers,
-            json=repo_data
-        )
+        async with aiohttp.ClientSession() as session:
+            # Get user info
+            async with session.get("https://api.github.com/user", headers=headers) as user_response:
+                if user_response.status != 200:
+                    error_text = await user_response.text()
+                    raise Exception(f"Failed to get GitHub user info: {error_text}")
+                user_data = await user_response.json()
+                username = user_data["login"]
 
-        if response.status_code not in [201, 422]: # 422 if repo already exists
-            raise Exception(f"Failed to create GitHub repository: {response.text}")
+            # Create repository
+            repo_data = {
+                "name": repo_name,
+                "description": f"Generated by UVAI from YouTube tutorial - {project_config.get('title', 'Unknown')}",
+                "private": False,
+                "auto_init": True,
+                "has_issues": True,
+                "has_projects": True,
+                "has_wiki": False
+            }
 
-        if response.status_code == 422:
-            # Repository already exists, get its info
-            repo_response = requests.get(f"https://api.github.com/repos/{username}/{repo_name}", headers=headers)
-            if repo_response.status_code == 200:
-                repo_info = repo_response.json()
-            else:
-                raise Exception(f"Repository exists but can't access it: {repo_response.text}")
-        else:
-            repo_info = response.json()
+            async with session.post(
+                "https://api.github.com/user/repos",
+                headers=headers,
+                json=repo_data
+            ) as response:
+                if response.status not in [201, 422]: # 422 if repo already exists
+                    error_text = await response.text()
+                    raise Exception(f"Failed to create GitHub repository: {error_text}")
+
+                if response.status == 422:
+                    # Repository already exists, get its info
+                    async with session.get(f"https://api.github.com/repos/{username}/{repo_name}", headers=headers) as repo_response:
+                        if repo_response.status == 200:
+                            repo_info = await repo_response.json()
+                        else:
+                            error_text = await repo_response.text()
+                            raise Exception(f"Repository exists but can't access it: {error_text}")
+                else:
+                    repo_info = await response.json()
 
         return {
             "repo_name": repo_name,
|
|
@@ -549,53 +551,65 @@ async def _upload_to_github(self, project_path: str, repo_name: str) -> dict[str | |||||||||||||||||||||||||
| "Accept": "application/vnd.github.v3+json" | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| # Get user info | ||||||||||||||||||||||||||
| user_response = requests.get("https://api.github.com/user", headers=headers) | ||||||||||||||||||||||||||
| user_data = user_response.json() | ||||||||||||||||||||||||||
| username = user_data["login"] | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| uploaded_files = [] | ||||||||||||||||||||||||||
| project_path_obj = Path(project_path) | ||||||||||||||||||||||||||
| async with aiohttp.ClientSession() as session: | ||||||||||||||||||||||||||
| # Get user info | ||||||||||||||||||||||||||
| async with session.get("https://api.github.com/user", headers=headers) as user_response: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
| async with session.get("https://api.github.com/user", headers=headers) as user_response: | |
| async with session.get("https://api.github.com/user", headers=headers) as user_response: | |
| if user_response.status != 200: | |
| error_text = await user_response.text() | |
| raise Exception(f"GitHub API error fetching user info: {user_response.status} {error_text}") |
The request to get user info is missing a status check. If the request to https://api.github.com/user fails, await user_response.json() might raise an exception, or accessing user_data["login"] will fail with a KeyError. This could lead to an unhandled exception. You should check user_response.status before attempting to parse the JSON, similar to how it's done in _create_github_repository.
async with session.get("https://api.github.com/user", headers=headers) as user_response:
    if user_response.status != 200:
        error_text = await user_response.text()
        raise Exception(f"Failed to get GitHub user info: {error_text}")
    user_data = await user_response.json()
    username = user_data["login"]
Bug: The _upload_to_github function is missing an HTTP status check for the GitHub API response, which will cause a KeyError when the API returns an error (e.g., 401).
Severity: MEDIUM
Suggested Fix
Before calling await user_response.json(), check if user_response.status is 200. If it is not, read the response text and raise an exception with a descriptive error message, similar to the error handling implemented in the _create_github_repository function.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/youtube_extension/backend/deployment_manager.py#L556-L558
Potential issue: In the `_upload_to_github` function, the code makes a request to the
GitHub `/user` endpoint but does not check the HTTP status of the response before
attempting to parse it as JSON and access its keys. If the request fails due to an
invalid token, the GitHub API returns a non-200 status with a JSON body containing an
error message. The code will then attempt to access the `"login"` key on this error
object, which will raise a `KeyError` and crash the process with an unclear error
message instead of failing gracefully.
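A minimal sketch of the check both comments ask for, pulled into a helper so it can be exercised without network access. `fetch_login`, `GitHubAPIError`, `FakeResponse` and `FakeSession` are hypothetical names that do not appear in the PR; `session` is only assumed to behave like `aiohttp.ClientSession`, that is, `session.get(...)` returns an async context manager whose value exposes `.status`, `.text()` and `.json()`.

```python
import asyncio
import json
from typing import Any


class GitHubAPIError(Exception):
    """Hypothetical error type; the PR itself raises a bare Exception."""


async def fetch_login(session: Any, headers: dict[str, str]) -> str:
    """Return the authenticated user's login, failing loudly on a non-200 reply."""
    async with session.get("https://api.github.com/user", headers=headers) as resp:
        if resp.status != 200:
            # Read the body first so the error says what GitHub actually returned.
            error_text = await resp.text()
            raise GitHubAPIError(f"GitHub /user returned {resp.status}: {error_text}")
        user_data = await resp.json()
    return user_data["login"]


class FakeResponse:
    """Stand-in for an aiohttp response, for demonstration only."""

    def __init__(self, status: int, payload: dict[str, Any]) -> None:
        self.status = status
        self._payload = payload

    async def text(self) -> str:
        return json.dumps(self._payload)

    async def json(self) -> dict[str, Any]:
        return self._payload

    async def __aenter__(self) -> "FakeResponse":
        return self

    async def __aexit__(self, *exc_info: Any) -> bool:
        return False


class FakeSession:
    """Stand-in for aiohttp.ClientSession that always answers with one canned reply."""

    def __init__(self, status: int, payload: dict[str, Any]) -> None:
        self._status = status
        self._payload = payload

    def get(self, url: str, headers: Any = None) -> FakeResponse:
        return FakeResponse(self._status, self._payload)
```

With a 401 reply the helper raises `GitHubAPIError` carrying the status and body, instead of a `KeyError` on `"login"`.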
Copilot
AI
Mar 8, 2026
upload_tasks accumulates one coroutine per file before any uploads start. For very large generated projects this can create a large in-memory list and delay the start of uploads until the directory walk completes. Consider scheduling uploads incrementally (e.g., create tasks as you iterate and await in bounded batches / use asyncio.TaskGroup with a semaphore) so memory stays bounded and uploads can begin earlier.
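One way to read the "bounded batches" suggestion, as a sketch rather than the PR's code: walk the file iterator lazily and await one fixed-size batch at a time, so at most `batch_size` coroutines exist at once and the first uploads start before the directory walk finishes. `batched`, `upload_in_batches` and the `upload_one` callback are hypothetical names.

```python
import asyncio
from itertools import islice
from typing import Awaitable, Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, consuming `items` lazily."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


async def upload_in_batches(
    paths: Iterable[T],
    upload_one: Callable[[T], Awaitable[R]],
    batch_size: int = 8,
) -> List[R]:
    """Run upload_one(path) for every path, at most `batch_size` at a time.

    Results are returned in input order. An exception from any upload
    propagates out of the gather() call for its batch.
    """
    results: List[R] = []
    for chunk in batched(paths, batch_size):
        results.extend(await asyncio.gather(*(upload_one(p) for p in chunk)))
    return results
```

Batching is the simplest bound, but a slow file holds up the rest of its batch. A semaphore or worker queue keeps the pipeline fuller at the cost of more moving parts.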
Copilot
AI
Mar 8, 2026
upload_single_file performs synchronous disk I/O (open(...).read()) inside an async coroutine. With concurrent uploads this can still block the event loop and reduce the benefit of switching to aiohttp (especially on slow disks / large files). Consider using aiofiles or asyncio.to_thread() for the file read (and base64 encoding if needed) to keep the upload path non-blocking.
| # Read file content | |
| with open(file_path, 'rb') as f: | |
| content = f.read() | |
| # Encode content | |
| encoded_content = base64.b64encode(content).decode('utf-8') | |
| # Read and encode file content in a background thread to avoid blocking the event loop | |
| def _read_and_encode(p: Path) -> str: | |
| with open(p, "rb") as f: | |
| return base64.b64encode(f.read()).decode("utf-8") | |
| encoded_content = await asyncio.to_thread(_read_and_encode, file_path) |
The upload_single_file function is vulnerable to Denial of Service via Memory Exhaustion. It reads entire files into memory using f.read(), which can lead to crashes with large or concurrent uploads. Additionally, the use of synchronous open() and f.read() blocks the asyncio event loop, negating aiohttp benefits. Implement file size limits, use streaming for content, and switch to aiofiles for asynchronous I/O to mitigate these issues.
async with aiofiles.open(file_path, 'rb') as f:
    content = await f.read()
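A sketch that combines the size limit with an off-loop read. To stay stdlib-only it swaps `aiofiles` for `asyncio.to_thread()`, the alternative named in the earlier Copilot comment. `MAX_UPLOAD_BYTES`, `load_encoded` and the 1 MB figure are hypothetical; the PR defines no limit.

```python
import asyncio
import base64
from pathlib import Path

# Hypothetical cap; pick a value that fits the deployment's memory budget.
MAX_UPLOAD_BYTES = 1_000_000


def _read_and_encode(path: Path, max_bytes: int) -> str:
    """Blocking helper: reject oversized files, then read and base64-encode."""
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, above the {max_bytes}-byte limit")
    return base64.b64encode(path.read_bytes()).decode("ascii")


async def load_encoded(path: Path, max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Run the blocking read in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(_read_and_encode, path, max_bytes)
```

The size check runs before the read, so an oversized file is rejected without being loaded into memory.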
The _upload_to_github function is vulnerable to Potential Path Traversal and File Exfiltration. The project_path parameter is used without validation, allowing potential access and upload of sensitive files from arbitrary locations. It is critical to validate project_path (e.g., using Path.resolve() and checking against an expected base path) to prevent this. Additionally, the nested upload_single_file function is missing a -> None return type hint, violating style guide requirements.
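A minimal sketch of the `Path.resolve()` check described above. `resolve_within` and its `base_dir` argument are hypothetical; the PR does not say which directory generated projects are expected to live under.

```python
from pathlib import Path


def resolve_within(base_dir: str, candidate: str) -> Path:
    """Return the resolved candidate path, or raise if it escapes base_dir.

    Both paths are resolved first, so ".." segments and symlinks are
    collapsed before the containment check.
    """
    base = Path(base_dir).resolve()
    target = Path(candidate).resolve()
    try:
        target.relative_to(base)
    except ValueError:
        raise ValueError(f"{candidate!r} is outside the allowed base directory") from None
    return target
```

Calling this on `project_path` at the top of `_upload_to_github` would reject a path such as `<base>/../other` before any file is read.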
Arbitrary File Exfiltration via Symlink Following
The _upload_to_github function iterates through all files in the project directory using rglob("*") and uploads them to GitHub. It uses file_path.is_file() to identify files to upload. However, is_file() returns True for symbolic links that point to files. Since the project directory contains code generated by an AI and is subjected to a build process (npm run build) before the upload, a malicious build script (potentially generated via prompt injection) could create a symlink to a sensitive system file (e.g., /etc/passwd, server configuration files, or .env files). The _upload_to_github function will then read the content of the linked file and upload it to the user's GitHub repository, leading to arbitrary file exfiltration.
Remediation: Ensure that symbolic links are not followed during the file collection process. Use file_path.is_file() and not file_path.is_symlink() to filter files.
if file_path.is_file() and not file_path.is_symlink() and not file_path.name.startswith('.'):
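The same filter as a standalone generator, so it can be checked against a directory that contains a symlink. This is a sketch under the remediation's assumptions; `iter_uploadable_files` is a hypothetical name, and the dotfile rule is copied from the condition above.

```python
from pathlib import Path
from typing import Iterator


def iter_uploadable_files(project_root: Path) -> Iterator[Path]:
    """Yield regular, non-hidden files under project_root, skipping symlinks."""
    for file_path in sorted(project_root.rglob("*")):
        if file_path.is_symlink():
            # is_file() follows links, so test for a link explicitly.
            continue
        if not file_path.is_file():
            continue
        if file_path.name.startswith("."):
            continue
        yield file_path
```

A link such as `link.txt -> /etc/passwd` inside the project is skipped, while ordinary files next to it are still yielded.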
The new aiohttp requests don’t set an explicit client or per-request timeout, so they fall back to aiohttp’s default total timeout (5 minutes in current releases). If GitHub stalls, these awaits can block the whole deployment workflow for that long. Consider configuring aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=...)) (or passing timeout= to individual calls) with a sensible total timeout for GitHub API operations.
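A stdlib-only sketch of the same idea, using `asyncio.wait_for` to put a deadline on any awaitable. It is a swapped-in technique, not the aiohttp-native configuration the comment recommends; that form is shown in a code comment. `with_deadline` and the 30-second budget are hypothetical.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")

# Hypothetical budget for a single GitHub API operation, in seconds.
GITHUB_CALL_DEADLINE_S = 30.0

# aiohttp-native equivalent (needs the aiohttp package):
#   timeout = aiohttp.ClientTimeout(total=GITHUB_CALL_DEADLINE_S)
#   async with aiohttp.ClientSession(timeout=timeout) as session: ...


async def with_deadline(awaitable: Awaitable[T], seconds: float = GITHUB_CALL_DEADLINE_S) -> T:
    """Await `awaitable`, raising asyncio.TimeoutError once `seconds` have elapsed."""
    return await asyncio.wait_for(awaitable, timeout=seconds)
```

The session-level `ClientTimeout` is usually the better fit, because it applies to every request made through the session. The wrapper is useful when a whole multi-request step needs one overall deadline.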