Merged
134 changes: 134 additions & 0 deletions .github/workflows/sync-docs-cn-to-en.yml
@@ -0,0 +1,134 @@
name: Sync Docs Changes from ZH PR to EN PR

on:
  workflow_dispatch:
    inputs:
      source_pr_url:
        description: 'Source PR URL (Chinese docs repository)'
        required: true
        type: string
        default: ''
      target_pr_url:
        description: 'Target PR URL (English docs repository)'
        required: true
        type: string
        default: ''
      ai_provider:
        description: 'AI Provider to use for translation'
        required: false
        type: choice
        options:
          - deepseek
          - gemini
        default: 'gemini'

Comment on lines +3 to +24

🛠️ Refactor suggestion | 🟠 Major

Declare least-privilege permissions for GITHUB_TOKEN.

Explicitly set permissions to enable contents and PR writes; current default may be read-only.

Apply:

 on:
   workflow_dispatch:
     inputs:
       source_pr_url:
         description: 'Source PR URL (Chinese docs repository)'
         required: true
         type: string
         default: ''
@@
         default: 'gemini'
 
+permissions:
+  contents: write
+  pull-requests: write
🤖 Prompt for AI Agents
.github/workflows/sync-docs-cn-to-en.yml around lines 3 to 24: the workflow
lacks an explicit permissions block for GITHUB_TOKEN, which can default to
read-only; add a top-level permissions section granting least-privilege write
access needed for the job (e.g., contents: write and pull-requests: write) so
the workflow can update repository contents and create/update PRs, ensuring no
broader permissions are granted.

jobs:
  sync-docs:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout current repository
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
Comment on lines +37 to +39

⚠️ Potential issue | 🟠 Major

Update setup-python to v5 (actionlint failure).

Use actions/setup-python@v5 to avoid runner errors.

Apply:

-      - name: Set up Python
-        uses: actions/setup-python@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
         with:
           python-version: '3.9'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        uses: actions/setup-python@v4
-        with:
-          python-version: '3.9'
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.9'
🧰 Tools
🪛 actionlint (1.7.7)

37-37: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
.github/workflows/sync-docs-cn-to-en.yml around lines 37 to 39: the workflow
pins actions/setup-python@v4 which causes actionlint/runner errors; update the
workflow to use actions/setup-python@v5 by changing the action reference to
actions/setup-python@v5 and ensure the existing with: python-version setting
remains unchanged (no other logic changes needed).


      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r scripts/translate_doc_pr/requirements.txt

      - name: Extract PR information
        id: extract_info
        run: |
          # Extract source repo info
          SOURCE_URL="${{ github.event.inputs.source_pr_url }}"
          SOURCE_OWNER=$(echo $SOURCE_URL | cut -d'/' -f4)
          SOURCE_REPO=$(echo $SOURCE_URL | cut -d'/' -f5)
          SOURCE_PR=$(echo $SOURCE_URL | cut -d'/' -f7)

          # Extract target repo info
          TARGET_URL="${{ github.event.inputs.target_pr_url }}"
          TARGET_OWNER=$(echo $TARGET_URL | cut -d'/' -f4)
          TARGET_REPO=$(echo $TARGET_URL | cut -d'/' -f5)
          TARGET_PR=$(echo $TARGET_URL | cut -d'/' -f7)

          echo "source_owner=${SOURCE_OWNER}" >> $GITHUB_OUTPUT
          echo "source_repo=${SOURCE_REPO}" >> $GITHUB_OUTPUT
          echo "source_pr=${SOURCE_PR}" >> $GITHUB_OUTPUT
          echo "target_owner=${TARGET_OWNER}" >> $GITHUB_OUTPUT
          echo "target_repo=${TARGET_REPO}" >> $GITHUB_OUTPUT
          echo "target_pr=${TARGET_PR}" >> $GITHUB_OUTPUT

          echo "Source: ${SOURCE_OWNER}/${SOURCE_REPO}#${SOURCE_PR}"
          echo "Target: ${TARGET_OWNER}/${TARGET_REPO}#${TARGET_PR}"
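Aside: the `cut -d'/'` pipeline above relies on fixed URL positions (field 4 = owner, field 5 = repo, field 7 = PR number) and silently yields empty strings for malformed input. A minimal Python sketch of the same parsing with explicit validation (`parse_pr_url` is an illustrative name, not part of this PR):

```python
from urllib.parse import urlparse

def parse_pr_url(pr_url: str) -> tuple[str, str, str]:
    """Split a GitHub PR URL into (owner, repo, pr_number), rejecting malformed input."""
    # e.g. https://github.com/pingcap/docs-cn/pull/12345
    parts = urlparse(pr_url).path.strip("/").split("/")
    if len(parts) != 4 or parts[2] != "pull" or not parts[3].isdigit():
        raise ValueError(f"not a pull request URL: {pr_url}")
    return parts[0], parts[1], parts[3]
```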

      - name: Get target PR branch info
        id: target_branch
        run: |
          # Get target PR branch name
          TARGET_BRANCH=$(curl -s \
            -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
            -H "Accept: application/vnd.github.v3+json" \
            "https://api.github.com/repos/${{ steps.extract_info.outputs.target_owner }}/${{ steps.extract_info.outputs.target_repo }}/pulls/${{ steps.extract_info.outputs.target_pr }}" \
            | jq -r '.head.ref')

          echo "target_branch=${TARGET_BRANCH}" >> $GITHUB_OUTPUT
          echo "Target branch: ${TARGET_BRANCH}"

Comment on lines +71 to +83

⚠️ Potential issue | 🔴 Critical

Cross-repo PRs and forks: clone/push the PR head repository using a PAT; current flow will fail.

Issues:

  • You only fetch .head.ref but clone base repo; this breaks for forked PRs (branch won't exist in base).
  • GITHUB_TOKEN cannot push to another repo; use a PAT (or GitHub App) with contents:write on the head repo.
  • Commenting on the target PR also needs a token authorized on that repo.

Fix by retrieving head repo full name and cloning/pushing to it with a PAT, and by using the same PAT for API calls.

Suggested patch:

       - name: Get target PR branch info
         id: target_branch
         run: |
           # Get target PR branch name
-          TARGET_BRANCH=$(curl -s \
+          PR_JSON=$(curl -s \
             -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
-            -H "Accept: application/vnd.github.v3+json" \
+            -H "Accept: application/vnd.github+json" \
             "https://api.github.com/repos/${{ steps.extract_info.outputs.target_owner }}/${{ steps.extract_info.outputs.target_repo }}/pulls/${{ steps.extract_info.outputs.target_pr }}" \
-            | jq -r '.head.ref')
+          )
+          TARGET_BRANCH=$(echo "$PR_JSON" | jq -r '.head.ref')
+          TARGET_HEAD_REPO=$(echo "$PR_JSON" | jq -r '.head.repo.full_name')
+          echo "target_branch=${TARGET_BRANCH}" >> $GITHUB_OUTPUT
+          echo "target_head_repo=${TARGET_HEAD_REPO}" >> $GITHUB_OUTPUT
-          echo "target_branch=${TARGET_BRANCH}" >> $GITHUB_OUTPUT
-          echo "Target branch: ${TARGET_BRANCH}"
+          echo "Target branch: ${TARGET_BRANCH}"
+          echo "Head repo: ${TARGET_HEAD_REPO}"

       - name: Clone target repository
         run: |
-          # Clone target repository with the PR branch
-          git clone https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ steps.extract_info.outputs.target_owner }}/${{ steps.extract_info.outputs.target_repo }}.git target_repo
+          # Clone PR head repository (supports forks) with a PAT
+          git clone https://x-access-token:${{ secrets.SYNC_REPO_TOKEN }}@github.com/${{ steps.target_branch.outputs.target_head_repo }}.git target_repo
           cd target_repo
-          git checkout ${{ steps.target_branch.outputs.target_branch }}
+          git fetch origin ${{ steps.target_branch.outputs.target_branch }}
+          git checkout -B ${{ steps.target_branch.outputs.target_branch }} origin/${{ steps.target_branch.outputs.target_branch }}
           git config user.name "github-actions[bot]"
           git config user.email "github-actions[bot]@users.noreply.github.com"

       - name: Commit and push changes
         run: |
           cd target_repo
           git add .
           if git diff --staged --quiet; then
             echo "No changes to commit"
           else
             git commit -m "Auto-sync: Update English docs from Chinese PR ${{ github.event.inputs.source_pr_url }}
 
             Synced from: ${{ github.event.inputs.source_pr_url }}
             Target PR: ${{ github.event.inputs.target_pr_url }}
             AI Provider: ${{ github.event.inputs.ai_provider }}
 
             Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>"
 
-            git push origin ${{ steps.target_branch.outputs.target_branch }}
+            git push https://x-access-token:${{ secrets.SYNC_REPO_TOKEN }}@github.com/${{ steps.target_branch.outputs.target_head_repo }}.git ${{ steps.target_branch.outputs.target_branch }}
             echo "Changes pushed to target PR branch: ${{ steps.target_branch.outputs.target_branch }}"
           fi

       - name: Add comment to target PR
         run: |
           # Add a comment to the target PR about the sync
-          curl -X POST \
-            -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
+          curl -X POST \
+            -H "Authorization: token ${{ secrets.SYNC_REPO_TOKEN }}" \
             -H "Accept: application/vnd.github.v3+json" \
             "https://api.github.com/repos/${{ steps.extract_info.outputs.target_owner }}/${{ steps.extract_info.outputs.target_repo }}/issues/${{ steps.extract_info.outputs.target_pr }}/comments" \
             -d "{
               \"body\": \"🤖 **Auto-sync completed**\\n\\n📥 **Source PR**: ${{ github.event.inputs.source_pr_url }}\\n🎯 **Target PR**: ${{ github.event.inputs.target_pr_url }}\\n✅ English documentation has been updated based on Chinese documentation changes.\\n\\n_This comment was generated automatically by the sync workflow._\"
             }"

Also add a repo secret SYNC_REPO_TOKEN (PAT) with repo:write access on the head repo (or use a GitHub App).

Also applies to: 84-93, 125-134

🤖 Prompt for AI Agents
.github/workflows/sync-docs-cn-to-en.yml lines 71-83 (and similarly update
84-93, 125-134): the workflow currently reads only .head.ref and clones/pushes
the base repo using GITHUB_TOKEN which fails for forked PRs and prevents pushing
to the head repo; change the flow to fetch the head repo full_name (owner/repo)
and head.ref from the PR API, switch cloning/push operations to use that head
repo full_name, and replace GITHUB_TOKEN with a PAT stored in a repo secret
(e.g. SYNC_REPO_TOKEN with repo:write/contents scope) for git clone/push and for
any API calls (comments or other PR updates) so the token has permission on the
head repo; ensure the workflow uses the head repo URL when adding remotes and
pushing branches and update the other referenced blocks (84-93, 125-134) to use
the same head-repo + PAT approach.
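The fix above hinges on reading two fields from the `GET /pulls/{number}` response: `head.ref` (the PR branch) and `head.repo.full_name` (the repository to clone and push, which differs from the base repo when the PR comes from a fork). A small sketch of that extraction (`pr_head_info` and the sample payload are hypothetical, not code from this PR):

```python
import json

def pr_head_info(pr_json: str) -> tuple[str, str]:
    """Extract (head branch, head repo full name) from a GET /pulls/{number} response body."""
    pr = json.loads(pr_json)
    # head.repo.full_name is the repo to clone/push to; for forked PRs it
    # differs from the base repo the workflow currently clones.
    return pr["head"]["ref"], pr["head"]["repo"]["full_name"]

# Payload shaped like the API response the step queries with curl + jq:
sample = json.dumps({"head": {"ref": "sync-docs", "repo": {"full_name": "fork-owner/docs"}}})
```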

      - name: Clone target repository
        run: |
          # Clone target repository with the PR branch
          git clone https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ steps.extract_info.outputs.target_owner }}/${{ steps.extract_info.outputs.target_repo }}.git target_repo
          cd target_repo
          git checkout ${{ steps.target_branch.outputs.target_branch }}
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"

      - name: Run sync script
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          DEEPSEEK_API_TOKEN: ${{ secrets.DEEPSEEK_API_TOKEN }}
          GEMINI_API_TOKEN: ${{ secrets.GEMINI_API_TOKEN }}
          SOURCE_PR_URL: ${{ github.event.inputs.source_pr_url }}
          TARGET_PR_URL: ${{ github.event.inputs.target_pr_url }}
          AI_PROVIDER: ${{ github.event.inputs.ai_provider }}
          TARGET_REPO_PATH: ${{ github.workspace }}/target_repo
        run: |
          cd scripts/translate_doc_pr
          python main_workflow.py

      - name: Commit and push changes
        run: |
          cd target_repo
          git add .
          if git diff --staged --quiet; then
            echo "No changes to commit"
          else
            git commit -m "Auto-sync: Update English docs from Chinese PR ${{ github.event.inputs.source_pr_url }}

            Synced from: ${{ github.event.inputs.source_pr_url }}
            Target PR: ${{ github.event.inputs.target_pr_url }}
            AI Provider: ${{ github.event.inputs.ai_provider }}

            Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>"

            git push origin ${{ steps.target_branch.outputs.target_branch }}
            echo "Changes pushed to target PR branch: ${{ steps.target_branch.outputs.target_branch }}"
          fi

      - name: Add comment to target PR
        run: |
          # Add a comment to the target PR about the sync
          curl -X POST \
            -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
            -H "Accept: application/vnd.github.v3+json" \
            "https://api.github.com/repos/${{ steps.extract_info.outputs.target_owner }}/${{ steps.extract_info.outputs.target_repo }}/issues/${{ steps.extract_info.outputs.target_pr }}/comments" \
            -d "{
              \"body\": \"🤖 **Auto-sync completed**\\n\\n📥 **Source PR**: ${{ github.event.inputs.source_pr_url }}\\n🎯 **Target PR**: ${{ github.event.inputs.target_pr_url }}\\n✅ English documentation has been updated based on Chinese documentation changes.\\n\\n_This comment was generated automatically by the sync workflow._\"
            }"
22 changes: 22 additions & 0 deletions scripts/translate_doc_pr/__init__.py
@@ -0,0 +1,22 @@
#!/usr/bin/env python3
"""
Auto-Sync PR Changes - Refactored Modular Version

This package contains the refactored version of the auto-sync-pr-changes script,
split into logical modules for better maintainability and testing.

Modules:
- pr_analyzer: PR analysis, diff parsing, content getting, hierarchy building
- section_matcher: Section matching (direct matching + AI matching)
- file_adder: New file processing and translation
- file_deleter: Deleted file processing
- file_updater: Updated file processing and translation
- toc_processor: TOC file special processing
- main: Main orchestration function
"""

# Import main functionality for easy access
from main import main


critical

This import statement is incorrect and will cause a runtime error. There is no top-level main module in the standard library or in this project. Based on the project structure, you likely intended to import from main_workflow.py within this package. This should be a relative import: from .main_workflow import main.

This same incorrect import pattern (from main import ...) appears in several other files in this package and needs to be fixed throughout for the package to be importable and usable.


# Make main function available at package level
__all__ = ["main"]
Comment on lines +19 to +22

⚠️ Potential issue | 🟠 Major

Fix package export import path

import scripts.translate_doc_pr currently explodes with ModuleNotFoundError: No module named 'main' because the package init tries to do an absolute from main import main. The callable lives in main_workflow.py, so the package entry point is unusable until we point the import at the correct module.

-from main import main
+from .main_workflow import main
📝 Committable suggestion


Suggested change
-from main import main
+from .main_workflow import main
 # Make main function available at package level
 __all__ = ["main"]
🤖 Prompt for AI Agents
In scripts/translate_doc_pr/__init__.py around lines 19 to 22, the package init
does an absolute import from a non-existent module ("main"), causing
ModuleNotFoundError; replace the import with a relative import that points to
the actual callable (main_workflow.py) — i.e., import the main function using a
package-relative import and keep __all__ as ["main"] so importing
scripts.translate_doc_pr exposes the correct entrypoint.

193 changes: 193 additions & 0 deletions scripts/translate_doc_pr/file_adder.py
@@ -0,0 +1,193 @@
"""
File Adder Module
Handles processing and translation of newly added files
"""

import os
import re
import json
import threading
from github import Github
from openai import OpenAI

# Thread-safe printing
print_lock = threading.Lock()

def thread_safe_print(*args, **kwargs):
    with print_lock:
        print(*args, **kwargs)
Comment on lines +16 to +18


medium

The thread_safe_print function is duplicated across multiple files in this package (including file_deleter.py, file_updater.py, main_workflow.py, etc.). To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, this function should be defined once in a shared utility module and imported where needed.
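A sketch of the shared module this comment suggests — a hypothetical `utils.py` inside the package, so each module can do `from .utils import thread_safe_print` instead of redefining the lock and function:

```python
# Hypothetical scripts/translate_doc_pr/utils.py (name is illustrative, not in this PR)
import threading

_print_lock = threading.Lock()

def thread_safe_print(*args, **kwargs):
    """Serialize print calls so concurrent workers don't interleave output."""
    with _print_lock:
        print(*args, **kwargs)
```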


def create_section_batches(file_content, max_lines_per_batch=200):
    """Create batches of file content for translation, respecting section boundaries"""
    lines = file_content.split('\n')

    # Find all section headers
    section_starts = []
    for i, line in enumerate(lines):
        line = line.strip()
        if line.startswith('#'):
            match = re.match(r'^(#{1,10})\s+(.+)', line)
            if match:
                section_starts.append(i + 1)  # 1-based line numbers

    # If no sections found, just batch by line count
    if not section_starts:
        batches = []
        for i in range(0, len(lines), max_lines_per_batch):
            batch_lines = lines[i:i + max_lines_per_batch]
            batches.append('\n'.join(batch_lines))
        return batches

    # Create batches respecting section boundaries
    batches = []
    current_batch_start = 0

    for i, section_start in enumerate(section_starts):
        section_start_idx = section_start - 1  # Convert to 0-based

        # Check if adding this section would exceed the line limit
        if (section_start_idx - current_batch_start) > max_lines_per_batch:
            # Close current batch at the previous section boundary
            if current_batch_start < section_start_idx:
                batch_lines = lines[current_batch_start:section_start_idx]
                batches.append('\n'.join(batch_lines))
            current_batch_start = section_start_idx

        # If this is the last section, or the next section would create a batch too large
        if i == len(section_starts) - 1:
            # Add remaining content as final batch
            batch_lines = lines[current_batch_start:]
            batches.append('\n'.join(batch_lines))
        else:
            next_section_start = section_starts[i + 1] - 1  # 0-based
            if (next_section_start - current_batch_start) > max_lines_per_batch:
                # Close current batch at current section boundary
                batch_lines = lines[current_batch_start:section_start_idx]
                if batch_lines:  # Only add non-empty batches
                    batches.append('\n'.join(batch_lines))
                current_batch_start = section_start_idx

    # Clean up any empty batches
    batches = [batch for batch in batches if batch.strip()]

    return batches
Comment on lines +20 to +73


medium

The logic in create_section_batches is quite complex and hard to follow, especially with the lookahead checks for the next section. This complexity increases the risk of bugs and makes the function difficult to maintain.

Consider refactoring this to a simpler, more straightforward algorithm. A simpler approach could be:

  1. Initialize an empty current_batch and a list of batches.
  2. Iterate through the sections (or lines if no sections).
  3. If adding the current section/line to current_batch does not exceed max_lines_per_batch, add it.
  4. Otherwise, add current_batch to batches, and start a new current_batch with the current section/line.
  5. After the loop, add the final current_batch if it's not empty.
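The five steps above can be sketched as a single greedy loop (`batch_by_sections` is an illustrative name, not the project's function):

```python
import re

def batch_by_sections(content: str, max_lines: int = 200):
    """Greedy batching: split on Markdown headers, pack whole sections until max_lines."""
    lines = content.split("\n")
    # Indices where sections begin; index 0 opens the first chunk even without a header.
    starts = [i for i, l in enumerate(lines) if re.match(r"^#{1,10}\s+\S", l.strip())]
    if 0 not in starts:
        starts = [0] + starts
    sections = [lines[s:e] for s, e in zip(starts, starts[1:] + [len(lines)])]

    batches, current = [], []
    for sec in sections:
        # Flush the current batch when adding this section would overflow it.
        if current and len(current) + len(sec) > max_lines:
            batches.append("\n".join(current))
            current = []
        current.extend(sec)  # an oversized single section still stays whole
    if current:
        batches.append("\n".join(current))
    return [b for b in batches if b.strip()]
```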


def translate_file_batch(batch_content, ai_client, source_language="English", target_language="Chinese"):
    """Translate a single batch of file content using AI"""
    if not batch_content.strip():
        return batch_content

    thread_safe_print(f" 🤖 Translating batch ({len(batch_content.split())} words)...")

    prompt = f"""You are a professional technical writer. Please translate the following {source_language} content to {target_language}.

IMPORTANT INSTRUCTIONS:
1. Preserve ALL Markdown formatting (headers, links, code blocks, tables, etc.)
2. Do NOT translate:
   - Code examples, SQL queries, configuration values
   - Technical terms like "TiDB", "TiKV", "PD", API names, etc.
   - File paths, URLs, and command line examples
   - Variable names and system configuration parameters
3. Translate only the descriptive text and explanations
4. Maintain the exact structure and indentation
5. Keep all special characters and formatting intact

Content to translate:
{batch_content}

Please provide the translated content maintaining all formatting and structure."""

    # Add token estimation
    try:
        from main import print_token_estimation
        print_token_estimation(prompt, "File addition translation")
    except ImportError:
        # Fallback if import fails - use tiktoken
Comment on lines +100 to +105

⚠️ Potential issue | 🟠 Major

Wrong module import: print_token_estimation is in main_workflow.py, not main.

The import always fails with ImportError, so the code silently takes the fallback path and the shared token statistics are never printed.

Apply:

-    try:
-        from main import print_token_estimation
+    try:
+        from main_workflow import print_token_estimation
         print_token_estimation(prompt, "File addition translation")
📝 Committable suggestion


Suggested change
     # Add token estimation
     try:
-        from main import print_token_estimation
+        from main_workflow import print_token_estimation
         print_token_estimation(prompt, "File addition translation")
     except ImportError:
         # Fallback if import fails - use tiktoken
🤖 Prompt for AI Agents
In scripts/translate_doc_pr/file_adder.py around lines 100 to 105, the code
incorrectly imports print_token_estimation from module main (which causes an
ImportError and always triggers the fallback); update the import to load
print_token_estimation from main_workflow (from main_workflow import
print_token_estimation) and keep the existing try/except fallback behavior so
token estimation is used when available and falls back to tiktoken only if the
import or call fails.

        try:
            import tiktoken
            enc = tiktoken.get_encoding("cl100k_base")
            tokens = enc.encode(prompt)
            actual_tokens = len(tokens)
            char_count = len(prompt)
            print(f" 💰 File addition translation")
            print(f" 📝 Input: {char_count:,} characters")
            print(f" 🔢 Actual tokens: {actual_tokens:,} (using tiktoken cl100k_base)")
        except Exception:
            # Final fallback to character approximation
            estimated_tokens = len(prompt) // 4
            char_count = len(prompt)
            print(f" 💰 File addition translation")
            print(f" 📝 Input: {char_count:,} characters")
            print(f" 🔢 Estimated tokens: ~{estimated_tokens:,} (fallback: 4 chars/token approximation)")
Comment on lines +101 to +121


medium

This block for token estimation is duplicated in several modules (file_updater.py, section_matcher.py, toc_processor.py). This duplicated code should be refactored into a single utility function in a shared module to avoid redundancy and make future changes easier.
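A sketch of the shared helper this comment suggests, folding the tiktoken path and the 4-chars/token fallback into one function (`estimate_tokens` is a hypothetical name; tiktoken is treated as an optional dependency):

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: tiktoken's cl100k_base if available, else ~4 chars per token."""
    try:
        import tiktoken  # optional dependency; fall back if missing or failing
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:
        return len(text) // 4

def print_token_estimation(text: str, label: str) -> None:
    """Single place for the repeated cost-report block."""
    print(f" 💰 {label}")
    print(f" 📝 Input: {len(text):,} characters")
    print(f" 🔢 Tokens: ~{estimate_tokens(text):,}")
```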


    try:
        translated_content = ai_client.chat_completion(
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1
        )
        thread_safe_print(f" ✅ Batch translation completed")
        return translated_content

    except Exception as e:
        thread_safe_print(f" ❌ Batch translation failed: {e}")
        return batch_content  # Return original content if translation fails

def process_added_files(added_files, pr_url, github_client, ai_client, repo_config):
    """Process newly added files by translating and creating them in target repository"""
    if not added_files:
        thread_safe_print("\n📄 No new files to process")
        return

    thread_safe_print(f"\n📄 Processing {len(added_files)} newly added files...")

    target_local_path = repo_config['target_local_path']
    source_language = repo_config['source_language']
    target_language = repo_config['target_language']

    for file_path, file_content in added_files.items():
        thread_safe_print(f"\n📝 Processing new file: {file_path}")

        # Create target file path
        target_file_path = os.path.join(target_local_path, file_path)
        target_dir = os.path.dirname(target_file_path)

        # Create directory if it doesn't exist
        if not os.path.exists(target_dir):
            os.makedirs(target_dir, exist_ok=True)
            thread_safe_print(f" 📁 Created directory: {target_dir}")

        # Check if file already exists
        if os.path.exists(target_file_path):
            thread_safe_print(f" ⚠️ Target file already exists: {target_file_path}")
            continue

        # Create section batches for translation
        batches = create_section_batches(file_content, max_lines_per_batch=200)
        thread_safe_print(f" 📦 Created {len(batches)} batches for translation")

        # Translate each batch
        translated_batches = []
        for i, batch in enumerate(batches):
            thread_safe_print(f" 🔄 Processing batch {i+1}/{len(batches)}")
            translated_batch = translate_file_batch(
                batch,
                ai_client,
                source_language,
                target_language
            )
            translated_batches.append(translated_batch)

        # Combine translated batches
        translated_content = '\n'.join(translated_batches)

        # Write translated content to target file
        try:
            with open(target_file_path, 'w', encoding='utf-8') as f:
                f.write(translated_content)

            thread_safe_print(f" ✅ Created translated file: {target_file_path}")

        except Exception as e:
            thread_safe_print(f" ❌ Error creating file {target_file_path}: {e}")

    thread_safe_print(f"\n✅ Completed processing all new files")
45 changes: 45 additions & 0 deletions scripts/translate_doc_pr/file_deleter.py
@@ -0,0 +1,45 @@
"""
File Deleter Module
Handles processing of deleted files and deleted sections
"""

import os
import threading
from github import Github

# Thread-safe printing
print_lock = threading.Lock()

def thread_safe_print(*args, **kwargs):
    with print_lock:
        print(*args, **kwargs)

def process_deleted_files(deleted_files, github_client, repo_config):
    """Process deleted files by removing them from target repository"""
    if not deleted_files:
        thread_safe_print("\n🗑️ No files to delete")
        return

    thread_safe_print(f"\n🗑️ Processing {len(deleted_files)} deleted files...")

    target_local_path = repo_config['target_local_path']

    for file_path in deleted_files:
        thread_safe_print(f"\n🗑️ Processing deleted file: {file_path}")

        # Create target file path
        target_file_path = os.path.join(target_local_path, file_path)

        # Check if file exists in target
        if os.path.exists(target_file_path):
            try:
                os.remove(target_file_path)
                thread_safe_print(f" ✅ Deleted file: {target_file_path}")
            except Exception as e:
                thread_safe_print(f" ❌ Error deleting file {target_file_path}: {e}")
        else:
            thread_safe_print(f" ⚠️ Target file not found: {target_file_path}")

    thread_safe_print(f"\n✅ Completed processing deleted files")

# Section deletion logic moved to file_updater.py