Add PR-Agent + Qodo Merge dual review workflows#47
Add PR-Agent + Qodo Merge dual review workflows#47bong-water-water-bong wants to merge 8 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 85f3920535
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pr_agent_job: | ||
| name: PR-Agent (DeepSeek) | ||
| runs-on: ubuntu-latest | ||
| if: ${{ github.event.sender.type != 'Bot' && secrets.DEEPSEEK_API_KEY != '' }} |
There was a problem hiding this comment.
Move secret check out of the job condition
GitHub's Actions docs for using secrets state that secrets cannot be directly referenced in if: conditionals, so this job-level expression is rejected before the workflow can decide whether DEEPSEEK_API_KEY is present. The PR-Agent workflow therefore won't run on PRs as added here; pass the secret into an env var and gate a step instead, while keeping the bot-sender check at job level.
Useful? React with 👍 / 👎.
| types: [opened, synchronize, reopened, ready_for_review] | ||
| jobs: | ||
| qodo-merge: | ||
| if: ${{ secrets.QODO_API_KEY != "" }} |
There was a problem hiding this comment.
Move secret check out of the job condition
GitHub's Actions docs for using secrets state that secrets cannot be directly referenced in if: conditionals, so this job-level expression is invalid when the workflow is parsed. As a result the Qodo review job cannot be skipped cleanly based on QODO_API_KEY; put the secret in an env var and apply the condition at a step or wrapper-script level.
Useful? React with 👍 / 👎.
| uint64_t arr_len; f.read(reinterpret_cast<char*>(&arr_len), sizeof(arr_len)); | ||
| for (uint64_t j = 0; j < arr_len; j++) { | ||
| if (arr_type == 8) read_string(f); // skip array strings for now | ||
| else { uint64_t dummy; f.read(reinterpret_cast<char*>(&dummy), sizeof(dummy)); } |
There was a problem hiding this comment.
Skip GGUF arrays by element size
Standard GGUF files include non-string metadata arrays such as tokenizer scores and token types whose elements are 4-byte floats/ints; this branch advances 8 bytes for every non-string array element regardless of arr_type. When such an array is present, the stream is left past the real metadata boundary and read_tensor_infos() consumes garbage, so loading typical GGUF models fails before reaching tensor data. Skip or parse array elements according to the GGUF value type size.
Useful? React with 👍 / 👎.
| std::vector<mx::float16_t> fp16_data(static_cast<size_t>(num_elements)); | ||
| std::vector<float> float_buf(static_cast<size_t>(num_elements)); | ||
|
|
||
| const uint8_t* tensor_data = file_data.data() + ti.offset; |
There was a problem hiding this comment.
Add GGUF data base to tensor offsets
GGUF tensor offsets are relative to the aligned tensor-data section, not to byte 0 of the file. Using file_data.data() + ti.offset makes every tensor read from the header/tensor-info area instead of its payload (or from the wrong tensor), so any .gguf load that reaches this path dequantizes corrupt weights. Compute the aligned data-section start after the tensor infos and add it to ti.offset before reading.
Useful? React with 👍 / 👎.
| cfg["num_attention_heads"] = get_int(p + "attention.head_count", 32); | ||
| cfg["num_key_value_heads"] = get_int(p + "attention.head_count_kv", | ||
| cfg["num_attention_heads"].get<int>()); | ||
| cfg["head_dim"] = get_int(p + "attention.head_dim", 0); |
There was a problem hiding this comment.
Avoid zeroing GGUF head dimensions
For GGUF models that omit *.attention.head_dim (common for llama-family GGUF metadata, where it is derived from embedding size and head count), this writes head_dim: 0 into the synthesized config. LlamaConfiguration::resolved_head_dim() then uses the present value 0 instead of deriving a dimension, so attention projections/reshapes are built with zero-width heads and the model cannot run. Only set head_dim when the metadata key exists, or derive it from hidden_size / num_attention_heads.
Useful? React with 👍 / 👎.
| weights.insert_or_assign(name, ait->second); | ||
| weights.erase(ait); |
There was a problem hiding this comment.
Remap quantization sidecars with alternate weights
When a quantized checkpoint needs one of these alternate-name remaps, this moves only the .weight tensor after register_quantized_weights() has already run. The matching .scales/.biases stay under the old prefix and never get registered for the model member pointer, so linear_forward() later treats the packed uint32 weight as a dense matrix and returns corrupt outputs. Apply name remaps before quantization registration or move/register the sidecars together with the weight.
Useful? React with 👍 / 👎.
| {"qwen3_moe_base", "qwen3_moe"}, | ||
| {"gemma3", "gemma3_text"}, | ||
| {"olmo", "llama"}, // OLMo-1 uses standard Llama-like architecture | ||
| {"gemma4_unified", "gemma4"}, |
There was a problem hiding this comment.
Point Gemma4 aliases at a registered loader
llm_loaders() registers gemma4_text but no gemma4, so a config with model_type: gemma4_unified hits this alias and still leaves it == loaders.end(), after which the generic Llama fallback can be selected for Gemma4-shaped configs. That bypasses Gemma4Model and loads incompatible weights; map this alias to gemma4_text or register a gemma4 loader.
Useful? React with 👍 / 👎.
| if (!has_weights && fs::exists(cache_path + "/model.safetensors.index.json")) { | ||
| has_weights = true; |
There was a problem hiding this comment.
Validate sharded safetensors before treating cache complete
A cache directory containing config.json plus only model.safetensors.index.json is returned as complete even if none of the shard files were downloaded, which can happen after an interrupted or failed previous download. The next load then fails with missing shard files and snapshot_download() never attempts to refill the cache; parse the index and require its referenced shards to exist before returning.
Useful? React with 👍 / 👎.
| local gpu_bus | ||
| gpu_bus=$(get_gpu_bus) | ||
| > "$outfile" | ||
| while kill -0 "$MON_PID" 2>/dev/null; do |
There was a problem hiding this comment.
Keep the GPU monitor alive during benchmarks
monitor_gpu is backgrounded before the caller assigns MON_PID, so the background subshell sees an empty value and exits this loop immediately. The benchmark then reports N/A or zeroed GPU utilization/VRAM samples for the main runs; pass the workload PID into the monitor (or start the workload in the background first) instead of relying on a variable set after the fork.
Useful? React with 👍 / 👎.
| local label_tpl | ||
| local env_arg |
There was a problem hiding this comment.
Remove local declarations from top-level loop
This PCIe-analysis loop runs at top level, not inside a function, so Bash reports local: can only be used in a function and, with set -e, exits before the side-by-side comparison and final summary are produced. Use plain assignments here or wrap the section in a function.
Useful? React with 👍 / 👎.
…sdk#48) Co-Authored-By: Claude <noreply@anthropic.com>
85f3920 to
0440a57
Compare
Summary
Adds two GitHub Actions workflows for automated PR review on this repo:
PR-Agent (DeepSeek)
Qodo Merge
Required secrets (to be set in repo settings)
Both workflows include proper permission scoping and bot-sender filtering.