Skip to content

fix: firewall log parser incorrectly skipping blocked requests#4781

Merged
Mossaka merged 2 commits intomainfrom
fix/firewall-log-parser-blocked-requests
Nov 25, 2025
Merged

fix: firewall log parser incorrectly skipping blocked requests#4781
Mossaka merged 2 commits intomainfrom
fix/firewall-log-parser-blocked-requests

Conversation

@Mossaka
Copy link
Copy Markdown
Collaborator

@Mossaka Mossaka commented Nov 25, 2025

Problem

The firewall log parser was silently skipping all blocked requests, causing job summaries to show "No blocked requests detected" even when the firewall was actively blocking domains.

Root cause: The parseFirewallLogLine() function used overly-strict regex validation that rejected valid Squid log entries for blocked requests. Specifically, blocked requests have -:- in the destIpPort field (no backend resolved), but the regex only accepted - or IP:port format.

Evidence

From the firewall escape test run (https://github.com/githubnext/gh-aw/actions/runs/19682136498):

  • Squid access.log showed multiple blocked requests (example.com, google.com, amazon.com, microsoft.com)
  • All had 403 TCP_DENIED status with -:- in destIpPort field
  • Job summary incorrectly reported: "✅ No blocked requests detected"

Solution

Simplified parseFirewallLogLine() by removing fragile regex validations:

  • Now only validates timestamp (essential for log format detection)
  • Relies on isRequestAllowed() for robust classification logic
  • The permissive parsing is safe because isRequestAllowed() defaults to "denied" for unknown patterns

Impact

  • Before: Job summaries always showed "No blocked requests detected" even when firewall was blocking domains
  • After: Blocked domains are correctly detected and reported in job summaries
  • Code change: -751 lines of fragile validation code

Testing

Verified both formats parse correctly:

  • ✅ Allowed: 1764100122.567 172.30.0.20:36586 api.github.com:443 140.82.116.6:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT
  • ✅ Blocked: 1764100149.267 172.30.0.20:35642 example.com:443 -:- 1.1 CONNECT 403 TCP_DENIED:HIER_NONE

Reviewed by subagent - confirmed changes are correct and safe.

Files Changed

  • pkg/workflow/js/parse_firewall_logs.cjs (source)
  • 18 auto-generated .lock.yml workflow files

Fixes the issue discovered in #4769

The parseFirewallLogLine() function was using overly-strict regex
validation that rejected valid Squid log entries for blocked requests.

Root cause: Blocked requests have "-:-" in the destIpPort field (no
backend resolved), but the regex only accepted "-" or "IP:port" format.

Changes:
- Simplified parseFirewallLogLine() to remove fragile regex validations
- Now only validates timestamp (essential for log format detection)
- Relies on isRequestAllowed() for robust classification logic

Impact:
- Before: Job summaries showed "No blocked requests detected" even when
  firewall was blocking domains
- After: Blocked domains (example.com, google.com, etc.) are correctly
  detected and reported

The permissive parsing is safe because isRequestAllowed() provides
secondary validation and defaults to "denied" for unknown patterns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings November 25, 2025 21:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug where the firewall log parser was incorrectly skipping all blocked requests. The root cause was overly-strict regex validation that rejected valid Squid log entries where blocked requests have -:- in the destIpPort field (indicating no backend was resolved). The solution simplifies the parsing logic to only validate the timestamp, relying on the existing isRequestAllowed() function for robust request classification.

Key changes:

  • Removed fragile field-level regex validations (client IP, domain, dest IP, status, decision)
  • Retained essential validation (timestamp format, minimum field count)
  • Changed userAgent parsing to normalize empty strings to "-"

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

File Description
pkg/workflow/js/parse_firewall_logs.cjs Simplified parseFirewallLogLine() by removing field-specific regex validations and relying on timestamp + field count checks
.github/workflows/*.lock.yml (18 files) Auto-generated workflow files reflecting the bundled JavaScript changes from the source file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/workflow/js/parse_firewall_logs.cjs
Comment thread pkg/workflow/js/parse_firewall_logs.cjs
Signed-off-by: Jiaxiao (mossaka) Zhou <duibao55328@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 25, 2025

✅ Agentic Changeset Generator completed successfully.

@Mossaka Mossaka merged commit b4b6940 into main Nov 25, 2025
54 of 77 checks passed
@Mossaka Mossaka deleted the fix/firewall-log-parser-blocked-requests branch November 25, 2025 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants