Skip to content

Conversation

@fmhall
Copy link
Contributor

@fmhall fmhall commented Jan 14, 2026

No description provided.

@vercel
Copy link
Contributor

vercel bot commented Jan 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
x402scan Error Error Jan 14, 2026 10:03pm

const patterns = [
// name attribute with double-quoted content
new RegExp(
`<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nameOrProperty parameter is directly interpolated into regex patterns without escaping regex metacharacters, creating a potential regex injection vulnerability and correctness issue if dynamic meta tag names are ever used.

View Details
📝 Patch Details
diff --git a/apps/scan/src/services/scraper/metadata.ts b/apps/scan/src/services/scraper/metadata.ts
index 87f62a55..31b96ca4 100644
--- a/apps/scan/src/services/scraper/metadata.ts
+++ b/apps/scan/src/services/scraper/metadata.ts
@@ -4,6 +4,13 @@ interface Metadata {
   url?: string;
 }
 
+/**
+ * Escapes regex metacharacters to prevent regex injection and ensure literal matching
+ */
+const escapeRegex = (str: string): string => {
+  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\<!-- PATCH_DETAILS_PLACEHOLDER -->');
+};
+
 /**
  * Extracts the content from a meta tag
  * Uses separate patterns for double-quoted and single-quoted values to handle quotes in content
@@ -12,47 +19,51 @@ const extractMetaContent = (
   html: string,
   nameOrProperty: string
 ): string | null => {
+  // Escape regex metacharacters in the name/property to prevent regex injection
+  // and ensure exact matching of literal characters
+  const escapedNameOrProperty = escapeRegex(nameOrProperty);
+
   // Match meta tags with name or property attribute
   // Use separate patterns for each quote type to properly handle quotes in values
   const patterns = [
     // name attribute with double-quoted content
     new RegExp(
-      `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`,
+      `<meta\\s+[^>]*name\\s*=\\s*["']${escapedNameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`,
       'i'
     ),
     // name attribute with single-quoted content
     new RegExp(
-      `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`,
+      `<meta\\s+[^>]*name\\s*=\\s*["']${escapedNameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`,
       'i'
     ),
     // content before name with double quotes
     new RegExp(
-      `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*name\\s*=\\s*["']${nameOrProperty}["']`,
+      `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*name\\s*=\\s*["']${escapedNameOrProperty}["']`,
       'i'
     ),
     // content before name with single quotes
     new RegExp(
-      `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*name\\s*=\\s*["']${nameOrProperty}["']`,
+      `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*name\\s*=\\s*["']${escapedNameOrProperty}["']`,
       'i'
     ),
     // property attribute with double-quoted content
     new RegExp(
-      `<meta\\s+[^>]*property\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`,
+      `<meta\\s+[^>]*property\\s*=\\s*["']${escapedNameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`,
       'i'
     ),
     // property attribute with single-quoted content
     new RegExp(
-      `<meta\\s+[^>]*property\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`,
+      `<meta\\s+[^>]*property\\s*=\\s*["']${escapedNameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`,
       'i'
     ),
     // content before property with double quotes
     new RegExp(
-      `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*property\\s*=\\s*["']${nameOrProperty}["']`,
+      `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*property\\s*=\\s*["']${escapedNameOrProperty}["']`,
       'i'
     ),
     // content before property with single quotes
     new RegExp(
-      `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*property\\s*=\\s*["']${nameOrProperty}["']`,
+      `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*property\\s*=\\s*["']${escapedNameOrProperty}["']`,
       'i'
     ),
   ];

Analysis

Unescaped regex metacharacters in extractMetaContent() nameOrProperty parameter

What fails: extractMetaContent() in apps/scan/src/services/scraper/metadata.ts fails to match meta tags with names/properties containing regex metacharacters like [, ], *, +, ?, (, ), {, }, ^, $, |, ., or \. These characters are interpreted as regex operators instead of literals, preventing correct matching.

How to reproduce:

const escapeRegex = (str) => {
  return str.replace(/[.*+?^ 

Result: Returns null - unable to match the meta tag

Expected: Should return "first item" - the meta tag with brackets in the name should be matched literally

Fix applied: Added escapeRegex() helper function to escape regex metacharacters before interpolating into regex patterns, ensuring all 8 regex patterns in extractMetaContent() treat the parameter as literal text rather than regex operators.

attrName: string
): string | null => {
// Try double-quoted value first
let match = new RegExp(`${attrName}\\s*=\\s*"([^"]*)"`).exec(attrs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attrName parameter is directly interpolated into regex patterns without escaping regex metacharacters, creating a potential regex injection vulnerability if dynamic attribute names are ever used.

View Details
📝 Patch Details
diff --git a/apps/scan/src/services/scraper/favicon.ts b/apps/scan/src/services/scraper/favicon.ts
index b8ce21c1..bbc2a3c1 100644
--- a/apps/scan/src/services/scraper/favicon.ts
+++ b/apps/scan/src/services/scraper/favicon.ts
@@ -65,6 +65,13 @@ const getIconPriority = (
   return 6;
 };
 
+/**
+ * Escapes regex metacharacters in a string to prevent regex injection
+ */
+const escapeRegexMetacharacters = (str: string): string => {
+  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\<!-- PATCH_DETAILS_PLACEHOLDER -->');
+};
+
 /**
  * Extracts an attribute value, handling both double and single quotes
  * This allows values to contain the opposite quote character (e.g., apostrophes in double-quoted values)
@@ -73,12 +80,14 @@ const extractAttributeValue = (
   attrs: string,
   attrName: string
 ): string | null => {
+  const escapedName = escapeRegexMetacharacters(attrName);
+
   // Try double-quoted value first
-  let match = new RegExp(`${attrName}\\s*=\\s*"([^"]*)"`).exec(attrs);
+  let match = new RegExp(`${escapedName}\\s*=\\s*"([^"]*)"`).exec(attrs);
   if (match?.[1] !== undefined) return match[1];
 
   // Try single-quoted value
-  match = new RegExp(`${attrName}\\s*=\\s*'([^']*)'`).exec(attrs);
+  match = new RegExp(`${escapedName}\\s*=\\s*'([^']*)'`).exec(attrs);
   if (match?.[1] !== undefined) return match[1];
 
   return null;
diff --git a/apps/scan/src/services/scraper/metadata.ts b/apps/scan/src/services/scraper/metadata.ts
index 87f62a55..4695886d 100644
--- a/apps/scan/src/services/scraper/metadata.ts
+++ b/apps/scan/src/services/scraper/metadata.ts
@@ -4,6 +4,13 @@ interface Metadata {
   url?: string;
 }
 
+/**
+ * Escapes regex metacharacters in a string to prevent regex injection
+ */
+const escapeRegexMetacharacters = (str: string): string => {
+  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\<!-- PATCH_DETAILS_PLACEHOLDER -->');
+};
+
 /**
  * Extracts the content from a meta tag
  * Uses separate patterns for double-quoted and single-quoted values to handle quotes in content
@@ -12,47 +19,49 @@ const extractMetaContent = (
   html: string,
   nameOrProperty: string
 ): string | null => {
+  const escaped = escapeRegexMetacharacters(nameOrProperty);
+
   // Match meta tags with name or property attribute
   // Use separate patterns for each quote type to properly handle quotes in values
   const patterns = [
     // name attribute with double-quoted content
     new RegExp(
-      `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`,
+      `<meta\\s+[^>]*name\\s*=\\s*["']${escaped}["'][^>]*content\\s*=\\s*"([^"]*)"`,
       'i'
     ),
     // name attribute with single-quoted content
     new RegExp(
-      `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`,
+      `<meta\\s+[^>]*name\\s*=\\s*["']${escaped}["'][^>]*content\\s*=\\s*'([^']*)'`,
       'i'
     ),
     // content before name with double quotes
     new RegExp(
-      `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*name\\s*=\\s*["']${nameOrProperty}["']`,
+      `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*name\\s*=\\s*["']${escaped}["']`,
       'i'
     ),
     // content before name with single quotes
     new RegExp(
-      `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*name\\s*=\\s*["']${nameOrProperty}["']`,
+      `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*name\\s*=\\s*["']${escaped}["']`,
       'i'
     ),
     // property attribute with double-quoted content
     new RegExp(
-      `<meta\\s+[^>]*property\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`,
+      `<meta\\s+[^>]*property\\s*=\\s*["']${escaped}["'][^>]*content\\s*=\\s*"([^"]*)"`,
       'i'
     ),
     // property attribute with single-quoted content
     new RegExp(
-      `<meta\\s+[^>]*property\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`,
+      `<meta\\s+[^>]*property\\s*=\\s*["']${escaped}["'][^>]*content\\s*=\\s*'([^']*)'`,
       'i'
     ),
     // content before property with double quotes
     new RegExp(
-      `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*property\\s*=\\s*["']${nameOrProperty}["']`,
+      `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*property\\s*=\\s*["']${escaped}["']`,
       'i'
     ),
     // content before property with single quotes
     new RegExp(
-      `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*property\\s*=\\s*["']${nameOrProperty}["']`,
+      `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*property\\s*=\\s*["']${escaped}["']`,
       'i'
     ),
   ];

Analysis

Regex Injection Vulnerability in extractAttributeValue() and extractMetaContent()

What fails: extractAttributeValue() in favicon.ts and extractMetaContent() in metadata.ts interpolate user-controllable parameters directly into RegExp patterns without escaping regex metacharacters. This causes:

  1. Regex metacharacters in attribute names to be misinterpreted as regex patterns rather than literal characters
  2. Incorrect matching behavior when attribute names contain metacharacters like ., *, +, [, ], ^, $, {, }, (, ), |, \

How to reproduce:

// Attribute name with dot (. matches any character in regex)
const attrs = 'data.name="value1" dataxname="value2"';
const result = extractAttributeValue(attrs, 'data.name'); // Should find "value1"
// Before fix: returns "value1" ✓ (but only by accident - . matches any character)
// Problem: If HTML had 'dataxname', it would also match (because . matches x)

// Attribute name with asterisk (* means zero or more in regex)
const attrs2 = 'data*name="test-value"';
const result2 = extractAttributeValue(attrs2, 'data*name');
// Before fix: returns null (incorrectly fails)
// After fix: returns "test-value" ✓

// Attribute name with plus (+ means one or more in regex)
const attrs3 = 'data+name="correct" dataname="wrong"';
const result3 = extractAttributeValue(attrs3, 'data+name');
// Before fix: returns "wrong" (matches 'dataname' instead of 'data+name')
// After fix: returns "correct" ✓

Result: Regex metacharacters cause incorrect regex patterns to be generated, leading to wrong matching behavior. For example, data+name becomes /data+name/ which matches "dataaaa...name" instead of literally "data+name".

Expected: All parameters should be escaped before interpolation into RegExp patterns to ensure they are treated as literal strings rather than regex metacharacters.

Fix applied: Added escapeRegexMetacharacters() utility function that escapes all regex metacharacters using .replace(/[.*+?^ before interpolating parameters into RegExp patterns in both files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmhall not sure how valid this is

// Handle API subdomain fallback
if (origin.startsWith('https://api.')) {
origin = `https://${origin.slice(12)}`;
const baseOrigin = `https://${origin.slice(12)}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When processing API subdomains, the origin field is always updated to the base domain even when all data was successfully retrieved from the API subdomain, causing the function to incorrectly report the data source.

View Details
📝 Patch Details
diff --git a/apps/scan/src/services/scraper/index.ts b/apps/scan/src/services/scraper/index.ts
index 78667617..9372b273 100644
--- a/apps/scan/src/services/scraper/index.ts
+++ b/apps/scan/src/services/scraper/index.ts
@@ -67,20 +67,26 @@ export const scrapeOriginData = async (inputOrigin: string) => {
         const baseData = await parseAllFromHtml(baseHtml, baseOrigin);
 
         // Fill in missing data from base domain
+        let dataFilledFromBase = false;
         if (needsOg && hasOgData(baseData.og)) {
           og = baseData.og;
+          dataFilledFromBase = true;
         }
         if (needsMetadata && hasMetadata(baseData.metadata)) {
           metadata = baseData.metadata;
+          dataFilledFromBase = true;
         }
         if (needsFavicon && baseData.favicon) {
           favicon = baseData.favicon;
+          dataFilledFromBase = true;
+        }
+
+        // Only update origin to base if we actually used data from it
+        if (dataFilledFromBase) {
+          origin = baseOrigin;
         }
       }
     }
-
-    // Update origin to base for URL resolution
-    origin = baseOrigin;
   }
 
   return {

Analysis

Origin field incorrectly updated to base domain in scrapeOriginData() even when API subdomain data is complete

What fails: The scrapeOriginData() function in apps/scan/src/services/scraper/index.ts always updates the returned origin field to the base domain when the input origin starts with https://api., regardless of whether data was actually fetched from the base domain. This causes the function to report incorrect source information.

How to reproduce:

  1. Call scrapeOriginData('https://api.example.com')
  2. The API subdomain successfully returns complete HTML with OG data, metadata, and favicon
  3. Verify the return value's origin field

Result: Returns origin: 'https://example.com' even though all data came from https://api.example.com

Expected: Should return origin: 'https://api.example.com' since no data was fetched from the base domain

Root cause: Line 83 in the original code unconditionally executes origin = baseOrigin; whenever the condition origin.startsWith('https://api.') is true, regardless of whether any data was actually filled from the base domain.

Impact: The developer.ts endpoint returns incorrect scrapedOrigin in the preview response, causing users to see wrong source information. This also affects any downstream code that relies on the origin field to determine which domain the metadata came from.

@shafu0x
Copy link
Contributor

shafu0x commented Jan 14, 2026

I would do a DOM parser instead of a regex in extractIconCandidates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants