better favicon handling #466

fmhall · 2026-01-14T16:22:13Z

No description provided.

vercel · 2026-01-14T16:22:18Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
x402scan	Error		Jan 14, 2026 10:03pm

apps/scan/src/trpc/routers/resources.ts

apps/scan/src/services/scraper/metadata.ts

apps/scan/src/services/scraper/favicon.ts

apps/scan/src/app/api/resources/sync/route.ts

apps/scan/src/lib/resources.ts

vercel · 2026-01-14T18:01:28Z

apps/scan/src/services/scraper/metadata.ts

+  const patterns = [
+    // name attribute with double-quoted content
+    new RegExp(
+      `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`,


The nameOrProperty parameter is directly interpolated into regex patterns without escaping regex metacharacters, creating a potential regex injection vulnerability and correctness issue if dynamic meta tag names are ever used.

View Details

📝 Patch Details

diff --git a/apps/scan/src/services/scraper/metadata.ts b/apps/scan/src/services/scraper/metadata.ts index 87f62a55..31b96ca4 100644 --- a/apps/scan/src/services/scraper/metadata.ts +++ b/apps/scan/src/services/scraper/metadata.ts @@ -4,6 +4,13 @@ interface Metadata { url?: string; } +/** + * Escapes regex metacharacters to prevent regex injection and ensure literal matching + */ +const escapeRegex = (str: string): string => { + return str.replace(/[.*+?^${}()|[\]\\]/g, '\\'); +}; + /** * Extracts the content from a meta tag * Uses separate patterns for double-quoted and single-quoted values to handle quotes in content @@ -12,47 +19,51 @@ const extractMetaContent = ( html: string, nameOrProperty: string ): string | null => { + // Escape regex metacharacters in the name/property to prevent regex injection + // and ensure exact matching of literal characters + const escapedNameOrProperty = escapeRegex(nameOrProperty); + // Match meta tags with name or property attribute // Use separate patterns for each quote type to properly handle quotes in values const patterns = [ // name attribute with double-quoted content new RegExp( - `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`, + `<meta\\s+[^>]*name\\s*=\\s*["']${escapedNameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`, 'i' ), // name attribute with single-quoted content new RegExp( - `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`, + `<meta\\s+[^>]*name\\s*=\\s*["']${escapedNameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`, 'i' ), // content before name with double quotes new RegExp( - `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*name\\s*=\\s*["']${nameOrProperty}["']`, + `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*name\\s*=\\s*["']${escapedNameOrProperty}["']`, 'i' ), // content before name with single quotes new RegExp( - `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*name\\s*=\\s*["']${nameOrProperty}["']`, + `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*name\\s*=\\s*["']${escapedNameOrProperty}["']`, 'i' ), // property attribute with double-quoted content new RegExp( - `<meta\\s+[^>]*property\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`, + `<meta\\s+[^>]*property\\s*=\\s*["']${escapedNameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`, 'i' ), // property attribute with single-quoted content new RegExp( - `<meta\\s+[^>]*property\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`, + `<meta\\s+[^>]*property\\s*=\\s*["']${escapedNameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`, 'i' ), // content before property with double quotes new RegExp( - `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*property\\s*=\\s*["']${nameOrProperty}["']`, + `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*property\\s*=\\s*["']${escapedNameOrProperty}["']`, 'i' ), // content before property with single quotes new RegExp( - `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*property\\s*=\\s*["']${nameOrProperty}["']`, + `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*property\\s*=\\s*["']${escapedNameOrProperty}["']`, 'i' ), ];

Analysis

Unescaped regex metacharacters in extractMetaContent() nameOrProperty parameter

What fails: extractMetaContent() in apps/scan/src/services/scraper/metadata.ts fails to match meta tags with names/properties containing regex metacharacters like [, ], *, +, ?, (, ), {, }, ^, $, |, ., or \. These characters are interpreted as regex operators instead of literals, preventing correct matching.

How to reproduce:

const escapeRegex = (str) => { return str.replace(/[.*+?^

Result: Returns null - unable to match the meta tag

Expected: Should return "first item" - the meta tag with brackets in the name should be matched literally

Fix applied: Added escapeRegex() helper function to escape regex metacharacters before interpolating into regex patterns, ensuring all 8 regex patterns in extractMetaContent() treat the parameter as literal text rather than regex operators.

vercel · 2026-01-14T18:01:28Z

apps/scan/src/services/scraper/favicon.ts

+  attrName: string
+): string | null => {
+  // Try double-quoted value first
+  let match = new RegExp(`${attrName}\\s*=\\s*"([^"]*)"`).exec(attrs);


The attrName parameter is directly interpolated into regex patterns without escaping regex metacharacters, creating a potential regex injection vulnerability if dynamic attribute names are ever used.

View Details

📝 Patch Details

diff --git a/apps/scan/src/services/scraper/favicon.ts b/apps/scan/src/services/scraper/favicon.ts index b8ce21c1..bbc2a3c1 100644 --- a/apps/scan/src/services/scraper/favicon.ts +++ b/apps/scan/src/services/scraper/favicon.ts @@ -65,6 +65,13 @@ const getIconPriority = ( return 6; }; +/** + * Escapes regex metacharacters in a string to prevent regex injection + */ +const escapeRegexMetacharacters = (str: string): string => { + return str.replace(/[.*+?^${}()|[\]\\]/g, '\\'); +}; + /** * Extracts an attribute value, handling both double and single quotes * This allows values to contain the opposite quote character (e.g., apostrophes in double-quoted values) @@ -73,12 +80,14 @@ const extractAttributeValue = ( attrs: string, attrName: string ): string | null => { + const escapedName = escapeRegexMetacharacters(attrName); + // Try double-quoted value first - let match = new RegExp(`${attrName}\\s*=\\s*"([^"]*)"`).exec(attrs); + let match = new RegExp(`${escapedName}\\s*=\\s*"([^"]*)"`).exec(attrs); if (match?.[1] !== undefined) return match[1]; // Try single-quoted value - match = new RegExp(`${attrName}\\s*=\\s*'([^']*)'`).exec(attrs); + match = new RegExp(`${escapedName}\\s*=\\s*'([^']*)'`).exec(attrs); if (match?.[1] !== undefined) return match[1]; return null; diff --git a/apps/scan/src/services/scraper/metadata.ts b/apps/scan/src/services/scraper/metadata.ts index 87f62a55..4695886d 100644 --- a/apps/scan/src/services/scraper/metadata.ts +++ b/apps/scan/src/services/scraper/metadata.ts @@ -4,6 +4,13 @@ interface Metadata { url?: string; } +/** + * Escapes regex metacharacters in a string to prevent regex injection + */ +const escapeRegexMetacharacters = (str: string): string => { + return str.replace(/[.*+?^${}()|[\]\\]/g, '\\'); +}; + /** * Extracts the content from a meta tag * Uses separate patterns for double-quoted and single-quoted values to handle quotes in content @@ -12,47 +19,49 @@ const extractMetaContent = ( html: string, nameOrProperty: string ): string | null => { + const escaped = escapeRegexMetacharacters(nameOrProperty); + // Match meta tags with name or property attribute // Use separate patterns for each quote type to properly handle quotes in values const patterns = [ // name attribute with double-quoted content new RegExp( - `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`, + `<meta\\s+[^>]*name\\s*=\\s*["']${escaped}["'][^>]*content\\s*=\\s*"([^"]*)"`, 'i' ), // name attribute with single-quoted content new RegExp( - `<meta\\s+[^>]*name\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`, + `<meta\\s+[^>]*name\\s*=\\s*["']${escaped}["'][^>]*content\\s*=\\s*'([^']*)'`, 'i' ), // content before name with double quotes new RegExp( - `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*name\\s*=\\s*["']${nameOrProperty}["']`, + `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*name\\s*=\\s*["']${escaped}["']`, 'i' ), // content before name with single quotes new RegExp( - `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*name\\s*=\\s*["']${nameOrProperty}["']`, + `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*name\\s*=\\s*["']${escaped}["']`, 'i' ), // property attribute with double-quoted content new RegExp( - `<meta\\s+[^>]*property\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*"([^"]*)"`, + `<meta\\s+[^>]*property\\s*=\\s*["']${escaped}["'][^>]*content\\s*=\\s*"([^"]*)"`, 'i' ), // property attribute with single-quoted content new RegExp( - `<meta\\s+[^>]*property\\s*=\\s*["']${nameOrProperty}["'][^>]*content\\s*=\\s*'([^']*)'`, + `<meta\\s+[^>]*property\\s*=\\s*["']${escaped}["'][^>]*content\\s*=\\s*'([^']*)'`, 'i' ), // content before property with double quotes new RegExp( - `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*property\\s*=\\s*["']${nameOrProperty}["']`, + `<meta\\s+[^>]*content\\s*=\\s*"([^"]*)"[^>]*property\\s*=\\s*["']${escaped}["']`, 'i' ), // content before property with single quotes new RegExp( - `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*property\\s*=\\s*["']${nameOrProperty}["']`, + `<meta\\s+[^>]*content\\s*=\\s*'([^']*)'[^>]*property\\s*=\\s*["']${escaped}["']`, 'i' ), ];

Analysis

Regex Injection Vulnerability in extractAttributeValue() and extractMetaContent()

What fails: extractAttributeValue() in favicon.ts and extractMetaContent() in metadata.ts interpolate user-controllable parameters directly into RegExp patterns without escaping regex metacharacters. This causes:

Regex metacharacters in attribute names to be misinterpreted as regex patterns rather than literal characters

Incorrect matching behavior when attribute names contain metacharacters like ., *, +, [, ], ^, $, {, }, (, ), |, \

How to reproduce:

// Attribute name with dot (. matches any character in regex) const attrs = 'data.name="value1" dataxname="value2"'; const result = extractAttributeValue(attrs, 'data.name'); // Should find "value1" // Before fix: returns "value1" ✓ (but only by accident - . matches any character) // Problem: If HTML had 'dataxname', it would also match (because . matches x) // Attribute name with asterisk (* means zero or more in regex) const attrs2 = 'data*name="test-value"'; const result2 = extractAttributeValue(attrs2, 'data*name'); // Before fix: returns null (incorrectly fails) // After fix: returns "test-value" ✓ // Attribute name with plus (+ means one or more in regex) const attrs3 = 'data+name="correct" dataname="wrong"'; const result3 = extractAttributeValue(attrs3, 'data+name'); // Before fix: returns "wrong" (matches 'dataname' instead of 'data+name') // After fix: returns "correct" ✓

Result: Regex metacharacters cause incorrect regex patterns to be generated, leading to wrong matching behavior. For example, data+name becomes /data+name/ which matches "dataaaa...name" instead of literally "data+name".

Expected: All parameters should be escaped before interpolation into RegExp patterns to ensure they are treated as literal strings rather than regex metacharacters.

Fix applied: Added escapeRegexMetacharacters() utility function that escapes all regex metacharacters using .replace(/[.*+?^ before interpolating parameters into RegExp patterns in both files.

@fmhall not sure how valid this is

vercel · 2026-01-14T18:01:29Z

apps/scan/src/services/scraper/index.ts

+  // Handle API subdomain fallback
  if (origin.startsWith('https://api.')) {
-    origin = `https://${origin.slice(12)}`;
+    const baseOrigin = `https://${origin.slice(12)}`;


When processing API subdomains, the origin field is always updated to the base domain even when all data was successfully retrieved from the API subdomain, causing the function to incorrectly report the data source.

View Details

📝 Patch Details

diff --git a/apps/scan/src/services/scraper/index.ts b/apps/scan/src/services/scraper/index.ts index 78667617..9372b273 100644 --- a/apps/scan/src/services/scraper/index.ts +++ b/apps/scan/src/services/scraper/index.ts @@ -67,20 +67,26 @@ export const scrapeOriginData = async (inputOrigin: string) => { const baseData = await parseAllFromHtml(baseHtml, baseOrigin); // Fill in missing data from base domain + let dataFilledFromBase = false; if (needsOg && hasOgData(baseData.og)) { og = baseData.og; + dataFilledFromBase = true; } if (needsMetadata && hasMetadata(baseData.metadata)) { metadata = baseData.metadata; + dataFilledFromBase = true; } if (needsFavicon && baseData.favicon) { favicon = baseData.favicon; + dataFilledFromBase = true; + } + + // Only update origin to base if we actually used data from it + if (dataFilledFromBase) { + origin = baseOrigin; } } } - - // Update origin to base for URL resolution - origin = baseOrigin; } return {

Analysis

Origin field incorrectly updated to base domain in scrapeOriginData() even when API subdomain data is complete

What fails: The scrapeOriginData() function in apps/scan/src/services/scraper/index.ts always updates the returned origin field to the base domain when the input origin starts with https://api., regardless of whether data was actually fetched from the base domain. This causes the function to report incorrect source information.

How to reproduce:

Call scrapeOriginData('https://api.example.com')

The API subdomain successfully returns complete HTML with OG data, metadata, and favicon

Verify the return value's origin field

Result: Returns origin: 'https://example.com' even though all data came from https://api.example.com

Expected: Should return origin: 'https://api.example.com' since no data was fetched from the base domain

Root cause: Line 83 in the original code unconditionally executes origin = baseOrigin; whenever the condition origin.startsWith('https://api.') is true, regardless of whether any data was actually filled from the base domain.

Impact: The developer.ts endpoint returns incorrect scrapedOrigin in the preview response, causing users to see wrong source information. This also affects any downstream code that relies on the origin field to determine which domain the metadata came from.

shafu0x · 2026-01-14T18:42:37Z

I would do a DOM parser instead of a regex in extractIconCandidates

better favicon handling

baf45bf

vercel bot deployed to Preview January 14, 2026 16:24 View deployment

update

a7f86d3

vercel bot deployed to Preview January 14, 2026 17:08 View deployment

favicon stuff

8ec1d8c

vercel bot deployed to Preview January 14, 2026 17:24 View deployment

vercel bot reviewed Jan 14, 2026

View reviewed changes

keep storing metadata under subdomains if required

fe2c338

vercel bot deployed to Preview January 14, 2026 17:53 View deployment

vercel bot reviewed Jan 14, 2026

View reviewed changes

merge

dd44188

vercel bot had a problem deploying to Preview January 14, 2026 22:03 Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

better favicon handling #466

better favicon handling #466

Uh oh!

fmhall commented Jan 14, 2026

Uh oh!

vercel bot commented Jan 14, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vercel bot Jan 14, 2026

Uh oh!

vercel bot Jan 14, 2026

Uh oh!

shafu0x Jan 14, 2026

Uh oh!

vercel bot Jan 14, 2026

Uh oh!

shafu0x commented Jan 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

better favicon handling #466

Are you sure you want to change the base?

better favicon handling #466

Uh oh!

Conversation

fmhall commented Jan 14, 2026

Uh oh!

vercel bot commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vercel bot Jan 14, 2026

Choose a reason for hiding this comment

Analysis

Unescaped regex metacharacters in extractMetaContent() nameOrProperty parameter

Uh oh!

vercel bot Jan 14, 2026

Choose a reason for hiding this comment

Analysis

Regex Injection Vulnerability in extractAttributeValue() and extractMetaContent()

Uh oh!

shafu0x Jan 14, 2026

Choose a reason for hiding this comment

Uh oh!

vercel bot Jan 14, 2026

Choose a reason for hiding this comment

Analysis

Origin field incorrectly updated to base domain in scrapeOriginData() even when API subdomain data is complete

Uh oh!

shafu0x commented Jan 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vercel bot commented Jan 14, 2026 •

edited

Loading