@holtvogt holtvogt commented Sep 18, 2025

This PR introduces two new audits to monitor and analyze content fragment health on AEM:

What's New

  • CDN Content Fragment 404 Audit (cdn-content-fragment-404): Monitors CDN logs hourly to identify content fragment requests that return 404 errors, using Athena queries to aggregate and export the data to S3 for further analysis and reporting.
  • Content Fragment Broken Links Audit (content-fragment-broken-links): Runs daily over the broken content fragment paths discovered in CDN logs and suggests repair actions through a multi-step workflow that applies strategies such as republishing, locale fallback, and similar-path matching.

Use Case

Health monitoring for Content Fragments on AEM Sites: Automatically detect broken content fragment references across AEM Sites by monitoring CDN traffic patterns, identifying 404 errors, and providing actionable repair suggestions to maintain content availability and user experience.

This commit introduces the AemAuthorClient class, which facilitates communication with the AEM Author API. It includes methods for checking content availability, fetching content, and retrieving child paths from a given parent path. The class also handles authentication and URL creation for API requests, enhancing the overall content path management capabilities.
This commit introduces the LevenshteinDistance class, which provides a static method to calculate the edit distance between two strings. The implementation includes error handling for null inputs and utilizes a dynamic programming approach to compute the distance efficiently.
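
For readers unfamiliar with the technique, the following is a minimal sketch of a dynamic-programming edit distance with null handling, as the commit message describes. The method name `calculate` and the error message are assumptions made for illustration; the class in this PR may differ.

```javascript
// Hypothetical sketch; the PR's actual LevenshteinDistance class may differ.
class LevenshteinDistance {
  /**
   * Returns the minimum number of single-character insertions, deletions,
   * and substitutions needed to turn `source` into `target`.
   */
  static calculate(source, target) {
    if (source == null || target == null) {
      throw new Error('Both strings must be non-null');
    }
    if (source === target) return 0;
    if (source.length === 0) return target.length;
    if (target.length === 0) return source.length;

    // previous[j] is the distance between the first i-1 characters of
    // `source` and the first j characters of `target`.
    let previous = Array.from({ length: target.length + 1 }, (_, j) => j);

    for (let i = 1; i <= source.length; i += 1) {
      const current = [i];
      for (let j = 1; j <= target.length; j += 1) {
        const cost = source[i - 1] === target[j - 1] ? 0 : 1;
        current[j] = Math.min(
          previous[j] + 1, // deletion
          current[j - 1] + 1, // insertion
          previous[j - 1] + cost, // substitution
        );
      }
      previous = current;
    }
    return previous[target.length];
  }
}
```

Keeping only two rows of the table limits memory to the length of `target`, which matters when many candidate paths are compared against each broken path.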
This commit introduces the PathUtils class, which provides utility methods for managing content paths. The class includes methods to remove locale segments from paths and to retrieve the parent path of a given content path, enhancing the overall path management functionality.
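
As an illustration of the two helpers mentioned above, here is a rough sketch. The function names and the locale pattern (a two-letter language code, optionally followed by `-` or `_` and a two-letter region) are assumptions; the PR may recognize locales differently.

```javascript
// Hypothetical sketch; names and the locale pattern are assumptions.
const LOCALE_SEGMENT = /^[a-z]{2}(?:[-_][a-zA-Z]{2})?$/;

/** Drops every path segment that looks like a locale, e.g. "en-us" or "de". */
function removeLocaleFromPath(path) {
  if (!path) return path;
  return path
    .split('/')
    .filter((segment) => !LOCALE_SEGMENT.test(segment))
    .join('/');
}

/** Returns the parent of a path, "/" for top-level paths, or null if none. */
function getParentPath(path) {
  if (!path || path === '/') return null;
  const trimmed = path.endsWith('/') ? path.slice(0, -1) : path;
  const lastSlash = trimmed.lastIndexOf('/');
  if (lastSlash < 0) return null;
  return lastSlash === 0 ? '/' : trimmed.slice(0, lastSlash);
}
```

Note that this pattern treats any two-letter segment as a locale, so a real implementation would likely need an allowlist or a stricter check.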
This commit introduces the PathIndex and PathNode classes, which implement a structure for managing and indexing content paths. The PathIndex class provides methods for inserting, finding, deleting, and retrieving paths, while the PathNode class represents individual nodes in the path tree.
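
One way such an index could be structured is a trie keyed by path segment. The sketch below assumes that layout and uses hypothetical method names (`insert`, `contains`, `delete`, `getPaths`); the commit does not state whether the PR indexes by segment or by character.

```javascript
// Hypothetical sketch of a segment-based path trie; the PR's API may differ.
class PathNode {
  constructor() {
    this.children = new Map(); // segment -> PathNode
    this.isEnd = false; // true if a stored path ends at this node
  }
}

class PathIndex {
  constructor() {
    this.root = new PathNode();
  }

  static segments(path) {
    return path.split('/').filter(Boolean);
  }

  insert(path) {
    let node = this.root;
    for (const segment of PathIndex.segments(path)) {
      if (!node.children.has(segment)) node.children.set(segment, new PathNode());
      node = node.children.get(segment);
    }
    node.isEnd = true;
  }

  findNode(path) {
    let node = this.root;
    for (const segment of PathIndex.segments(path)) {
      node = node.children.get(segment);
      if (!node) return null;
    }
    return node;
  }

  contains(path) {
    const node = this.findNode(path);
    return node !== null && node.isEnd;
  }

  delete(path) {
    const node = this.findNode(path);
    if (!node || !node.isEnd) return false;
    node.isEnd = false; // empty branches are left in place for simplicity
    return true;
  }

  /** Returns all stored paths at or below `prefix`. */
  getPaths(prefix = '/') {
    const start = this.findNode(prefix);
    if (!start) return [];
    const results = [];
    const walk = (node, parts) => {
      if (node.isEnd) results.push(`/${parts.join('/')}`);
      for (const [segment, child] of node.children) walk(child, [...parts, segment]);
    };
    walk(start, PathIndex.segments(prefix));
    return results;
  }
}
```

A trie makes "all known paths under this parent" a cheap lookup, which is the kind of query a similar-path rule would need.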
This commit introduces the AnalysisStrategy class, which implements a strategy for analyzing broken content paths using various rules. The class includes methods for cleaning paths, analyzing broken paths, and processing suggestions based on content status. It integrates existing rules such as PublishRule, LocaleFallbackRule, and SimilarPathRule to provide comprehensive analysis and recommendations for content management.
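
The rule-based analysis can be pictured as an ordered chain in which the first rule that produces a suggestion wins. The sketch below assumes that shape. Only the rule names come from the PR; the `priority` field, the `apply` method, the suggestion object, and the in-memory stand-in rules are all hypothetical, and the real rules presumably query AEM asynchronously.

```javascript
// Hypothetical sketch of a first-match rule chain; not the PR's implementation.
class AnalysisStrategy {
  constructor(rules) {
    // Assumption: a lower priority value means the rule runs earlier.
    this.rules = [...rules].sort((a, b) => a.priority - b.priority);
  }

  analyzePath(brokenPath) {
    for (const rule of this.rules) {
      const suggestion = rule.apply(brokenPath);
      if (suggestion) return suggestion;
    }
    return { type: 'NOT_FOUND', requestedPath: brokenPath, suggestedPath: null };
  }

  analyze(brokenPaths) {
    return brokenPaths.map((path) => this.analyzePath(path));
  }
}

// Stand-in rules backed by an in-memory set instead of AEM Author calls.
const authorPaths = new Set(['/content/dam/en-us/hero']);

const publishRule = {
  priority: 1,
  apply: (path) => (authorPaths.has(path)
    ? { type: 'PUBLISH', requestedPath: path, suggestedPath: path }
    : null),
};

const localeFallbackRule = {
  priority: 2,
  apply: (path) => {
    const fallback = path.replace('/en-gb/', '/en-us/');
    return fallback !== path && authorPaths.has(fallback)
      ? { type: 'LOCALE_FALLBACK', requestedPath: path, suggestedPath: fallback }
      : null;
  },
};

const strategy = new AnalysisStrategy([localeFallbackRule, publishRule]);
const suggestions = strategy.analyze([
  '/content/dam/en-us/hero',
  '/content/dam/en-gb/hero',
  '/content/dam/fr/missing',
]);
```

With these stand-ins, the three paths resolve to `PUBLISH`, `LOCALE_FALLBACK`, and `NOT_FOUND` respectively.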
This commit introduces the AthenaCollector class, which extends BaseCollector and provides functionality to fetch broken content paths from an Athena database. It includes methods for ensuring the database and table exist, as well as querying for broken paths. Additionally, a BaseCollector class is created as a base for future collectors, and a CollectorFactory class is added to instantiate collectors based on the specified type, currently supporting Athena.
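
The collector hierarchy described above follows a conventional factory pattern. A minimal sketch, assuming a `create(type, context)` entry point and an injected `runQuery` helper (both hypothetical, as is the `url` row field), might look like this:

```javascript
// Hypothetical sketch of the factory pattern; names other than the three
// class names are assumptions.
class BaseCollector {
  constructor(context) {
    this.context = context;
  }

  async fetchBrokenPaths() {
    throw new Error('fetchBrokenPaths() must be implemented by a subclass');
  }
}

class AthenaCollector extends BaseCollector {
  // `context.runQuery` stands in for whatever executes the Athena SQL and
  // resolves to rows; it is not an actual AWS SDK call.
  async fetchBrokenPaths() {
    const rows = await this.context.runQuery('daily-query.sql');
    return rows.map((row) => row.url);
  }
}

const COLLECTORS = new Map([['athena', AthenaCollector]]);

class CollectorFactory {
  static create(type, context) {
    const Collector = COLLECTORS.get(type);
    if (!Collector) {
      throw new Error(`Unknown collector type: ${type}`);
    }
    return new Collector(context);
  }
}
```

Adding another data source would then only require a new subclass and one more entry in the `COLLECTORS` map.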
This commit introduces three new SQL files: create-database.sql for creating a database if it doesn't exist, create-table.sql for defining an external table with partitioning and storage properties, and daily-query.sql for selecting URLs based on specified date and tenant criteria.
This commit refactors the broken content path handling by introducing three new functions: fetchBrokenContentPaths, analyzeBrokenContentPaths, and provideSuggestions. It replaces the previous runner function with a step-based approach, utilizing a CollectorFactory for fetching broken paths and using the analysis strategy.
This update introduces pagination handling in the AemAuthorClient class, allowing for the fetching of content in multiple pages. A maximum page limit and a delay between requests have been implemented to manage rate limiting. Additionally, a utility function for creating URLs with pagination parameters has been added, along with error handling improvements during content fetching.
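
The pagination behavior described above (a page cap plus a delay between requests) can be sketched roughly as follows. The `fetchPage` callback, the `items`/`cursor` response shape, and the default values are assumptions for illustration, not the AEM Author API contract.

```javascript
// Hypothetical sketch; `fetchPage`, the response shape, and the defaults
// are assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Collects items page by page until no cursor is returned or `maxPages`
 * pages have been fetched, pausing `delayMs` between requests.
 */
async function fetchAllPages(fetchPage, { maxPages = 10, delayMs = 100 } = {}) {
  const items = [];
  let cursor = null;

  for (let page = 0; page < maxPages; page += 1) {
    // Requests are intentionally sequential: each one needs the previous cursor.
    const response = await fetchPage(cursor);
    items.push(...(response.items ?? []));

    cursor = response.cursor ?? null;
    if (!cursor) break; // last page reached

    if (page < maxPages - 1) await sleep(delayMs); // basic rate limiting
  }
  return items;
}
```

The page cap bounds the worst case when a parent path has an unexpectedly large number of children, at the cost of returning a truncated list.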
This update changes how content availability is determined: the check now verifies that the response contains exactly one item.
@holtvogt holtvogt requested a review from Copilot September 29, 2025 12:31
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/content-fragment-broken-links/utils/path-utils.js:1

  • The removeDoubleSlashes function incorrectly removes one slash from protocol schemes. The expected result should preserve the protocol format 'http://example.com/path' rather than converting to 'http:/example.com/path'.
holtvogt and others added 25 commits September 29, 2025 14:51
Updated the `removeDoubleSlashes` method in `PathUtils` to correctly preserve protocol slashes when removing double slashes from URLs. Adjusted unit tests to reflect the new behavior, ensuring that protocols are maintained in the output.
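
One possible way to collapse repeated slashes while leaving the protocol separator untouched is a negative lookbehind, sketched below. This is an assumption about the approach, not the code from the PR, and it is written as a standalone function rather than a `PathUtils` method.

```javascript
// Hypothetical sketch; the PR's fix may use a different technique.
function removeDoubleSlashes(path) {
  if (!path) return path;
  // (?<!:) skips any slash run that directly follows ":", so "http://" and
  // "https://" keep both slashes while other runs collapse to a single "/".
  return path.replace(/(?<!:)\/{2,}/g, '/');
}
```

For the case raised in the review, `removeDoubleSlashes('http://example.com//path')` yields `http://example.com/path` rather than `http:/example.com/path`.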
2 participants