feat: Add audit for broken content fragment requests #1282
Draft
holtvogt wants to merge 104 commits into main from feature/broken-content-path-audit
This commit introduces the AemAuthorClient class, which facilitates communication with the AEM Author API. It includes methods for checking content availability, fetching content, and retrieving child paths from a given parent path. The class also handles authentication and URL creation for API requests, enhancing the overall content path management capabilities.
This commit introduces the LevenshteinDistance class, which provides a static method to calculate the edit distance between two strings. The implementation includes error handling for null inputs and utilizes a dynamic programming approach to compute the distance efficiently.
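A minimal sketch of the dynamic programming approach described in this commit message, for illustration only. The method name `calculate` and the choice of `TypeError` for null inputs are assumptions, not taken from the PR. It keeps only two rows of the distance table at a time.

```javascript
// Illustrative sketch; `calculate` is an assumed method name.
class LevenshteinDistance {
  static calculate(source, target) {
    // Reject null/undefined inputs up front.
    if (source == null || target == null) {
      throw new TypeError('Both strings must be non-null');
    }
    if (source === target) return 0;

    // previous[j] = distance between the first (i - 1) chars of source
    // and the first j chars of target.
    let previous = Array.from({ length: target.length + 1 }, (_, j) => j);

    for (let i = 1; i <= source.length; i += 1) {
      const current = [i];
      for (let j = 1; j <= target.length; j += 1) {
        const cost = source[i - 1] === target[j - 1] ? 0 : 1;
        current[j] = Math.min(
          previous[j] + 1, // deletion
          current[j - 1] + 1, // insertion
          previous[j - 1] + cost, // substitution
        );
      }
      previous = current;
    }
    return previous[target.length];
  }
}
```

For example, `LevenshteinDistance.calculate('kitten', 'sitting')` returns `3`.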
This commit introduces the PathUtils class, which provides utility methods for managing content paths. The class includes methods to remove locale segments from paths and to retrieve the parent path of a given content path, enhancing the overall path management functionality.
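A rough sketch of what the two helpers could look like. The method names `removeLocaleFromPath` and `getParentPath`, the locale pattern, and the `null` return for paths without a parent are all assumptions made for illustration. The PR's actual rules may differ.

```javascript
// Assumed locale shape: "en", "de", "en-US", "pt_br". This also matches
// any other two-letter segment, which a real implementation may need to
// guard against.
const LOCALE_SEGMENT = /^[a-z]{2}([-_][a-z]{2})?$/i;

class PathUtils {
  // Drops every path segment that looks like a locale code.
  static removeLocaleFromPath(path) {
    if (typeof path !== 'string') return path;
    return path
      .split('/')
      .filter((segment) => !LOCALE_SEGMENT.test(segment))
      .join('/');
  }

  // Returns the parent path, "/" for top-level paths, or null when the
  // path has no parent (hypothetical convention).
  static getParentPath(path) {
    if (typeof path !== 'string') return null;
    const trimmed = path.length > 1 ? path.replace(/\/+$/, '') : path;
    if (trimmed === '' || trimmed === '/') return null;
    const index = trimmed.lastIndexOf('/');
    if (index < 0) return null;
    return index === 0 ? '/' : trimmed.slice(0, index);
  }
}
```

Under these assumptions, `/content/dam/en-US/site/fragment` becomes `/content/dam/site/fragment` once the locale is removed, and its parent path is `/content/dam/en-US/site`.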
This commit introduces the PathIndex and PathNode classes, which implement a structure for managing and indexing content paths. The PathIndex class provides methods for inserting, finding, deleting, and retrieving paths, while the PathNode class represents individual nodes in the path tree.
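One plausible shape for such an index is a trie keyed by path segment. The sketch below assumes that design. The field names (`segment`, `children`, `isPath`) and method names (`insert`, `find`, `delete`, `getPaths`) are hypothetical and may not match the PR.

```javascript
// One node per path segment; isPath marks the end of a stored path.
class PathNode {
  constructor(segment) {
    this.segment = segment;
    this.children = new Map();
    this.isPath = false;
  }
}

class PathIndex {
  constructor() {
    this.root = new PathNode('');
  }

  static segments(path) {
    return path.split('/').filter(Boolean);
  }

  insert(path) {
    let node = this.root;
    for (const segment of PathIndex.segments(path)) {
      if (!node.children.has(segment)) {
        node.children.set(segment, new PathNode(segment));
      }
      node = node.children.get(segment);
    }
    node.isPath = true;
  }

  find(path) {
    let node = this.root;
    for (const segment of PathIndex.segments(path)) {
      node = node.children.get(segment);
      if (!node) return false;
    }
    return node.isPath;
  }

  // Unmarks the path and prunes nodes that no longer lead anywhere.
  delete(path) {
    const trail = [this.root];
    for (const segment of PathIndex.segments(path)) {
      const next = trail[trail.length - 1].children.get(segment);
      if (!next) return false;
      trail.push(next);
    }
    const target = trail[trail.length - 1];
    if (!target.isPath) return false;
    target.isPath = false;
    for (let i = trail.length - 1; i > 0; i -= 1) {
      const node = trail[i];
      if (node.isPath || node.children.size > 0) break;
      trail[i - 1].children.delete(node.segment);
    }
    return true;
  }

  // Returns every stored path at or below the given prefix.
  getPaths(prefix = '/') {
    const prefixSegments = PathIndex.segments(prefix);
    let node = this.root;
    for (const segment of prefixSegments) {
      node = node.children.get(segment);
      if (!node) return [];
    }
    const results = [];
    const walk = (current, parts) => {
      if (current.isPath) results.push(`/${parts.join('/')}`);
      for (const child of current.children.values()) {
        walk(child, [...parts, child.segment]);
      }
    };
    walk(node, prefixSegments);
    return results;
  }
}
```

A segment-keyed tree makes prefix lookups, such as listing every known path under a parent, proportional to the size of the subtree rather than the whole index.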
This commit introduces the AnalysisStrategy class, which implements a strategy for analyzing broken content paths using various rules. The class includes methods for cleaning paths, analyzing broken paths, and processing suggestions based on content status. It integrates existing rules such as PublishRule, LocaleFallbackRule, and SimilarPathRule to provide comprehensive analysis and recommendations for content management.
This commit introduces the AthenaCollector class, which extends BaseCollector and provides functionality to fetch broken content paths from an Athena database. It includes methods for ensuring the database and table exist, as well as querying for broken paths. Additionally, a BaseCollector class is created as a base for future collectors, and a CollectorFactory class is added to instantiate collectors based on the specified type, currently supporting Athena.
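The relationship between the three classes can be sketched roughly as follows. This is not the PR's code: the method names `fetchBrokenPaths` and `create`, the `'athena'` type string, and the injected `context.runQuery` helper (standing in for the real Athena calls) are assumptions for illustration.

```javascript
// Base class: defines the contract every collector must fulfil.
class BaseCollector {
  constructor(context = {}) {
    this.context = context;
  }

  async fetchBrokenPaths() {
    throw new Error('fetchBrokenPaths() must be implemented by subclasses');
  }
}

class AthenaCollector extends BaseCollector {
  async fetchBrokenPaths() {
    // Hypothetical: context.runQuery replaces the real steps of ensuring
    // the database and table exist and running the daily query.
    const rows = await this.context.runQuery('daily-query');
    return rows.map((row) => row.url);
  }
}

// Assumed type identifiers; only Athena is supported for now.
const COLLECTOR_TYPES = { ATHENA: 'athena' };

class CollectorFactory {
  static create(type, context = {}) {
    switch (type) {
      case COLLECTOR_TYPES.ATHENA:
        return new AthenaCollector(context);
      default:
        throw new Error(`Unsupported collector type: ${type}`);
    }
  }
}
```

Callers depend only on the `BaseCollector` contract, so a new source can be added by registering another case in the factory.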
This commit introduces three new SQL files: create-database.sql for creating a database if it doesn't exist, create-table.sql for defining an external table with partitioning and storage properties, and daily-query.sql for selecting URLs based on specified date and tenant criteria.
This commit refactors the broken content path handling by introducing three new functions: fetchBrokenContentPaths, analyzeBrokenContentPaths, and provideSuggestions. It replaces the previous runner function with a step-based approach, utilizing a CollectorFactory for fetching broken paths and using the analysis strategy.
This update introduces pagination handling in the AemAuthorClient class, allowing for the fetching of content in multiple pages. A maximum page limit and a delay between requests have been implemented to manage rate limiting. Additionally, a utility function for creating URLs with pagination parameters has been added, along with error handling improvements during content fetching.
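A sketch of how a capped, delayed pagination loop might look. The query parameter names (`cursor`, `limit`), the default limits, the response shape (`items`, `cursor`), and the injected `fetchPage` callback are all assumptions. The actual AEM Author API parameters are not shown in this PR page.

```javascript
const MAX_PAGES = 10; // assumed cap on pages per request series
const PAGE_DELAY_MS = 100; // assumed pause between requests

// Builds a URL with pagination parameters (parameter names are assumed).
function createPageUrl(baseUrl, path, cursor = null, limit = 50) {
  const url = new URL(path, baseUrl);
  url.searchParams.set('limit', String(limit));
  if (cursor) url.searchParams.set('cursor', cursor);
  return url.toString();
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchPage(cursor) is a caller-supplied function that resolves to
// { items, cursor }, where a missing cursor means there are no more pages.
async function fetchAllPages(fetchPage, options = {}) {
  const { maxPages = MAX_PAGES, delayMs = PAGE_DELAY_MS } = options;
  const items = [];
  let cursor = null;
  for (let page = 0; page < maxPages; page += 1) {
    const { items: pageItems = [], cursor: next = null } = await fetchPage(cursor);
    items.push(...pageItems);
    if (!next) break;
    cursor = next;
    // Pause before the next request to stay under rate limits.
    if (page < maxPages - 1) await sleep(delayMs);
  }
  return items;
}
```

Injecting `fetchPage` keeps the loop independent of the HTTP client, so the page cap and delay can be tested without network access.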
This update modifies the logic for determining content availability by checking if exactly one item exists in the response.
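The check described here could be as small as the sketch below. The function name `isContentAvailable` and the `items` array on the response are assumptions about the response shape.

```javascript
// Content counts as available only when the response holds exactly one item.
// The `items` property is an assumed response field.
function isContentAvailable(response) {
  return Array.isArray(response?.items) && response.items.length === 1;
}
```

Under this rule, an empty result and a result with several matches are both treated as unavailable.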
Pull Request Overview
Copilot reviewed 41 out of 41 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/content-fragment-broken-links/utils/path-utils.js:1
- The removeDoubleSlashes function incorrectly removes one slash from protocol schemes. The expected result should preserve the protocol format 'http://example.com/path' rather than converting to 'http:/example.com/path'.
Updated the `removeDoubleSlashes` method in `PathUtils` to correctly preserve protocol slashes when removing double slashes from URLs. Adjusted unit tests to reflect the new behavior, ensuring that protocols are maintained in the output.
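One way to get the behavior described, shown as a standalone function rather than the PR's actual `PathUtils` method: collapse runs of slashes unless they directly follow a colon. The regular expression is an illustrative choice, not the one used in the PR.

```javascript
// Collapses repeated slashes into one, except the "//" that follows a
// protocol scheme such as "http:".
function removeDoubleSlashes(path) {
  if (typeof path !== 'string') return path;
  // (^|[^:]) captures the start of the string or any character other than
  // a colon, so a run of slashes right after ":" is left untouched.
  return path.replace(/(^|[^:])\/{2,}/g, '$1/');
}
```

With this sketch, `http://example.com//path` becomes `http://example.com/path`, and `/content/dam//foo///bar` becomes `/content/dam/foo/bar`.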
This PR introduces two new audits to monitor and analyze content fragment health on AEM:
What's New
- `cdn-content-fragment-404`: Monitors CDN logs hourly to identify content fragment requests that return 404 errors, using Athena queries to aggregate and export the data to S3 for further analysis and reporting.
- `content-fragment-broken-links`: Analyzes broken content fragment paths discovered in CDN logs on a daily basis and suggests repair actions through a multi-step workflow that applies strategies such as republishing, locale fallbacks, and similar path matching.
Use Case
Health monitoring for Content Fragments on AEM Sites: Automatically detect broken content fragment references across AEM Sites by monitoring CDN traffic patterns, identifying 404 errors, and providing actionable repair suggestions to maintain content availability and user experience.