Cleans/extracts elements from content bodies. For use with content imports.
Note: This repository requires Docker Engine 18.06.0 or greater, as Compose file format 3.7 is used. All commands are assumed to be run from the project root.
- Clone the repository
- Install the project dependencies via scripts/yarn.sh
- Start the service via docker-compose up app
- The service will now be running on http://0.0.0.0:4986
To run a cleanup/extraction on an HTML string, submit a POST request to the desired endpoint (see below). All requests MUST contain the Content-Type: text/html header and provide a raw HTML body (not a JSON body or a JSON encoded HTML body). This can be done via cURL, fetch, or any other tool (e.g. Insomnia). The server will return a JSON response with the extracted values and the HTML (the exact format varies depending on the rule).
Note: Posted HTML is assumed to be encoded in UTF-8, and the service responds in kind. Ensure any character encoding conversions have been completed before using this service.
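As an illustration of the conversion step, here is a minimal Node.js sketch that re-encodes ISO-8859-1 (Latin-1) bytes as UTF-8 before posting. The function name latin1ToUtf8 is hypothetical and not part of this repository, and the choice of Latin-1 as the source encoding is only an assumption; other source encodings would need a different decoder.

```javascript
// Hypothetical helper (not part of this repository): re-encode
// ISO-8859-1 (Latin-1) bytes as UTF-8 before posting them to the service.
function latin1ToUtf8(latin1Bytes) {
  // Buffer's 'latin1' decoding maps each byte to the code point of the same value.
  const text = Buffer.from(latin1Bytes).toString('latin1');
  // Encode the resulting string as UTF-8 bytes.
  return Buffer.from(text, 'utf8');
}
```

The returned Buffer (or its string form via toString('utf8')) can then be used as the request body.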
- /pennwell/default: see the rule's documentation for more information.
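A request to this endpoint could be sketched with fetch as follows. This is only an illustration under stated assumptions: the helper names buildCleanupRequest and cleanHtml are hypothetical, a global fetch is assumed to be available (for example Node.js 18+ or a browser), and the service started on port 4986 is assumed to be reachable at localhost. The shape of the returned JSON depends on the rule, so the sketch returns it unmodified.

```javascript
// Assumption: the service bound to 0.0.0.0:4986 is reachable via localhost.
const DEFAULT_BASE_URL = 'http://localhost:4986';

// Hypothetical helper: build the fetch options for a cleanup/extraction request.
// The body is the raw HTML string itself, not JSON and not JSON-encoded HTML.
function buildCleanupRequest(html) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'text/html' },
    body: html,
  };
}

// Hypothetical helper: POST the HTML to a rule endpoint and return the parsed JSON.
async function cleanHtml(html, endpoint = '/pennwell/default', baseUrl = DEFAULT_BASE_URL) {
  const response = await fetch(new URL(endpoint, baseUrl), buildCleanupRequest(html));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // The exact response format varies by rule, so it is returned as-is.
  return response.json();
}
```

With the service running, calling cleanHtml('<p>Hello</p>').then(console.log) would print whatever JSON the rule returns.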
Because this repository uses Docker, you should not execute Yarn directly. Instead, execute Yarn commands using the provided script. For example, to add a dependency, run scripts/yarn.sh add [package-name] from the project root. This works for all Yarn commands, e.g., scripts/yarn.sh [command] [args].
You can open an interactive terminal inside the Docker container via scripts/terminal.sh. You can also lint the entire project using scripts/lint.sh.