We currently extract links from all pages that are on the same domain as the original URL that is passed to crawl.
This might be too narrow (for instance, a site may be spread over several subdomains) or too broad (for instance, somebody might only be interested in pages that are children of a particular URL).
We should let the user configure which pages to extract links from.
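One possible shape for this, sketched here as an illustration only, is to let the user pass a predicate that decides, for each page URL, whether its links should be extracted. The names `LinkScope`, `same_domain`, `include_subdomains` and `under_prefix` are hypothetical and not part of any existing API. The sketch covers the current behaviour and the two cases above: a scope that spans subdomains and a scope limited to children of a URL. It compares hostnames only, so it ignores scheme and port.

```python
from typing import Callable
from urllib.parse import urlparse

# Hypothetical type: given a page URL, return True if links should be extracted from it.
LinkScope = Callable[[str], bool]


def same_domain(start_url: str) -> LinkScope:
    """Current behaviour: only pages whose host equals the start URL's host."""
    host = urlparse(start_url).hostname  # .hostname is already lowercased
    return lambda url: urlparse(url).hostname == host


def include_subdomains(domain: str) -> LinkScope:
    """Broader scope: the domain itself or any subdomain of it."""
    domain = domain.lower()

    def check(url: str) -> bool:
        host = urlparse(url).hostname or ""
        # The leading "." stops "notexample.com" from matching "example.com".
        return host == domain or host.endswith("." + domain)

    return check


def under_prefix(prefix_url: str) -> LinkScope:
    """Narrower scope: same host, and a path at or below the prefix URL's path."""
    prefix = urlparse(prefix_url)
    base = prefix.path.rstrip("/") + "/"

    def check(url: str) -> bool:
        parsed = urlparse(url)
        if parsed.hostname != prefix.hostname:
            return False
        # Normalise to a trailing slash so "/docs" matches "/docs/" but "/docsearch" does not.
        path = parsed.path if parsed.path.endswith("/") else parsed.path + "/"
        return path.startswith(base)

    return check
```

A `crawl` function could then accept such a predicate as an optional (hypothetical) `scope` argument that defaults to `same_domain(start_url)`, so existing callers keep today's behaviour. Using a plain callable, rather than a fixed set of options, also lets users combine or write their own rules.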