Java library for building regular expressions from a set of URLs

andreAmorimF/urlregex

URLRegex

Given a set of URLs, this library builds a single regular expression that matches all of the input URLs. This is useful for whitelisting or blacklisting the set of pages to be crawled within one or more domains.
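As a minimal sketch of the crawling use case, a generated pattern can be compiled with `java.util.regex` and used to decide whether a discovered URL should be fetched. The class name `CrawlFilter` and its methods are hypothetical. The allow pattern is the one printed in the "Unequal number of URL segments" example, and the deny pattern is an illustrative assumption, not library output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CrawlFilter {
    // Allow list: the pattern printed in the "Unequal number of URL segments" example.
    static final Pattern ALLOW =
            Pattern.compile("^http://www\\.domain\\.com/forums/?(viewforum_\\d+\\.htm)?$");

    // Deny list: a hypothetical pattern that excludes one specific forum page.
    static final Pattern DENY =
            Pattern.compile("^http://www\\.domain\\.com/forums/viewforum_99\\.htm$");

    // Crawl a URL only if it is on the allow list and not on the deny list.
    static boolean shouldCrawl(String url) {
        return ALLOW.matcher(url).matches() && !DENY.matcher(url).matches();
    }

    // Keep only the candidate URLs that should be crawled, preserving their order.
    static List<String> filter(List<String> candidates) {
        return candidates.stream()
                .filter(CrawlFilter::shouldCrawl)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> discovered = Arrays.asList(
                "http://www.domain.com/forums",
                "http://www.domain.com/forums/viewforum_7.htm",
                "http://www.domain.com/forums/viewforum_99.htm",
                "http://www.domain.com/login.php");
        // Prints [http://www.domain.com/forums, http://www.domain.com/forums/viewforum_7.htm]
        System.out.println(filter(discovered));
    }
}
```

In practice both patterns could come from `URLRegex.buildPattern(...)`. They are hard-coded here so the sketch runs without the library.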

Install

If you are using Maven, add the following to the dependencies section of your pom.xml:

<dependency>
    <groupId>com.github.andreAmorimF</groupId>
    <artifactId>urlregex</artifactId>
    <version>1.0</version>
</dependency>

Some examples

Different protocols:

// Imports used by the examples (URLRegex itself comes from the urlregex artifact)
import java.util.ArrayList;
import java.util.List;

// Given inputs
List<String> urls = new ArrayList<>();
urls.add("http://www.domain.com/forums/");
urls.add("https://www.domain.com/forums/");

// Build pattern and output
String pattern = URLRegex.buildPattern(urls).toString();
System.out.println(pattern);
// ^https?://www.domain.com/forums/$
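Whatever pattern you end up with, it is worth checking that it really matches every input. The helper below (`matchesAll` in a class `PatternCheck`, both hypothetical names) does this with plain `java.util.regex`, using the pattern exactly as printed above. It also shows why escaping matters: the dots in the printed string are unescaped, so `.` matches any character. The escaped variant is added only for comparison and is not claimed to be the library's output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PatternCheck {
    // Returns true if the regex fully matches every URL in the list.
    static boolean matchesAll(String regex, List<String> urls) {
        Pattern p = Pattern.compile(regex);
        return urls.stream().allMatch(u -> p.matcher(u).matches());
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
                "http://www.domain.com/forums/",
                "https://www.domain.com/forums/");
        String printed = "^https?://www.domain.com/forums/$";      // as shown above
        String escaped = "^https?://www\\.domain\\.com/forums/$";  // dots escaped, for comparison

        System.out.println(matchesAll(printed, inputs));  // true
        System.out.println(matchesAll(escaped, inputs));  // true

        // An unescaped '.' matches any character, so this look-alike host also matches.
        List<String> lookAlike = Arrays.asList("http://wwwXdomain.com/forums/");
        System.out.println(matchesAll(printed, lookAlike));  // true
        System.out.println(matchesAll(escaped, lookAlike));  // false
    }
}
```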

Unequal number of URL segments:

// Given inputs
List<String> urls = new ArrayList<>();
urls.add("http://www.domain.com/forums");
urls.add("http://www.domain.com/forums/viewforum_31.htm");
urls.add("http://www.domain.com/forums/viewforum_25.htm");

// Build pattern and output
String pattern = URLRegex.buildPattern(urls).toString();
System.out.println(pattern);
// ^http://www\.domain\.com/forums/?(viewforum_\d+\.htm)?$
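Because of the `\d+`, the pattern above matches more than the three inputs: any `viewforum_<number>.htm` page under `/forums` is accepted. The sketch below (class name `SegmentExample` is hypothetical) checks a few made-up URLs against the printed pattern, written as a Java string literal, to make that generalization visible.

```java
import java.util.regex.Pattern;

public class SegmentExample {
    // The pattern printed above, written as a Java string literal.
    static final Pattern FORUMS =
            Pattern.compile("^http://www\\.domain\\.com/forums/?(viewforum_\\d+\\.htm)?$");

    static boolean matches(String url) {
        return FORUMS.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.domain.com/forums",                    // an input
            "http://www.domain.com/forums/",                   // trailing slash is optional
            "http://www.domain.com/forums/viewforum_25.htm",   // an input
            "http://www.domain.com/forums/viewforum_1000.htm", // not an input, accepted by \d+
            "http://www.domain.com/forums/viewforum_abc.htm",  // rejected: not digits
            "http://www.domain.com/forums/viewtopic_3.htm"     // rejected: different page name
        };
        for (String url : urls) {
            System.out.println(matches(url) + "  " + url);
        }
    }
}
```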

Using query strings:

// Given inputs
List<String> urls = new ArrayList<>();
urls.add("http://domain.com/forum/viewforum.php?id=50");
urls.add("http://www.domain.com/forum/viewforum.php?id=1&p=2");

// Build pattern and output
String pattern = URLRegex.buildPattern(urls).toString();
System.out.println(pattern);
// ^http://(www\.)?domain\.com/forum/viewforum\.php\??([&;]?id=[^&;]+|[&;]?p=[^&;]+)+$
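Reading the pattern above: `(www\.)?` makes the `www.` prefix optional, and the repeated group `([&;]?id=[^&;]+|[&;]?p=[^&;]+)+` accepts the `id` and `p` parameters in any order, separated by `&` or `;`. The class below (`QueryExample` is a hypothetical name) exercises the printed pattern with plain `java.util.regex`. Apart from the two inputs, the URLs are made-up probes, not library test cases.

```java
import java.util.regex.Pattern;

public class QueryExample {
    // The pattern printed above, written as a Java string literal.
    static final Pattern VIEWFORUM = Pattern.compile(
            "^http://(www\\.)?domain\\.com/forum/viewforum\\.php\\??([&;]?id=[^&;]+|[&;]?p=[^&;]+)+$");

    static boolean matches(String url) {
        return VIEWFORUM.matcher(url).matches();
    }

    public static void main(String[] args) {
        String base = "http://www.domain.com/forum/viewforum.php";
        System.out.println(matches("http://domain.com/forum/viewforum.php?id=50")); // true: an input, no "www."
        System.out.println(matches(base + "?id=1&p=2"));   // true: an input
        System.out.println(matches(base + "?p=2&id=1"));   // true: parameter order does not matter
        System.out.println(matches(base + "?p=3;id=7"));   // true: ';' also works as a separator
        System.out.println(matches(base + "?id=1&sid=x")); // false: 'sid' is neither id nor p
        System.out.println(matches(base));                 // false: at least one parameter is required
    }
}
```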

Author

Andre Fonseca

License

The MIT License
