Readable

Yet another readability library for Java. It uses heuristics to extract the readable content from web pages, dropping menus, ads, and similar clutter.

It works great for news and blog sites.

See it in action at: http://brreadable.herokuapp.com/

Installation

Maven

To fetch the Maven artifact, add this tag to the dependencies section of your pom.xml:

<dependency>
    <groupId>br.com</groupId>
    <artifactId>readable</artifactId>
    <version>1.0</version>
    <type>jar</type>
</dependency>

The following repository must also be declared in the same pom.xml:

<repositories>
    <repository>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
        <id>bintray-andreamorimf</id>
        <name>bintray-andreamorimf</name>
        <url>http://dl.bintray.com/andreamorimf/readable</url>
    </repository>
    ...
</repositories>

How to use it

To use the library, create an extractor from a standard org.w3c.dom.Document and call extract():

ReadableContentExtractor extractor = new ReadableContentExtractor(document);
Element main = extractor.extract();

The returned element is a node containing the main content, plus the title, description, and main image. Each part can also be retrieved separately:

// Get the title string
String title = extractor.getTitle();

// Get the description string
String description = extractor.getDescription();

// Get the main image link (from the OG meta tag or an article image)
String imageLink = extractor.getMainImage();

// Get several image links from the article
Set<String> imageLinks = extractor.getMainImages(3);

// Get only the main content element, without title, description, or image
Element content = extractor.getMainContent();
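
The snippets above assume a `document` is already available. As a minimal sketch, one way to build an org.w3c.dom.Document with only the JDK is shown below. `DocumentLoader` is a hypothetical helper, not part of Readable, and the JDK parser only accepts well-formed XHTML; real-world HTML usually needs a more lenient parser or a clean-up step first.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of Readable): turns well-formed XHTML
// markup into an org.w3c.dom.Document using the JDK's built-in parser.
final class DocumentLoader {
    private DocumentLoader() {}

    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Read the markup from memory rather than from a file or URL.
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed input (e.g. unclosed tags) ends up here.
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }
}
```

The resulting document can then be passed to `new ReadableContentExtractor(document)` as in the first snippet.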

Author

Andre Fonseca [email protected]

License

The MIT License
