- Add new methods: QueryAll and Query compatible with invalid XPath expression error

- Add `QuerySelector` and `QuerySelectorAll` methods, which support reusing your query object. #15
zhengchun committed Oct 5, 2019
1 parent b8d3629 commit f78b514
Showing 2 changed files with 93 additions and 28 deletions.
53 changes: 42 additions & 11 deletions README.md
@@ -10,15 +10,6 @@ Overview

htmlquery is an XPath query package for HTML that lets you extract data from HTML documents, or evaluate expressions against them, using XPath.

Changelogs
===

2019-02-04
- [#7](https://github.com/antchfx/htmlquery/issues/7) Removed deprecated `FindEach()` and `FindEachWithBreak()` methods.

2018-12-28
- Avoid adding duplicate elements to list for `Find()` method. [#6](https://github.com/antchfx/htmlquery/issues/6)

Installation
====

@@ -27,6 +18,15 @@ Installation
Getting Started
====

#### Query: returns the matched elements or an error.

```go
nodes, err := htmlquery.QueryAll(doc, "//a")
if err != nil {
	panic(`not a valid XPath expression.`)
}
```

#### Load HTML document from URL.

```go
@@ -72,7 +72,20 @@ v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total count is %f", v)
```

Quick Tutorial
Changelogs
===

2019-10-05
- Add new methods that return an error for an invalid XPath expression instead of panicking: `QueryAll` and `Query`.
- Add `QuerySelector` and `QuerySelectorAll` methods, which support reusing your query object.

2019-02-04
- [#7](https://github.com/antchfx/htmlquery/issues/7) Removed deprecated `FindEach()` and `FindEachWithBreak()` methods.

2018-12-28
- Avoid adding duplicate elements to list for `Find()` method. [#6](https://github.com/antchfx/htmlquery/issues/6)

Tutorial
===

```go
@@ -82,13 +95,31 @@ func main() {
		panic(err)
	}
	// Find all news items.
	for i, n := range htmlquery.Find(doc, "//ol/li") {
	list, err := htmlquery.QueryAll(doc, "//ol/li")
	if err != nil {
		panic(err)
	}
	for i, n := range list {
		a := htmlquery.FindOne(n, "//a")
		fmt.Printf("%d %s(%s)\n", i, htmlquery.InnerText(a), htmlquery.SelectAttr(a, "href"))
	}
}
```

FAQ
====

#### `Find()` vs `QueryAll()`, which is better?

`Find` and `QueryAll` do the same thing: both search for all matching HTML nodes.
`Find` panics if you give it an invalid XPath query, whereas `QueryAll` returns an error instead.
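
This is the common Go split between an error-returning function and a panicking convenience wrapper around it. The sketch below reproduces that shape using only the standard library, so it runs without fetching htmlquery. Note the assumptions: `compile`, `mustCompile` and `panics` are hypothetical names, and `regexp` stands in for the XPath compiler, so this illustrates the pattern rather than htmlquery's own code.

```go
package main

import (
	"fmt"
	"regexp"
)

// compile is a hypothetical stand-in for the QueryAll style:
// an invalid expression is reported as an error value.
func compile(expr string) (*regexp.Regexp, error) {
	return regexp.Compile(expr)
}

// mustCompile is a hypothetical stand-in for the Find style:
// the same failure is raised as a panic instead.
func mustCompile(expr string) *regexp.Regexp {
	re, err := compile(expr)
	if err != nil {
		panic(err)
	}
	return re
}

// panics reports whether calling f panics.
func panics(f func()) (did bool) {
	defer func() {
		if recover() != nil {
			did = true
		}
	}()
	f()
	return false
}

func main() {
	// "a(b" has an unbalanced parenthesis, so it is not a valid pattern.
	_, err := compile("a(b")
	fmt.Println("error returned:", err != nil)
	fmt.Println("wrapper panicked:", panics(func() { mustCompile("a(b") }))
}
```

The error-returning form is usually the safer choice when the expression comes from user input or configuration, while the panicking form suits expressions that are fixed in the source code.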

#### Can I save my query expression object for the next query?

Yes, you can. The `QuerySelector` and `QuerySelectorAll` methods accept your query expression object.

Caching (or reusing) a query expression object avoids recompiling the XPath expression, which improves query performance.
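
One way to reuse compiled expressions is a small cache keyed by the expression string. The following is a minimal sketch, not part of htmlquery: `exprCache`, `newExprCache` and `get` are hypothetical names, the code assumes Go 1.18 or later for generics, and `regexp.Compile` stands in for the compiler so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// exprCache memoizes compiled expressions by their source string.
// T is the compiled type; with htmlquery it would be *xpath.Expr.
type exprCache[T any] struct {
	mu      sync.Mutex
	compile func(string) (T, error)
	items   map[string]T
}

// newExprCache builds an empty cache around the given compile function.
func newExprCache[T any](compile func(string) (T, error)) *exprCache[T] {
	return &exprCache[T]{compile: compile, items: make(map[string]T)}
}

// get returns the cached expression, compiling it on first use only.
// Failed compilations are not cached.
func (c *exprCache[T]) get(expr string) (T, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.items[expr]; ok {
		return v, nil
	}
	v, err := c.compile(expr)
	if err != nil {
		var zero T
		return zero, err
	}
	c.items[expr] = v
	return v, nil
}

func main() {
	compiles := 0
	cache := newExprCache(func(s string) (*regexp.Regexp, error) {
		compiles++ // count how often the compiler really runs
		return regexp.Compile(s)
	})
	for i := 0; i < 3; i++ {
		if _, err := cache.get(`a+`); err != nil {
			panic(err)
		}
	}
	fmt.Println("compilations:", compiles) // prints "compilations: 1"
}
```

With htmlquery, the compile function would be `xpath.Compile`, and the value returned by `get` would be passed to `QuerySelector` or `QuerySelectorAll` together with the document node.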

List of supported XPath query packages
===
|Name |Description |
68 changes: 51 additions & 17 deletions query.go
@@ -21,14 +21,63 @@ func CreateXPathNavigator(top *html.Node) *NodeNavigator {
	return &NodeNavigator{curr: top, root: top, attr: -1}
}

// Find searches the html.Node that matches by the specified XPath expr.
// Find is like QueryAll but panics if the expression `expr` cannot be parsed.
//
// See `QueryAll()` function.
func Find(top *html.Node, expr string) []*html.Node {
	exp, err := xpath.Compile(expr)
	if err != nil {
		panic(err)
	}
	return QuerySelectorAll(top, exp)
}

// FindOne is like Query but panics if the expression `expr` cannot be parsed.
// See `Query()` function.
func FindOne(top *html.Node, expr string) *html.Node {
	exp, err := xpath.Compile(expr)
	if err != nil {
		panic(err)
	}
	return QuerySelector(top, exp)
}

// QueryAll searches for all html.Node elements that match the specified XPath expr.
// It returns an error if the expression `expr` cannot be parsed.
func QueryAll(top *html.Node, expr string) ([]*html.Node, error) {
	exp, err := xpath.Compile(expr)
	if err != nil {
		return nil, err
	}
	nodes := QuerySelectorAll(top, exp)
	return nodes, nil
}

// Query searches for html.Node elements that match the specified XPath expr
// and returns the first match.
//
// It returns an error if the expression `expr` cannot be parsed.
func Query(top *html.Node, expr string) (*html.Node, error) {
	exp, err := xpath.Compile(expr)
	if err != nil {
		return nil, err
	}
	return QuerySelector(top, exp), nil
}

// QuerySelector returns the first html.Node matched by the specified XPath selector, or nil if none matches.
func QuerySelector(top *html.Node, selector *xpath.Expr) *html.Node {
	t := selector.Select(CreateXPathNavigator(top))
	if t.MoveNext() {
		return getCurrentNode(t.Current().(*NodeNavigator))
	}
	return nil
}

// QuerySelectorAll returns all html.Node elements that match the specified XPath selector.
func QuerySelectorAll(top *html.Node, selector *xpath.Expr) []*html.Node {
	var elems []*html.Node
	t := exp.Select(CreateXPathNavigator(top))
	t := selector.Select(CreateXPathNavigator(top))
	for t.MoveNext() {
		nav := t.Current().(*NodeNavigator)
		n := getCurrentNode(nav)
@@ -42,21 +91,6 @@ func Find(top *html.Node, expr string) []*html.Node {
	return elems
}

// FindOne searches the html.Node that matches by the specified XPath expr,
// and returns first element of matched html.Node.
func FindOne(top *html.Node, expr string) *html.Node {
	var elem *html.Node
	exp, err := xpath.Compile(expr)
	if err != nil {
		panic(err)
	}
	t := exp.Select(CreateXPathNavigator(top))
	if t.MoveNext() {
		elem = getCurrentNode(t.Current().(*NodeNavigator))
	}
	return elem
}

// LoadURL loads the HTML document from the specified URL.
func LoadURL(url string) (*html.Node, error) {
	resp, err := http.Get(url)