html-snippet doesn't work with Jsoup parser #90
Comments
Same issue, here's a simple example that breaks:
(enlive/set-ns-parser! net.cgrand.jsoup/parser)
(enlive/html-resource (java.io.StringReader. "<h1>Hi, cgrand!</h1>") (enlive/ns-options))
The above returns this:
This is seriously ruining my day today.
Here's a monkey patch workaround I use. Basically I had to redefine a bunch of core functions and then modify the parser fn:
(ns my.namespace
  (:import [org.jsoup Jsoup]
           [org.jsoup.nodes Attribute Attributes Comment DataNode Document
            DocumentType Element Node TextNode XmlDeclaration]
           [org.jsoup.parser Parser Tag]))

; lower-cased keyword from any object's string form
(def ^:private ->key (comp keyword #(.. % toString toLowerCase)))

(defprotocol IEnlive
  (->nodes [d] "Convert object into Enlive node(s)."))

(extend-protocol IEnlive
  Attribute
  (->nodes [a] [(->key (.getKey a)) (.getValue a)])
  Attributes
  (->nodes [as] (not-empty (into {} (map ->nodes as))))
  Comment
  (->nodes [c] {:type :comment :data (.getData c)})
  DataNode
  (->nodes [dn] (str dn))
  Document
  (->nodes [d] (not-empty (map ->nodes (.childNodes d))))
  DocumentType
  (->nodes [dtd] {:type :dtd :data ((juxt :name :publicid :systemid) (->nodes (.attributes dtd)))})
  Element
  (->nodes [e] {:tag     (->key (.tagName e))
                :attrs   (->nodes (.attributes e))
                :content (not-empty (map ->nodes (.childNodes e)))})
  TextNode
  (->nodes [tn] (.getWholeText tn))
  nil
  (->nodes [_] nil))

; redefined parser fn to support jsoup
(defn parser
  "Parse a HTML document stream into Enlive nodes using JSoup."
  [stream]
  (with-open [^java.io.Closeable stream stream]
    (->nodes (Jsoup/parse stream "ISO-8859-1" ""))))

; then this will work
(net.cgrand.enlive-html/html-resource
  (-> "<h1>Hi, cgrand!</h1>"
      (.getBytes "ISO-8859-1")
      java.io.ByteArrayInputStream.)
  {:parser parser})
Added to wiki, many thanks @dhruvbhatia!
@JustinIAC, net.cgrand.jsoup should be fixed to handle readers before making JSoup the default.
net.cgrand.enlive-html/html-snippet passes a java.io.StringReader instance to html-resource, but Jsoup/parse doesn't come with a corresponding overload that accepts a Reader.
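One way to bridge the mismatch described above, until the parser itself handles readers, is to drain the Reader and hand the parser an InputStream instead. The following is only a minimal sketch using plain JDK classes; `reader->input-stream` is a hypothetical helper name, not part of Enlive or Jsoup, and the default charset is an assumption that has to match whatever charset the stream-based parser is told to decode with.

```clojure
(import '[java.io ByteArrayInputStream Reader StringReader])

(defn reader->input-stream
  "Hypothetical helper (not an Enlive or Jsoup API): reads all text from a
  java.io.Reader and re-exposes it as an InputStream encoded with `charset`.
  The whole document is buffered in memory, so this suits snippets rather
  than very large inputs."
  ([rdr] (reader->input-stream rdr "UTF-8"))
  ([^Reader rdr ^String charset]
   ;; slurp reads the reader to the end and closes it when done
   (let [^String text (slurp rdr)]
     (ByteArrayInputStream. (.getBytes text charset)))))

;; Usage: adapt the kind of StringReader that html-snippet produces
(def adapted
  (reader->input-stream (StringReader. "<h1>Hi, cgrand!</h1>")))
```

The resulting stream could then be passed to a stream-only parser function. If it is combined with the workaround parser from the comments, note that that parser decodes as ISO-8859-1, so the same charset would need to be passed here as the second argument.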