Community chat: #organice on IRC Libera.Chat, or #organice:matrix.org on Matrix
org-parser is a parser for the Org mode markup language for Emacs.
It can be used from JavaScript, Java, Clojure and ClojureScript!
Org mode in Emacs is implemented in org-element.el (API documentation). The spec for the Org syntax is written in prose.
This is already great work, yet it has some drawbacks:
- The spec is not machine readable. Hence, there can be drift between
  documentation and implementation. In fact, during the development
  of organice (our web-based Org implementation with great mobile
  phone support) and org-parser, we have encountered such drift.
- org-element.el is naturally written in Emacs Lisp and makes strong
  use of Emacs as a text processor. Hence, its code can only be used
  within Emacs.
Writing the official spec is already an amazing effort in the standardization of the Org format. Still, the power of Org is so enticing that many want to use it outside of Emacs as well. Since org-element.el only runs in Emacs, a myriad of implementations for other platforms (JavaScript, Rust, Go, Java, etc.) have been created. Most implementations are only partial and, more importantly, each of them creates another island. Since they are just as programming-language dependent as org-element.el, it is impossible to share logic between them.
org-parser aims at alleviating both these issues. It documents the
syntax in a standard and machine readable notation (EBNF). And the
reference implementation is done in a way that it runs on the
established virtual machines of Java and JavaScript. Hence,
org-parser can be used from all programming languages running on
those virtual machines. org-parser provides a higher-level data
structure that is easy to consume for an application working with Org
mode data. Even if your application is not running on the Java or
JavaScript virtual machines, you can embed org-parser as a
command-line application. Lastly, org-parser brings a strong test
suite to document the reference implementation in yet another
unambiguous way.
It is our aim that org-parser can be the foundation on which many
Org mode applications in many different languages can be built. The
applications using org-parser can then focus on implementing user
facing features and don’t have to worry about the implementation of
the Org syntax itself.
The code base of org-parser is split into four namespaces:
- org-parser.core (top level api, i.e. read-str, write-str)
- org-parser.parse (aka. deserializer, reader)
- org-parser.parse.transform (transforms the result of the parser into a more desirable structure)
- org-parser.render (aka. serializer, writer)
Thus org-parser has become a misnomer in the sense that it now
strives to be clojure/data.org (after the pattern of existing Clojure
libraries like data.json, data.xml, data.csv, etc.), providing
reader as well as writer capabilities for the serialization format
org.
This project is work-in-progress. It is not ready for production yet because the structure of the AST (parse tree) can still change.
The biggest milestones are:
- [X] Finish EBNF parser to support most Org mode syntax
  - [X] Headlines
  - [X] Org mode #+* stuff
  - [X] Timestamps
  - [X] Links
  - [X] Text links
  - [X] Footnotes
  - [X] Styled text
  - [X] Drawers and #+BEGIN_xxx blocks
  - [ ] Nested markup (see #12)
- [X] Setup basic transformation from the parse tree to a higher-level structure.
- [-] Transformations to higher-level structure: catch up with features that are already supported by the EBNF parser.
- [-] Render parsed org file with write-str
It can already be useful for you. For example, if your script needs to parse only some Org mode features, our EBNF parser probably already supports them. Do not underestimate timestamps, for instance. Use our well-tested parser to disassemble them into their parts instead of writing a poor and ugly regex that only handles a subset of Org mode’s timestamps ;)
Don’t hesitate to contribute!
org-parser uses instaparse which aims to be the simplest way to
build parsers in Clojure. Apart from living up to this claim (and
beyond the scope of just the one programming language), using
instaparse is great for another reason: Instaparse works both on CLJ
and CLJS. Therefore org-parser can be used from both ecosystems
which, of course, include JavaScript and Java. Hence, it is possible
to use it in various situations.
Please install Clojure and Leiningen.
There’s no additional installation required. Leiningen will pull dependencies if required.
Running the tests:
# Clojure
lein test
# CLJS (starts a watcher)
lein doo node
If you’re not familiar with Lisp or Clojure, here’s a short video on how the tooling for Lisp (and hence Clojure) is great and enables fast developer feedback and high quality applications. Initially, the video was created to answer a specific issue on this repository. However, the question is a valid general question that is asked quite often by people who haven’t used a Lisp before.
You can watch it here: https://youtu.be/o2MLHFGUkoQ
Note: The version number should be replaced with the current version of org-parser. See the clojars badge at the top of this README.
CLI/deps.edn dependency information:
org-parser/org-parser {:mvn/version "0.1.4"}
Leiningen dependency information:
[org-parser "0.1.4"]
At the moment, you can run org-parser from Clojure, ClojureScript,
or Java. Other targets which are hosted on the JVM or on JavaScript
are possible.
(ns hello-world.core
  (:require [org-parser.parser :refer [parse]]
            [org-parser.core :refer [read-str write-str]]))
(prn (parse "* Headline"))
(prn (read-str "* Headline"))
(println (write-str (read-str "* Headline")))
[:S [:headline [:stars "*"] [:text [:text-normal "Headline"]]]]
{:headlines [{:headline {:level 1, :title [[:text-normal "Headline"]], :planning [], :tags []}}]}
"* Headline\n"
Run lein run file.org, for example:
lein run test/org_parser/fixtures/schedule_with_repeater.org
{:headlines [{:headline {:level 1, :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]], :planning [[:planning-info [:planning-keyword [:planning-kw-scheduled]] [:timestamp-active [:ts-inner [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]] [:ts-modifiers [:ts-repeater [:ts-repeater-type "+"] [:ts-mod-value "1"] [:ts-mod-unit "d"]]]]]]], :tags []}}]}
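The result printed above is plain Clojure data, so an application can walk it with ordinary core functions. The following is only a minimal sketch under assumptions: it works on a literal copy of the output shown above rather than calling org-parser, it assumes each :title part is a [keyword string] pair as in that output, and the helper names title->text, outline and ts-dates are hypothetical and not part of org-parser. Since the structure of the AST can still change, treat the key paths as illustrative.

```clojure
(ns example.outline
  (:require [clojure.string :as str]))

;; Literal copy of the data printed above for schedule_with_repeater.org.
;; In a real program this value would come from org-parser instead.
(def parsed
  {:headlines
   [{:headline
     {:level 1
      :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]]
      :planning [[:planning-info
                  [:planning-keyword [:planning-kw-scheduled]]
                  [:timestamp-active
                   [:ts-inner
                    [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]]
                    [:ts-modifiers
                     [:ts-repeater
                      [:ts-repeater-type "+"]
                      [:ts-mod-value "1"]
                      [:ts-mod-unit "d"]]]]]]]
      :tags []}}]})

(defn title->text
  "Joins the string parts of a :title vector, dropping the style keywords.
  Assumes every part is a [keyword string] pair."
  [title]
  (str/join (map second title)))

(defn outline
  "Returns one line per headline: stars for the level, then the plain title."
  [doc]
  (mapv (fn [{:keys [headline]}]
          (str (str/join (repeat (:level headline) "*"))
               " "
               (title->text (:title headline))))
        (:headlines doc)))

(defn ts-dates
  "Collects the string of every [:ts-date ...] node nested in a planning vector."
  [planning]
  (->> (tree-seq vector? seq planning)
       (filter #(and (vector? %) (= :ts-date (first %))))
       (mapv second)))

(println (outline parsed))
(println (ts-dates (get-in parsed [:headlines 0 :headline :planning])))
```

Searching the nested vectors with tree-seq avoids hard-coding the full path down to the date, which keeps the sketch a little more tolerant of changes in the intermediate nodes.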
First, compile org-parser with:
lein uberjar
Then run java -jar target/uberjar/org-parser-*-SNAPSHOT-standalone.jar file.org, for example:
java -jar target/uberjar/org-parser-*-SNAPSHOT-standalone.jar test/org_parser/fixtures/schedule_with_repeater.org
{:headlines [{:headline {:level 1, :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]], :planning [[:planning-info [:planning-keyword [:planning-kw-scheduled]] [:timestamp-active [:ts-inner [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]] [:ts-modifiers [:ts-repeater [:ts-repeater-type "+"] [:ts-mod-value "1"] [:ts-mod-unit "d"]]]]]]], :tags []}}]}
Note: The * character must be replaced with the current version number of org-parser.
See the clojars badge at the top of this README.