Create versioned releases of pandoc-wasm (released as wasm-pandoc on npmjs.com) #10

Open
wants to merge 73 commits into
base: master
31d06a4
update pandoc repo
johanneswilm Feb 10, 2025
75fbbed
Update links
johanneswilm Feb 10, 2025
c92ad77
Update README.md
johanneswilm Feb 10, 2025
3259e24
use patch for build
johanneswilm Feb 11, 2025
41623ec
Merge branch 'master' of github.com:johanneswilm/pandoc-wasm
johanneswilm Feb 11, 2025
9d68a29
clarify readme wording
johanneswilm Feb 11, 2025
9f59d23
Upload releases to ga run when not tagged
johanneswilm Feb 11, 2025
15afec9
actions/upload-artifact@v4
johanneswilm Feb 11, 2025
4d9de2e
fix filename
johanneswilm Feb 11, 2025
25ad35c
change reference to VERSION
johanneswilm Feb 11, 2025
e1317d4
use env.VERSION
johanneswilm Feb 11, 2025
94f0e68
use tagged pandoc version
johanneswilm Feb 11, 2025
a92165e
reorganize github actions
johanneswilm Feb 11, 2025
4d78291
switch github events release type
johanneswilm Feb 11, 2025
d99cc8b
github action events change
johanneswilm Feb 11, 2025
5245d39
0.2+3.6.3
johanneswilm Feb 11, 2025
dd1429e
Add package.json
johanneswilm Feb 11, 2025
59f7aeb
fix package.json syntax
johanneswilm Feb 11, 2025
c59abb7
0.3+3.6.3
johanneswilm Feb 11, 2025
5632d85
reorganize zipping
johanneswilm Feb 11, 2025
d633022
Merge branch 'main' of github.com:johanneswilm/pandoc-wasm
johanneswilm Feb 11, 2025
23ce56a
Use semver (requirement by npm)
johanneswilm Feb 11, 2025
ac6efee
set up for distribution via npm
johanneswilm Feb 11, 2025
6cf1ed2
Add maintainership. make exports work, add example to readme.
johanneswilm Feb 13, 2025
055a9dc
lint
johanneswilm Feb 13, 2025
96ec06d
lint
johanneswilm Feb 13, 2025
e1f914d
release to releases branch (while waiting for npm access)
johanneswilm Feb 13, 2025
e4062a4
0.5.1
johanneswilm Feb 13, 2025
ecdf7cb
Add mediafile output
johanneswilm Feb 13, 2025
ecf8fa9
0.6.0
johanneswilm Feb 13, 2025
58f19c3
correct dependency
johanneswilm Feb 13, 2025
45fa144
0.6.1
johanneswilm Feb 13, 2025
b743e09
update URLs
johanneswilm Feb 27, 2025
fccda16
disable distribution via releases branch
johanneswilm Feb 27, 2025
0757b87
0.6.2
johanneswilm Feb 27, 2025
d729384
add readme and license files to npm
johanneswilm Feb 27, 2025
dc9eb6b
refactor github actions script
johanneswilm Feb 27, 2025
54ef47e
change dir for patching
johanneswilm Feb 27, 2025
4e6606c
add pandoc build caching
johanneswilm Feb 27, 2025
fcae9ce
fix cache key reference
johanneswilm Feb 27, 2025
b633593
add pre-commit
johanneswilm Feb 27, 2025
bf77d0b
add extra files
johanneswilm Feb 27, 2025
a42e4a3
difference reference to reference variables
johanneswilm Feb 27, 2025
ea06f47
variable refernece change
johanneswilm Feb 27, 2025
73cee5c
remove unused shellcheck checker
johanneswilm Feb 27, 2025
9b75df5
disambigious variable names
johanneswilm Feb 27, 2025
238ec75
fix cache extraction
johanneswilm Feb 27, 2025
f3e28b7
fix url
johanneswilm Feb 27, 2025
5e0b198
pre-commit==4.1.0
johanneswilm Feb 27, 2025
d2a5624
switch to pre-commit action
johanneswilm Feb 27, 2025
6d3a56e
style changes
johanneswilm Feb 27, 2025
82ff51a
lint: add quotes
johanneswilm Feb 27, 2025
9fe1336
always deploy pages
johanneswilm Feb 27, 2025
0331c8c
0.6.3
johanneswilm Feb 27, 2025
eb0ce81
output github ref name
johanneswilm Feb 27, 2025
bd5849e
0.6.4
johanneswilm Feb 27, 2025
d3fe83a
github pages not for tag
johanneswilm Feb 27, 2025
de5a283
different ref reference
johanneswilm Feb 27, 2025
5567650
reset tagging, giving up
johanneswilm Feb 27, 2025
b4f0480
publish on release - does not seem to work with tag
johanneswilm Feb 27, 2025
a866526
0.6.5
johanneswilm Feb 27, 2025
29f2328
Revert "publish on release - does not seem to work with tag"
johanneswilm Feb 27, 2025
7514486
test publishign with tags again
johanneswilm Feb 27, 2025
19aa032
0.6.6
johanneswilm Feb 27, 2025
4a41420
pandoc 3.6.4
johanneswilm Mar 24, 2025
f3aa330
0.7.0
johanneswilm Mar 24, 2025
59297a8
update patch
johanneswilm Mar 24, 2025
2935bcc
pin ghc-wasm-meta
johanneswilm Mar 25, 2025
65d5776
0.7.1
johanneswilm Mar 25, 2025
537b73e
add repo links
johanneswilm May 26, 2025
0c1b15b
pandoc 3.7.0.1
johanneswilm May 26, 2025
ba14fa3
depencdency update
johanneswilm May 26, 2025
af13710
0.8.0
johanneswilm May 26, 2025
52 changes: 41 additions & 11 deletions .github/workflows/build.yml
@@ -1,12 +1,9 @@
 name: build
 
-on:
-  merge_group:
-  pull_request:
-  push:
-    branches:
-      - master
-  workflow_dispatch:
+on: [push, pull_request]
 
+permissions:
+  contents: write
+
 jobs:
   build:
@@ -35,16 +32,29 @@ jobs:
           ~/.ghc-wasm/add_to_github_path.sh
           popd
 
-      - name: checkout
+      - name: Checkout Pandoc-wasm
         uses: actions/checkout@v4
 
-      - name: checkout
+      - name: Checkout Pandoc
         uses: actions/checkout@v4
         with:
-          repository: haskell-wasm/pandoc
-          ref: wasm
+          repository: jgm/pandoc
+          ref: main

If you are going to put a version number on these builds, shouldn't they also be pinned to a specific commit (probably a release tag) for the underlying Pandoc version too?

Author

@alerque Yes, you are right. I am trying to figure out how best to do this. It should be easy to get it to build with the most recent Pandoc version, and we probably need another segment in the version number for the pandoc-wasm package itself, so 3.6.3.x instead of just 3.6.3. If you have a proposal for doing the versioning in the simplest, most standards-compliant way, I'm all for it.

Just adding a segment will make it hard to parse, because Pandoc follows the PVP and has a variable number of segments. I think you're going to need a different segment separator that plays nicely with distro versioning (I think + is the most robust option). The question is whether it makes more sense to version this project first and then append the relevant Pandoc version, or the other way around, e.g. pandoc-wasm-3.6.3+0.1 or pandoc-wasm-0.1+3.6.3. I think the latter probably makes more sense, but it depends on the expected release channels and usage workflows, I guess. I don't really have a handle on that.
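
A minimal sketch of how a `+`-separated version like the ones proposed above could be split back into its two parts using POSIX parameter expansion. The names `split_version`, `wrapper_version` and `pandoc_version` are made up for illustration and are not part of this PR.

```shell
#!/bin/sh
# Hypothetical compound version: <pandoc-wasm version>+<pandoc version>.
split_version() {
  # Everything before the first '+' is the pandoc-wasm version.
  wrapper_version="${1%%+*}"
  # Everything after the first '+' is the Pandoc (PVP) version,
  # regardless of how many dot-separated segments it has.
  pandoc_version="${1#*+}"
}

split_version "0.1+3.6.3"
echo "pandoc-wasm: $wrapper_version, pandoc: $pandoc_version"
```

Because the split happens at the `+` and not at a dot, a four-segment PVP version on the right-hand side is handled the same way as a three-segment one.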

Author

@alerque OK. As someone working mainly with JS/TS in browsers, I would expect it to become available on npm, but I'm open to others needing it in other places. I like both of your versioning proposals and, unless there is a reason not to, I'd go for the second one.

Author

@alerque This now seems to be working with tagged versions, etc. The only thing remaining is to put it on npm. As I assume that @TerrorJack or @tweaf or @alerque will want to have control over the npm repository after merging this PR (or writing something similar), I will not add that part.

Author

@alerque It looks like npm does not allow this versioning scheme; it only allows three-part semver (MAJOR.MINOR.PATCH). So I split the two version numbers apart. It's still really easy to update the Pandoc version, though.
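
For illustration, here is a simplified check for the plain MAJOR.MINOR.PATCH form mentioned above. It is only a sketch: the helper name `is_plain_semver` is hypothetical, and the pattern deliberately ignores the pre-release and build-metadata suffixes that the full semver grammar permits.

```shell
#!/bin/sh
# Succeeds only for exactly three numeric segments without leading zeros.
is_plain_semver() {
  printf '%s\n' "$1" |
    grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

for v in "0.2+3.6.3" "3.7.0.1" "0.6.0"; do
  if is_plain_semver "$v"; then
    echo "$v: ok"
  else
    echo "$v: rejected"
  fi
done
```

Both the compound `0.2+3.6.3` form and a four-segment PVP version such as `3.7.0.1` fail this check, which is why the package version and the Pandoc version end up being tracked separately.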

           path: pandoc
 
+      - name: Patch Pandoc sources
+        run: |
+          cd pandoc
+          patch -p1 < ../patch/pandoc.patch
+          cd ..
+
+      - name: Extract version from pandoc.cabal
+        id: extract-version
+        run: |
+          VERSION=$(grep '^version:' pandoc/pandoc.cabal | awk '{print $2}')
+          echo "Extracted version: $VERSION"
+          echo "VERSION=$VERSION" >> $GITHUB_ENV
+
       - name: gen-plan-json
         run: |
           pushd pandoc
@@ -77,6 +87,26 @@ jobs:
           wasmtime run --dir $PWD::/ -- dist/pandoc.wasm pandoc/README.md -o pandoc/README.rst
           head --lines=20 pandoc/README.rst
 
+      - name: Zip dist folder
+        run: |
+          zip -r pandoc-wasm-${{ env.VERSION }}.zip dist/
+
+      - name: Upload zipped file as artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: pandoc-wasm-${{ env.VERSION }}.zip
+          path: pandoc-wasm-${{ env.VERSION }}.zip
+
+      - name: Upload to release
+        if: github.event_name == 'push' && contains(github.ref, 'refs/tags/')
+        uses: svenstaro/upload-release-action@v2
+        with:
+          repo_token: ${{ secrets.GITHUB_TOKEN }}
+          file: pandoc-wasm-${{ env.VERSION }}.zip
+          asset_name: pandoc-wasm-${{ env.VERSION }}.zip
+          tag: ${{ github.ref }}
+          overwrite: true
+
       - name: upload-pages-artifact
         uses: actions/upload-pages-artifact@v3
         with:
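
The `Extract version from pandoc.cabal` step in the workflow above can be tried locally. This is a minimal sketch that runs the same `grep`/`awk` pipeline against a made-up sample file; the file contents and the temporary directory are assumptions for illustration, not the real `pandoc.cabal`.

```shell
#!/bin/sh
# Create a throwaway directory with a tiny stand-in for pandoc.cabal.
tmpdir=$(mktemp -d)
cat > "$tmpdir/pandoc.cabal" <<'EOF'
cabal-version:   2.4
name:            pandoc
version:         3.6.3
build-type:      Simple
EOF

# Same pipeline as the workflow step: the '^' anchor keeps the
# 'cabal-version:' line from matching, and awk prints the second field.
VERSION=$(grep '^version:' "$tmpdir/pandoc.cabal" | awk '{print $2}')
echo "Extracted version: $VERSION"

rm -r "$tmpdir"
```

The pipeline relies on `version:` starting at the beginning of a line and on the value being the second whitespace-separated field.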
8 changes: 5 additions & 3 deletions README.md
@@ -1,19 +1,21 @@
 # `pandoc-wasm`
 
+*Fork for private experimentation.*
+
 [![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/#haskell-wasm:matrix.terrorjack.com)
 
 The latest version of `pandoc` CLI compiled as a standalone
 `wasm32-wasi` module that can be run by engines like `wasmtime` as
 well as browsers.
 
-## [Live demo](https://tweag.github.io/pandoc-wasm)
+## [Live demo](https://johanneswilm.github.io/pandoc-wasm)
 
 Stdin on the left, stdout on the right, command line arguments at the
 bottom. No convert button, output is produced dynamically as input
 changes.
 
 You're also more than welcome to fetch the
-[`pandoc.wasm`](https://tweag.github.io/pandoc-wasm/pandoc.wasm)
+[`pandoc.wasm`](https://johanneswilm.github.io/pandoc-wasm/pandoc.wasm)
 module and make your own customized app. `pandoc.wasm` is fully
 `wasm32-wasi` compliant and doesn't make use of any JSFFI feature in
 the ghc wasm backend.
@@ -27,7 +29,7 @@ need at least 9.10 since it's the earliest major version with (my
 non-official) backports for ghc wasm backend's Template Haskell & ghci
 support.
 
-It's built using my
+It's build-method is based on this
 [fork](https://github.com/haskell-wasm/pandoc/tree/wasm) which is
 based on latest `pandoc` release and patches dependencies, cabal
 config as well as some module code to make things compilable to wasm:
229 changes: 229 additions & 0 deletions patch/pandoc.patch
Member

I see that you used haskell-wasm/pandoc#1 to create this, makes sense 👍

Now that I see the minimized diff again, I realize that the only remaining thing that is actually patched in the Haskell code here is the addition of the wasm_main export, such that the same .wasm file can be used both as a command module (e.g. locally via wasmtime) and as a reactor (for the web app). While this is quite neat, having separate .wasm files for these would have the following advantages:

  • Apart from the removal of -threaded (which could be upstreamed behind an arch(wasm32) conditional), no pandoc patching would be necessary; one could just build stock pandoc-cli (of course with an appropriate cabal.project for various dependencies).
  • In the separate build that exposes an FFI, one could actually use the full Wasm JSFFI which is more convenient.

@johanneswilm In your use case, do you use the .wasm file as a command or a reactor module?

Author

@amesgen That sounds promising. Yes, I took your patch in order to make the diff smaller.

My use case is this: I have created an open source word processor similar to Google Docs/Microsoft Word 365 Online, etc. for a specific niche market. I have written a number of export filters myself to common formats (like DOCX, ODT, HTML, EPUB, JATS, etc.). These are written in JS and all run in the end user's browser. I have even written an import filter for ODT files that works the same way.

Now I have a number of users wanting to import from and export to other, more exotic formats. So I've created import/export filters in JS for Pandoc's internal JSON format. The client then sends the Pandoc JSON to a server where pandoc runs in server mode, converting the JSON to one of the other formats and sending the result back to the client.

This is a bit problematic, as these conversions take up processing power on the server and it's quite complex to deploy pandoc to a number of different architectures for which there are no pre-compiled binaries. There could even be security issues with sending various files back and forth.

So my idea is to use this instead: the first time the user clicks on export to or import from an "exotic" format, the browser downloads the pandoc.wasm file (caching it for future use) and then does the conversion in the user's own browser, using the user's own processing power rather than the server's.

I assume you are referring to the terms "reactor" and "command" as they are defined here [1]. Given that I only want to execute the conversion based on one input and then close down again (only caching the binary so it doesn't have to be downloaded again), I assume this corresponds to pandoc-cli more than pandoc-server.

I haven't yet tried whether it is actually possible. Based on the web demo, it seems like it should be, though.

I don't know the who-is-who of the haskell-wasm world either. So I don't know which one of all of you can make any decisions here and who would be a good candidate to maintain an npm package of pandoc-wasm. I do maintain several open source packages, but those are written in languages that I use daily (like JS/TS or Python). So if one of you would want to step forward and do this, I'd be in favor.

[1] WebAssembly/WASI#13 (comment)
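
As context for the reactor question: the `wasm_main` export added to `pandoc-cli/src/pandoc.hs` in this PR's patch receives the whole argument list as a single string, splits it on whitespace with `words`, and appends `/in -o /out`. The sketch below imitates that in POSIX shell purely as an illustration; `build_argv` is a hypothetical helper, and shell word splitting is only an approximation of Haskell's `words`.

```shell
#!/bin/sh
# Print the argument vector that a whitespace-split argument string
# would produce, one argument per line, with the fixed input and
# output paths appended as in the patch.
build_argv() {
  set -f                  # disable globbing so '*' is not expanded
  set -- $1 /in -o /out   # unquoted $1 is split on whitespace
  set +f
  printf '%s\n' "$@"
}

build_argv "-f markdown -t rst"
```

One consequence of splitting on whitespace is that an argument value containing a space cannot be passed through as a single argument.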

@@ -0,0 +1,229 @@
diff --git a/cabal.project b/cabal.project
index 4ca6da52e630..b3a4ffcbb87b 100644
--- a/cabal.project
+++ b/cabal.project
@@ -2,10 +2,147 @@ packages: .
pandoc-lua-engine
pandoc-server
pandoc-cli
-tests: True
-flags: +embed_data_files
+tests: False
+
constraints: skylighting-format-blaze-html >= 0.1.1.3,
skylighting-format-context >= 0.1.0.2,
-- for now (commercialhaskell/stackage#7545):
data-default-class <= 0.2, data-default <= 0.8

+allow-newer: all:zlib
+
+package aeson
+ flags: -ordered-keymap
+
+package crypton
+ ghc-options: -optc-DARGON2_NO_THREADS
+
+package digest
+ flags: -pkg-config
+
+package pandoc
+ flags: +embed_data_files
+
+package pandoc-cli
+ flags: -lua -server
+
+allow-newer:
+ all:Cabal,
+ all:Cabal-syntax,
+ all:array,
+ all:base,
+ all:binary,
+ all:bytestring,
+ all:containers,
+ all:deepseq,
+ all:directory,
+ all:exceptions,
+ all:filepath,
+ all:ghc,
+ all:ghc-bignum,
+ all:ghc-boot,
+ all:ghc-boot-th,
+ all:ghc-compact,
+ all:ghc-experimental,
+ all:ghc-heap,
+ all:ghc-internal,
+ all:ghc-platform,
+ all:ghc-prim,
+ all:ghc-toolchain,
+ all:ghci,
+ all:haskeline,
+ all:hpc,
+ all:integer-gmp,
+ all:mtl,
+ all:os-string,
+ all:parsec,
+ all:pretty,
+ all:process,
+ all:rts,
+ all:semaphore-compat,
+ all:stm,
+ all:system-cxx-std-lib,
+ all:template-haskell,
+ all:text,
+ all:time,
+ all:transformers,
+ all:unix,
+ all:xhtml
+
+constraints:
+ Cabal installed,
+ Cabal-syntax installed,
+ array installed,
+ base installed,
+ binary installed,
+ bytestring installed,
+ containers installed,
+ deepseq installed,
+ directory installed,
+ exceptions installed,
+ filepath installed,
+ ghc installed,
+ ghc-bignum installed,
+ ghc-boot installed,
+ ghc-boot-th installed,
+ ghc-compact installed,
+ ghc-experimental installed,
+ ghc-heap installed,
+ ghc-internal installed,
+ ghc-platform installed,
+ ghc-prim installed,
+ ghc-toolchain installed,
+ ghci installed,
+ haskeline installed,
+ hpc installed,
+ integer-gmp installed,
+ mtl installed,
+ os-string installed,
+ parsec installed,
+ pretty installed,
+ process installed,
+ rts installed,
+ semaphore-compat installed,
+ stm installed,
+ system-cxx-std-lib installed,
+ template-haskell installed,
+ text installed,
+ time installed,
+ transformers installed,
+ unix installed,
+ xhtml installed
+
+-- https://github.com/haskell/network/pull/598
+source-repository-package
+ type: git
+ location: https://github.com/haskell-wasm/network.git
+ tag: ab92e48e9fdf3abe214f85fdbe5301c1280e14e9
+
+source-repository-package
+ type: git
+ location: https://github.com/haskell-wasm/foundation.git
+ tag: 8e6dd48527fb429c1922083a5030ef88e3d58dd3
+ subdir: basement
+
+source-repository-package
+ type: git
+ location: https://github.com/haskell-wasm/hs-memory.git
+ tag: a198a76c584dc2cfdcde6b431968de92a5fed65e
+
+source-repository-package
+ type: git
+ location: https://github.com/haskell-wasm/xml.git
+ tag: bc793dc9bc29c92245d3482a54d326abd3ae1403
+ subdir: xml-conduit
+
+-- https://github.com/haskellari/splitmix/pull/73
+source-repository-package
+ type: git
+ location: https://github.com/amesgen/splitmix
+ tag: 5f5b766d97dc735ac228215d240a3bb90bc2ff75
+
+source-repository-package
+ type: git
+ location: https://github.com/amesgen/cborg
+ tag: c3b5c696f62d04c0d87f55250bfc0016ab94d800
+ subdir: cborg
diff --git a/pandoc-cli/pandoc-cli.cabal b/pandoc-cli/pandoc-cli.cabal
index 5b904b9906bd..66d92a1875f3 100644
--- a/pandoc-cli/pandoc-cli.cabal
+++ b/pandoc-cli/pandoc-cli.cabal
@@ -61,7 +61,7 @@ common common-options
 
 common common-executable
   import: common-options
-  ghc-options: -rtsopts -with-rtsopts=-A8m -threaded
+  ghc-options: -rtsopts -with-rtsopts=-H64m
 
 executable pandoc
   import: common-executable
@@ -74,6 +74,10 @@ executable pandoc
                  text
   other-modules: PandocCLI.Lua
                , PandocCLI.Server
+
+  if arch(wasm32)
+    ghc-options: -optl-Wl,--export=__wasm_call_ctors,--export=hs_init_with_rtsopts,--export=malloc,--export=wasm_main
+
   if flag(nightly)
     cpp-options: -DNIGHTLY
     build-depends: template-haskell,
diff --git a/pandoc-cli/src/pandoc.hs b/pandoc-cli/src/pandoc.hs
index 019d0adedb15..520a858c89a2 100644
--- a/pandoc-cli/src/pandoc.hs
+++ b/pandoc-cli/src/pandoc.hs
@@ -1,5 +1,7 @@
{-# LANGUAGE CPP #-}
+{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
+
{- |
Module : Main
Copyright : Copyright (C) 2006-2024 John MacFarlane
@@ -34,6 +36,13 @@ import qualified Language.Haskell.TH as TH
import Data.Time
#endif

+#if defined(wasm32_HOST_ARCH)
+import Control.Exception
+import Foreign
+import Foreign.C
+import System.IO
+#endif
+
#ifdef NIGHTLY
versionSuffix :: String
versionSuffix = "-nightly-" ++
@@ -44,6 +53,24 @@ versionSuffix :: String
versionSuffix = ""
#endif

+#if defined(wasm32_HOST_ARCH)
+
+foreign export ccall "wasm_main" wasm_main :: Ptr CChar -> Int -> IO ()
+
+wasm_main :: Ptr CChar -> Int -> IO ()
+wasm_main raw_args_ptr raw_args_len =
+  catch act (\(err :: SomeException) -> hPrint stderr err)
+  where
+    act = do
+      args <- words <$> peekCStringLen (raw_args_ptr, raw_args_len)
+      free raw_args_ptr
+      engine <- getEngine
+      res <- parseOptionsFromArgs options defaultOpts "pandoc.wasm" $ args <> ["/in", "-o", "/out"]
+      case res of
+        Left e -> handleOptInfo engine e
+        Right opts -> convertWithOpts engine opts
+#endif
+
main :: IO ()
main = E.handle (handleError . Left) $ do
prg <- getProgName