diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
index 5560b24..189dc3d 100644
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -61,7 +61,7 @@ representative at an online or offline event.
 
 Instances of abusive, harassing, or otherwise unacceptable behavior may be
 reported to the community leaders responsible for enforcement at
-[https://kiranparajuli.com.np][contact method].
+[https://kiranparajuli.com.np](https://kiranparajuli.com.np).
 All complaints will be reviewed and investigated promptly and fairly.
 
 All community leaders are obligated to respect the privacy and security of the
diff --git a/README.md b/README.md
index 20b783b..bc70042 100644
--- a/README.md
+++ b/README.md
@@ -1,122 +1,110 @@

HtmlMark

-

+

HtmlMark Logo

A very lightweight Markdown Parser powered by Regex

-## 🔑 Key points
+## 🔑 Why HtmlMark?
 - no use of external dependencies
 - lexer: to produce markdown tokens
-- parser: to produce the HTML code
+- parser: to produce the HTML code from the lexer
+- code highlighting support
+- support for front matter
 ## 🌐 Demo
 Checkout the features of the parser from this [Live Demo](https://kiranparajuli589.github.io/htmlmark/ 'Live Demo').
-## 🛣️ Roadmap
-
-### 🪕 Paragraphs
-1. **Heading:**
- - Levels: 1 to 6
- - Type
- - Underlined:
- - For Heading 1, the underline of `=`
- - For Heading 2, the underline of `-`
-
- - Hashed
- - Number of the leading hashes determines the level of the heading.
- - Can be in between of 1 to 6.
- - Can be fenced with hashes.
- - Otherwise, it will be treated as a paragraph.
-2. **Code block:**
- - Language: Optional
- - Types:
- 1. Can be inside a backtick block like ```
- 2. An indent of 4 can make the block a code
- - For a codeblock inside a list - it should be indented at least twice.
-3. **List:**
- - Ordered: `{any digit}. Item 1`
- - Unordered: `-|+|* Item 1`
- - Checklist: `-|+|*|{digit}. [ ] Item 1` (Can be _ordered_ or _unordered_)
- - Can be Lazy
- - Items must have the same intent to be included in the same list
- - Allows other _paragraph items_ inside content (recursive lex and parsing)
-4. **Quote:**
- - Levels: 0 to infinity
- - Lines must be the same indent to be included within the same quote
- - Allows other _paragraph items_ inside content (recursive lex and parsing)
- - Can be Lazy
-5. **Image:**
- - Link: String (Required)
- - Alt text: String (Required)
- - Title: String (Optional)
- - Width: Number (Optional)
- - Height: Number (Optional)
- - Indentation: NOT IMPLEMENTED YET
-6. **Comment:**
- - Lexer contains it
- - Parser also contains it
- - Example: ``
-7. **Line:**
- - Defined as: `---`
- - Consecutive lines are merged into one
-8. **Table:**
- - Equal number of cell counts
- - Equal number of indentations
- - Cell content should allow emphasis
- - Table heading is optional
- - Table heading is separated by `|---|,|:--:|`
-9. **Newline:**
- - Consecutive newlines are merged into one
-10. **Paragraph:**
- - Anything else
- - Line Breaks:
-
- 1. If a line ends with 2 or more than 2 spaces, then, a line break is inserted.
- 2. Otherwise, the lines are merged into one.
-11. **HTML:**
- - Lexer contains it
- - Parser also contains it
- - No escaping for HTML tags
- - Parsed as it is
-12. **Front Matter:**
- - Only lexer contains it
- - Starts and Ends with `---` (it can be surrounded by whitespaces, but should have exactly 3 dashes without spaces in between)
- - If array or object are provided in exact JSON literal format, then they are parsed as JS objects and arrays
- - Otherwise, they are parsed as strings
- - Example: `---\ntags: ["a", "b", "c"]\n---` is parsed as `{tags: ["a", "b", "c"]}`
-
-
-### 🎺 Emphasis
-Emphasis can be inside the content of any paragraph types. Even emphasis items can have emphasis inside 🤩.
-
-1. **Bold:** wrapped inside `**` | `__` | odd number of `*` | `_`
-2. **Italic:** wrapped inside `*` | `_` | even number of `*` | `_`
-3. **Code:** wrapped inside backticks
-4. **Strike:** wrapped inside `~~` | even number of `~`
-5. **Underline:** wrapped inside `++` | even number of `+`
-6. **Link:** wrapped as `[title](url 'title')` where `title` is optional
-7. **Image:** wrapped as `![alt text](url 'title' width height)` where `title`, `width` & `height` are optional
-
-### 🛹 Escaping
-1. Escaping is done by using `\` before the character to be escaped.
-2. If you need text like `# text` but don't want it to be treated as a heading, then you can escape it as `\# text
-3. Escaping is done for the following characters:
- - `*`, `_`, `[`, `]`, `(`, `)`, `!`, `~`, `+`, `<`, `>`, `&`, `"`, `'`
-4. Nothing is escaped in the lexer (content wise)
-5. Everything is escaped inside of `code` and `codeblock`
-6. Non HTML characters are escaped inside of other tokens
-
-Ref: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
-
-### 👻 HTML Sanitization
-**A BIG TODO**
+## 🎠 Installation
+
+```bash
+npm i htmlmark
+```
+
+## 💠 Usage
+
+```js
+import HtmlMark from 'htmlmark';
+
+const opts = {
+ indent: 4,
+ highlightFn: (code, lang) => {
+ // return the highlighted code
+ },
+ useLinkRefs: true
+};
+
+const htmlmark = new HtmlMark(opts);
+
+htmlmark.tokenize("## Hello World"); // returns the tokens
+htmlmark.parse("## Hello World"); // returns the HTML code
+```
+
+## 🎡 Options
+
+| Option | Type | Default | Description |
+|-------------|------------|-------------|---------------------------------------------------|
+| indent | `number` | `4` | Number of spaces (or tabs) to use for indentation |
+| highlightFn | `function` | `undefined` | Function to highlight the code |
+| useLinkRefs | `boolean` | `true` | Whether to use link references or not |
+
+
+## 🎢 APIs
+- `tokenize(markdown: string): Token[]`:
+ Returns the lexer from the markdown string
+- `parse(markdown: string): string`:
+ Returns the HTML code from the markdown string
+- `getFrontMatter(markdown: string): FrontMatter{}`:
+ Returns the front matter from the markdown string
+
+### Lexer
+The provided markdown string is scanned line by line and checked against various regex patterns to produce the accurate markdown tokens. A general token structure is as:
+
+```json
+{
+ "indent": 0,
+ "level": 1,
+ "raw": "# Heading One Text",
+ "setext": false,
+ "type": "heading",
+ "value": "Heading One Text",
+ "tokens": [{
+ "raw": "Heading One Text",
+ "type": "text",
+ "value": "Heading One Text"
+ }]
+}
+```
+
+### Front Matter
+The front matter is the metadata of the markdown file. It is written in the YAML format and is separated from the markdown content by a line containing three hyphens `---`. It must be placed at the top of the markdown file.
+
+#### Example:
+
+```md
+---
+title: Hello World
+date: 2021-01-01
+author: John Doe
+---
+
+## Hello World
+Lorem ipsum dollar sit amet
+```
+
+The above markdown file will produce the following front matter:
+
+```json
+{
+ "title": "Hello World",
+ "date": "2021-01-01",
+ "author": "John Doe"
+}
+```
+
+## 💁 Contributing to HtmlMark
+Contributions are always welcome, no matter how large or small. Before contributing, please read the [code of conduct](https://github.com/kiranparajuli589/htmlmark/blob/main/CODE_OF_CONDUCT.md 'code of conduct'). You can also find the development guide [here](https://github.com/kiranparajuli589/htmlmark/blob/main/CONTRIBUTING.md 'here').
+
+## 📝 License
+GNU GENERAL PUBLIC LICENSE v3.0 © [Kiran Parajuli](https://kiranparajuli.com.np 'Kiran Parajuli')