CiceroMark<->OOXML transformers #397

Open
9 of 11 tasks
K-Kumar-01 opened this issue May 27, 2021 · 12 comments

@K-Kumar-01
Collaborator

K-Kumar-01 commented May 27, 2021

Feature Request 🛍️

Incorporating the CiceroMark->OOXML transformer and improving the currently implemented OOXML->CiceroMark transformer.

Use Case

This will allow DOCX files to be converted into CiceroMark JSON and vice versa. In addition, DOCX files can then be converted into other formats such as PDF or HTML.

Possible Solution

The OOXML->CiceroMark transformer is already implemented. It needs to be extended with the remaining entities to allow a full transformation.

The CiceroMark->OOXML transformer was created in the cicero-word-add-in branch. It needs to be moved here, with some changes, to support the transformation.

Detailed Description

Currently, the CiceroMark->OOXML transformer supports the following conversions:

CiceroMark Entity      OOXML Entity
Text                   <w:t>
Paragraph              <w:p>Content</w:p>
Linebreak              <w:p/>
Softbreak              <w:r><w:sym/></w:r>
Emph                   <w:i>
Variable               <w:sdt>
List Block / List      <w:numPr><w:num w:val={ordered/unordered}/></w:numPr>
List Item (Text)       <w:t/>
List Item (Variable)   <w:sdt/>
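To make the mapping above concrete, here is a minimal JavaScript sketch of a dispatcher that switches on $class for some of the supported entities (Paragraph, Text, Emph, Softbreak, Variable). It is not the transformer from the cicero-word-add-in branch: toOoxml, run and escapeXml are hypothetical names, escapeXml stands in for sanitizeHtmlChars, and the Variable content control is reduced to a bare w:tag.

```javascript
// Hypothetical stand-in for sanitizeHtmlChars: escapes XML-special characters.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap text in a run; `runProps` holds accumulated run properties such as <w:i/>.
function run(text, runProps) {
  const rPr = runProps ? `<w:rPr>${runProps}</w:rPr>` : "";
  return `<w:r>${rPr}<w:t xml:space="preserve">${escapeXml(text)}</w:t></w:r>`;
}

// Convert one CiceroMark node (and its children) to an OOXML string.
function toOoxml(node, runProps = "") {
  // Convert all children, optionally adding extra run properties for them.
  const walk = (extraProps = "") =>
    (node.nodes || []).map((child) => toOoxml(child, runProps + extraProps)).join("");

  switch (node.$class) {
    case "org.accordproject.commonmark.Document":
      return walk();
    case "org.accordproject.commonmark.Paragraph":
      return `<w:p>${walk()}</w:p>`;
    case "org.accordproject.commonmark.Text":
      return run(node.text, runProps);
    case "org.accordproject.commonmark.Emph":
      return walk("<w:i/>");
    case "org.accordproject.commonmark.Softbreak":
      return '<w:r><w:sym w:font="Calibri" w:char="2009"/></w:r>';
    case "org.accordproject.ciceromark.Variable":
      return (
        `<w:sdt><w:sdtPr><w:tag w:val="${escapeXml(node.name)}"/></w:sdtPr>` +
        `<w:sdtContent>${run(node.value, runProps)}</w:sdtContent></w:sdt>`
      );
    default:
      // Entities from the remaining list (Strong, Link, ...) are not handled here.
      throw new Error(`Unsupported CiceroMark node: ${node.$class}`);
  }
}

const paragraph = {
  $class: "org.accordproject.commonmark.Paragraph",
  nodes: [
    { $class: "org.accordproject.commonmark.Text", text: "Deliver " },
    {
      $class: "org.accordproject.ciceromark.Variable",
      name: "deliverable",
      value: "\"Widgets\"",
      elementType: "String",
    },
  ],
};
console.log(toOoxml(paragraph));
```

Threading the run properties down the recursion is one way to let nested marks (e.g. Emph inside a Variable's paragraph) combine without post-processing.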

Conversions still to be implemented:

  • Strong

  • Code

  • Link

  • Image

  • BlockQuote

  • CodeBlock

  • ThematicBreak

  • Clause

  • Optional

  • Conditional

  • Formula / Ergo expressions

Among the remaining conversions, we need to decide which ones deserve high priority and which can be given a lower priority. Furthermore, we also need to think about whether all these will be present in the contract (IMO, Code and CodeBlock generally won't occur in the contract).

Entities and their corresponding Ciceromark

Heading

{
  "$class": "org.accordproject.commonmark.Heading",
  "level": "2",
  "nodes": [ ... ]
}
Paragraph

{
  "$class": "org.accordproject.commonmark.Paragraph",
  "nodes": [ ... ]
}
Text

{
  "$class": "org.accordproject.commonmark.Text",
  "text": "Try TemplateMark"
}
Softbreak

{
  "$class": "org.accordproject.commonmark.Softbreak"
}
Variable

{
  "$class": "org.accordproject.ciceromark.Variable",
  "value": "\"Widgets\"",
  "name": "deliverable",
  "elementType": "String"
}
Link

{
  "$class": "org.accordproject.commonmark.Link",
  "destination": "https://github.com/accordproject/markdown-transform",
  "title": "",
  "nodes": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "@accordproject/markdown-transform"
    }
  ]
}
Image
{
  "$class": "org.accordproject.commonmark.Image",
  "destination": "https://github.com/accordproject/markdown-transform",
  "title": "",
  "nodes": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "@accordproject/markdown-transform"
    }
  ]
}
Thematic Break
{
  "$class": "org.accordproject.commonmark.ThematicBreak"
}
Emphasis
{
  "$class": "org.accordproject.commonmark.Emph",
  "nodes": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "They can also, of course, contain "
    }
  ]
}
Strong
{
  "$class": "org.accordproject.commonmark.Strong",
  "nodes": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "markdown"
    }
  ]
}
Code
{
  "$class": "org.accordproject.commonmark.Code",
  "text": "hello"
}
CodeBlock
{
  "$class": "org.accordproject.commonmark.CodeBlock",
  "text": "testing purposes\n"
}
BlockQuote
{
  "$class": "org.accordproject.commonmark.BlockQuote",
  "nodes": [
    {
      "$class": "org.accordproject.commonmark.Paragraph",
      "nodes": [
        {
          "$class": "org.accordproject.commonmark.Text",
          "text": "First line"
        }
      ]
    }
  ]
}
Ordered List
{
  "$class": "org.accordproject.commonmark.List",
  "type": "ordered",
  "start": "1",
  "tight": "true",
  "delimiter": "period",
  "nodes": [...]
}
Unordered List
{
  "$class": "org.accordproject.commonmark.List",
  "type": "bullet",
  "tight": "true",
  "nodes": [...]
}
ListItem
{
  "$class": "org.accordproject.commonmark.Item",
  "nodes": [...]
}
Conditional
{
  "$class": "org.accordproject.ciceromark.Conditional",
  "whenTrue": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "This is a force majeure"
    }
  ],
  "whenFalse": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "This is "
    },
    {
      "$class": "org.accordproject.commonmark.Emph",
      "nodes": [
        {
          "$class": "org.accordproject.commonmark.Text",
          "text": "not"
        }
      ]
    },
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": " a force majeure"
    }
  ],
  "name": "forceMajeure"
}
Optional
{
  "$class": "org.accordproject.ciceromark.Optional",
  "whenSome": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "This applies except for Force Majeure cases in a "
    },
    {
      "$class": "org.accordproject.templatemark.VariableDefinition",
      "name": "miles"
    },
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": " miles radius."
    }
  ],
  "whenNone": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "This applies even in case a force majeure."
    }
  ],
  "name": "forceMajeure"
}
Clause
{
  "$class": "org.accordproject.ciceromark.Clause",
  "name": "clauseName",
  "nodes": [
    {
      "$class": "org.accordproject.commonmark.Paragraph",
      "nodes": [
        {
          "$class": "org.accordproject.commonmark.Text",
          "text": "...Markdown of the clause..."
        }
      ]
    }
  ]
}
Formula
{
  "$class": "org.accordproject.templatemark.Formula",
  "dependencies": [],
  "code": " formulas ",
  "name": "formula_8e04633f576f94d0333aa7cb5a60f69edb9828f3eab05c59db02d2baa56ab685"
}
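To help decide which of the remaining conversions matter most, one option is to count which entities actually appear in real contract CiceroMark. A minimal JavaScript sketch, assuming the child arrays are the ones seen in the examples above (nodes, whenTrue/whenFalse, whenSome/whenNone); countEntities is a hypothetical helper, not part of markdown-transform.

```javascript
// Child arrays used by the entities above: most use `nodes`, while
// Conditional and Optional keep their branches in separate arrays.
const CHILD_KEYS = ["nodes", "whenTrue", "whenFalse", "whenSome", "whenNone"];

// Depth-first walk that counts each entity by its short class name
// (e.g. "org.accordproject.commonmark.Text" -> "Text").
function countEntities(node, counts = new Map()) {
  const shortName = node.$class.split(".").pop();
  counts.set(shortName, (counts.get(shortName) || 0) + 1);
  for (const key of CHILD_KEYS) {
    for (const child of node[key] || []) {
      countEntities(child, counts);
    }
  }
  return counts;
}

// The Conditional example above.
const conditional = {
  $class: "org.accordproject.ciceromark.Conditional",
  name: "forceMajeure",
  whenTrue: [{ $class: "org.accordproject.commonmark.Text", text: "This is a force majeure" }],
  whenFalse: [
    { $class: "org.accordproject.commonmark.Text", text: "This is " },
    {
      $class: "org.accordproject.commonmark.Emph",
      nodes: [{ $class: "org.accordproject.commonmark.Text", text: "not" }],
    },
    { $class: "org.accordproject.commonmark.Text", text: " a force majeure" },
  ],
};
console.log(countEntities(conditional));
```

Running this over a batch of real contracts would show, for example, whether Code or CodeBlock ever occur before effort is spent on them.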

Entities and their corresponding OOXML tags

Heading

<w:pPr>
  <w:pStyle w:val="${definedLevels[level].style}"/>
</w:pPr>
<w:r>
  <w:rPr>
    <w:sz w:val="${definedLevels[level].size * 2}"/>
  </w:rPr>
  <w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
</w:r>
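The heading template multiplies the size by 2 because w:sz is measured in half-points. A minimal JavaScript sketch of filling that template; the definedLevels values below (style names and point sizes) are placeholders, escapeXml stands in for sanitizeHtmlChars, and the sketch also wraps the fragment in its w:p.

```javascript
// Placeholder values for the definedLevels table referenced in the template;
// the real style names and point sizes may differ.
const definedLevels = {
  1: { style: "Heading1", size: 25 },
  2: { style: "Heading2", size: 20 },
  3: { style: "Heading3", size: 16 },
};

// Minimal escaping of XML-special characters (stand-in for sanitizeHtmlChars).
function escapeXml(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build a heading paragraph. CiceroMark stores `level` as a string ("2"),
// which works as a key here because object keys are strings anyway.
function headingParagraph(level, value) {
  const def = definedLevels[level];
  if (!def) {
    throw new Error(`No style defined for heading level ${level}`);
  }
  // <w:sz> is expressed in half-points, hence the multiplication by 2.
  return (
    `<w:p><w:pPr><w:pStyle w:val="${def.style}"/></w:pPr>` +
    `<w:r><w:rPr><w:sz w:val="${def.size * 2}"/></w:rPr>` +
    `<w:t xml:space="preserve">${escapeXml(value)}</w:t></w:r></w:p>`
  );
}

console.log(headingParagraph("2", "Try TemplateMark"));
```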

Emphasis

<w:r>
    <w:rPr>
        <w:i />
    </w:rPr>
    <w:t>${sanitizeHtmlChars(value)}</w:t>
</w:r>

Strong

<w:r>
    <w:rPr>
        <w:b />
        <w:bCs />
    </w:rPr>
    <w:t>${sanitizeHtmlChars(value)}</w:t>
</w:r>
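Emphasis and Strong differ only in their run properties, so nested marks (e.g. Strong inside Emph) can be handled by collecting the active marks and emitting a single w:rPr. A minimal JavaScript sketch with hypothetical helper names; it emits w:b and w:bCs before w:i because the WordprocessingML schema fixes that order inside w:rPr. Unlike the templates above, it also sets xml:space="preserve" on w:t so that trailing spaces such as the one in "They can also, of course, contain " are kept.

```javascript
// Minimal escaping of XML-special characters (stand-in for sanitizeHtmlChars).
function escapeXml(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Run properties per CiceroMark mark, in the order the WordprocessingML
// schema requires inside <w:rPr> (b, bCs before i).
const MARK_PROPERTIES = [
  ["Strong", "<w:b/><w:bCs/>"],
  ["Emph", "<w:i/>"],
];

// Build a run for `value`; `marks` is the set of enclosing mark names,
// e.g. new Set(["Emph", "Strong"]) for Strong nested inside Emph.
function formattedRun(value, marks) {
  const props = MARK_PROPERTIES
    .filter(([mark]) => marks.has(mark))
    .map(([, xml]) => xml)
    .join("");
  const rPr = props ? `<w:rPr>${props}</w:rPr>` : "";
  return `<w:r>${rPr}<w:t xml:space="preserve">${escapeXml(value)}</w:t></w:r>`;
}

console.log(formattedRun("markdown", new Set(["Emph", "Strong"])));
```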

Text

<w:r>
    <w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
</w:r>

Paragraph

<w:p>
    value
</w:p>

Softbreak

<w:r>
  <w:sym w:font="Calibri" w:char="2009" />
</w:r>

Variable

<w:sdt>
  <w:sdtPr>
    <w:rPr>
      <w:sz w:val="24"/>
    </w:rPr>
    <w:alias w:val="${titleGenerator(title, type)}"/>
    <w:tag w:val="${tag}"/>
  </w:sdtPr>
  <w:sdtContent>
    <w:r>
      <w:rPr>
        <w:sz w:val="24"/>
      </w:rPr>
      <w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
    </w:r>
  </w:sdtContent>
</w:sdt>
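A minimal JavaScript sketch of filling the Variable template from a CiceroMark Variable node. Two assumptions are loud here: titleGenerator below is a hypothetical stand-in whose alias format is invented for illustration, and the tag is assumed to be simply the variable name. The value is copied as stored, so the quotes in "\"Widgets\"" survive (escaped as &quot;, which Word displays as ").

```javascript
// Escapes XML-special characters, including quotes for attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical alias format; the real titleGenerator may build it differently.
function titleGenerator(title, type) {
  return `${title} | ${type}`;
}

// Build a content control for a CiceroMark Variable node.
// Assumption: the tag is simply the variable name.
function variableSdt(variable) {
  const { name, value, elementType } = variable;
  const rPr = '<w:rPr><w:sz w:val="24"/></w:rPr>'; // 24 half-points = 12 pt
  return (
    `<w:sdt><w:sdtPr>${rPr}` +
    `<w:alias w:val="${escapeXml(titleGenerator(name, elementType))}"/>` +
    `<w:tag w:val="${escapeXml(name)}"/></w:sdtPr>` +
    `<w:sdtContent><w:r>${rPr}<w:t xml:space="preserve">${escapeXml(value)}</w:t></w:r></w:sdtContent>` +
    "</w:sdt>"
  );
}

const deliverable = {
  $class: "org.accordproject.ciceromark.Variable",
  value: "\"Widgets\"",
  name: "deliverable",
  elementType: "String",
};
console.log(variableSdt(deliverable));
```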

@algomaster99
Member

Formula / Ergo expressions
Clause
Conditional
Optional
Strong
BlockQuote
CodeBlock
Code
ThematicBreak
Link
Image

Create a checklist for these; it will be easier to track.

we also need to think about whether all these will be present in the contract (IMO, Code and CodeBlock generally won't occur in the contract)

Do raise this in the meeting tomorrow :)

Since you have created the table, don't forget to link this to the next wiki page you will write.

@algomaster99
Member

Entities and their corresponding Ciceromark

Will you also be maintaining the OOXML counterpart? I recommend so.

@K-Kumar-01
Collaborator Author

Entities and their corresponding Ciceromark

Will you also be maintaining the OOXML counterpart? I recommend so.

Yeah, I will also maintain them as we start writing the transformer for it.

@algomaster99 algomaster99 mentioned this issue Jun 10, 2021
5 tasks
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 10, 2021
Rules and helpers for the conversion
Complete roundtripping for text and emphasis(CiceroMark->OOXML->CiceroMark)
Test for the above

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 10, 2021
Naming changed to improve readability and understandability
Files renamed for better understanding

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 11, 2021
Documentation update

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 11, 2021
Declares OOXML as instance variable

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 11, 2021
Variable name changed
Spacing improved

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 11, 2021
Remove attribute from <w:i> ooxml tag

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 11, 2021
OOXML spacing changes

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 11, 2021
K-Kumar-01 added a commit to K-Kumar-01/markdown-transform that referenced this issue Jun 11, 2021
refactor: formattingg rules - accordproject#397
Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit that referenced this issue Jun 12, 2021
Heading transformer for CiceroMark->OOXML
Roundtrip(CiceroMark <-> OOXML) test for above

Signed-off-by: k-kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Jun 12, 2021
@K-Kumar-01
Collaborator Author

@algomaster99 @dselman
I have a question regarding the variable transformer.
[Screenshot from 2021-06-16 17-12-25]
When rendering in the add-in, some variables had extra quotation marks (for more reference, see here), so the quotation marks were stripped.

The tests now require these quotation marks in the transformer output, as shown in the image above (the test fails without them).

So how should we proceed here?
I have come up with two solutions:

  1. Stop stripping quotation marks altogether (rendering the probably unnecessary quotes).
  2. Stop stripping quotation marks only in the transformer so that it passes the test. The problem here is that if we convert CiceroMark->OOXML via the transformer or markus-cli (possibly in the future) and then open the file in the add-in, variable values will have double quotes. In addition, if we add another template using the add-in, those values won't have double quotes.

PS: I am not a fan of either approach.

@algomaster99
Member

Where did you get the CiceroMark for testing? If you got it from the latest template, change the variable transformer to enclose variables in "".

@K-Kumar-01
Collaborator Author

@algomaster99
I got it from acceptance-of-delivery.json.
As for the variable transformer, do you mean CiceroMarkToOOXML or OOXMLtoCiceroMark?
We strip off quotation marks in the former.
Also, enclosing variable values in "" in the latter wouldn't make sense, as variables of type DateTime, Number, etc. don't have quotes around them.

@algomaster99
Member

algomaster99 commented Jun 16, 2021

First of all, that acceptance-of-delivery.json was parsed from an older version of the template; you might want to update it. Next, if the latest parsed CiceroMark still encloses its variables in "", you don't need to add extra code for enclosing the value in "" or stripping them off, because when iterating through the CiceroMark you will get the value as "\"Party A\"" and not "Party A". If it does not enclose the value in "", again you don't need extra code, because you will get "Party A" as the value and not "\"Party A\"".

Overall, I don't see why we should be concerned about explicitly stripping or adding "" around the variable value.
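This point can be illustrated with a small round trip: if both transformers copy the value verbatim, whatever quoting the CiceroMark already has survives unchanged, with no special casing for strings versus DateTime or Number values. A minimal JavaScript sketch; valueToRun and runToValue are hypothetical helpers, and the regex extraction only works on the single-run fragment built here (a real OOXML->CiceroMark transformer would use an XML parser).

```javascript
// Escape the characters that must not appear literally in element text.
function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Reverse escapeText; &amp; is decoded last so "&amp;lt;" becomes "&lt;", not "<".
function unescapeText(xml) {
  return xml.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}

// CiceroMark -> OOXML: the variable value is written exactly as stored,
// with no stripping or adding of quotation marks.
function valueToRun(value) {
  return `<w:r><w:t xml:space="preserve">${escapeText(value)}</w:t></w:r>`;
}

// OOXML -> CiceroMark: read the text back exactly as written.
function runToValue(run) {
  const match = /<w:t[^>]*>([\s\S]*?)<\/w:t>/.exec(run);
  if (!match) {
    throw new Error("No <w:t> element found");
  }
  return unescapeText(match[1]);
}

// Quoted strings, plain strings and non-string values all survive unchanged.
for (const value of ['"Party A"', "Party A", "2021-06-16", "A & <B>"]) {
  console.log(JSON.stringify(value), "->", JSON.stringify(runToValue(valueToRun(value))));
}
```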

@K-Kumar-01
Collaborator Author

@algomaster99

Overall, I don't see why we should be concerned about explicitly stripping or adding "" around the variable value.

The main reason was that some variable values were wrapped in "" and some were not. The "" appeared mainly around strings, so I thought it might have been unintentional and that the quotes could have been inserted by mistake during transformation.

@K-Kumar-01
Collaborator Author

@algomaster99
I will keep the values as they are in the transformers.

@algomaster99
Member

Exactly my point :)

K-Kumar-01 added a commit that referenced this issue Jun 16, 2021
Add helpers and rule
Add the transformation logic
Add the test for above

Signed-off-by: k-kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Jun 17, 2021
algomaster99 pushed a commit that referenced this issue Jun 17, 2021
algomaster99 pushed a commit that referenced this issue Jun 17, 2021
K-Kumar-01 added a commit that referenced this issue Jun 18, 2021
Softbreak Rule
Check for softbreak in transformer and call rule
Test for above(acceptance-of-delivery.json)

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit that referenced this issue Jun 18, 2021
algomaster99 pushed a commit that referenced this issue Jun 18, 2021
K-Kumar-01 added a commit that referenced this issue Jun 18, 2021
use getClass as helper instead of transformer class function

Signed-off-by: k-kumar-01 <[email protected]>
@algomaster99
Member

@K-Kumar-01 from now on, don't mention the issue number in every commit you push. As you can see, it has cluttered this thread. Mentioning the issue number should only be necessary in the PR title, since your PRs are squashed anyway.

@K-Kumar-01
Collaborator Author

@algomaster99 @dselman
There is no such thing as a blockquote in MS Word; it is basically a styled paragraph. See the reference videos and the styling reference for shading.

So, based on these, can you decide on the specifications we need for the blockquote?

Also, for inline code, I am thinking of this formatting:
[Screenshot (70)]

Thanks in advance :)
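One possible answer, sketched under assumptions: a blockquote as an ordinary paragraph with shading and a left indent, and inline code as a run in a monospace font. The fill colour F2F2F2, the 720-twip (0.5 inch) indent and the Courier New font are placeholder choices, not an agreed spec; w:shd, w:ind and w:rFonts are standard WordprocessingML elements, and blockQuoteParagraph/inlineCodeRun are hypothetical helper names.

```javascript
// Placeholder styling choices, not an agreed specification.
const SHADING = '<w:shd w:val="clear" w:color="auto" w:fill="F2F2F2"/>';
const INDENT = '<w:ind w:left="720"/>'; // 720 twips = 0.5 inch
const MONO_FONT = '<w:rFonts w:ascii="Courier New" w:hAnsi="Courier New"/>';

// Minimal escaping of XML-special characters (stand-in for sanitizeHtmlChars).
function escapeXml(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Blockquote: an ordinary paragraph with shading and a left indent.
// Inside <w:pPr>, the schema places <w:shd> before <w:ind>.
function blockQuoteParagraph(runsXml) {
  return `<w:p><w:pPr>${SHADING}${INDENT}</w:pPr>${runsXml}</w:p>`;
}

// Inline code: a run with a monospace font and the same shading.
// Inside <w:rPr>, <w:rFonts> comes before <w:shd>.
function inlineCodeRun(text) {
  return (
    `<w:r><w:rPr>${MONO_FONT}${SHADING}</w:rPr>` +
    `<w:t xml:space="preserve">${escapeXml(text)}</w:t></w:r>`
  );
}

console.log(blockQuoteParagraph(inlineCodeRun("hello")));
```

Keeping blockquotes as plain paragraph properties (rather than a custom style) means the output opens correctly even in documents that do not define a matching style.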

algomaster99 pushed a commit that referenced this issue Jun 30, 2021
…397 (#418)

* feat(markdown-docx): text and emphasis transformer

Remove old transformer
Rules: EMPHASIS, TEXT, TEXT_STYLES, TEXT_WRAPPER, PARAGRAPH_WRAPPER
Tests: Check only for text-and-emphasis using if

Signed-off-by: k-kumar-01 <[email protected]>

* feat: heading transformer

Logic to transform headings
Rule: PARAGRAPH_PROPERTIES_RULE
Hardcore check the test for heading using condition

Signed-off-by: K-Kumar-01 <[email protected]>

* feat: variable transformer

Logic to transform variables
Rules: Variable Rule
Conditionally check for tests

Signed-off-by: K-Kumar-01 <[email protected]>

* feat: softbreak transformer

Logic for softbreak transformation
Rule: SOFTBREAK_RULE
Conditionally exclude the test for strong(file:strong.json)

Signed-off-by: K-Kumar-01 <[email protected]>

* feat: strong transformer

Logic for strong transformation
Rule: STRONG_RULE
Check for all tests

Signed-off-by: K-Kumar-01 <[email protected]>

* feat: add headingStyles and relationship specs

Signed-off-by: K-Kumar-01 <[email protected]>

* refactor(markdown-docx): coding practices

Spread operator use inplace of push
Define constants as per convention
Add test for nesting

Signed-off-by: K-Kumar-01 <[email protected]>

* refactor: Heading Properties

remove argument value from Rule
remove the condition to insert runtime properties in transformer

Signed-off-by: K-Kumar-01 <[email protected]>
algomaster99 added a commit that referenced this issue Jul 8, 2021
Signed-off-by: K-Kumar-01 <[email protected]>

Signed-off-by: Aman Sharma <[email protected]>

Co-authored-by: Aman Sharma <[email protected]>
K-Kumar-01 added a commit that referenced this issue Jul 11, 2021
Transformer(CiceroMark<->OOXML)
Tests for the same

Signed-off-by: K-Kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Jul 18, 2021
K-Kumar-01 added a commit that referenced this issue Jul 24, 2021
Add transformation logic: OOXML <-> CiceroMark
Add rules
Add test

Signed-off-by: k-kumar-01 <[email protected]>
K-Kumar-01 added a commit that referenced this issue Jul 26, 2021
K-Kumar-01 added a commit that referenced this issue Jul 30, 2021
transformation logic(OOXML<->CiceroMark)
rules for clause
tests

Signed-off-by: k-kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Aug 3, 2021
K-Kumar-01 added a commit that referenced this issue Aug 3, 2021
transformation logic(OOXML<->CiceroMark)
rules for clause
tests

Signed-off-by: k-kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Aug 4, 2021
K-Kumar-01 added a commit that referenced this issue Aug 5, 2021
transformation logic: OOXML<->CiceroMark
rules: link and link property
tests: link and its variants

Signed-off-by: k-kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Aug 9, 2021
K-Kumar-01 added a commit that referenced this issue Aug 10, 2021
transformation logic:OOXML<->CiceroMark
tests for optional
update tests for other(xml changes in variable and clause)

Signed-off-by: k-kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Aug 17, 2021
K-Kumar-01 added a commit that referenced this issue Aug 18, 2021
transformation logic: OOXML<->CiceroMark
rules: OPTIONAL_RULE
tests

Signed-off-by: k-kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Aug 18, 2021
K-Kumar-01 added a commit that referenced this issue Aug 18, 2021
Transformation logic(OOXML<->CiceroMark)
Tests for formula
Update tests(markdown-transform, markdown-cli)

Signed-off-by: k-kumar-01 <[email protected]>
algomaster99 pushed a commit that referenced this issue Aug 23, 2021