Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variables in text boxes do not get replaced #97

Open
dallanmc opened this issue Jan 6, 2022 · 8 comments
Open

Variables in text boxes do not get replaced #97

dallanmc opened this issue Jan 6, 2022 · 8 comments

Comments

@dallanmc
Copy link

dallanmc commented Jan 6, 2022

How To Reproduce:

  • Create a new blank document.
  • Create a text box.
  • Add a variable in the text box.

Outcome after processing: the variable (which exists in the context object) does not get replaced.

The reason:

Docxstamper only processes paragraphs and tables at the root level of the document. Text boxes are within AlternateContent tags and these elements contain paragraphs, but the paragraphs are never processed because AlternateContent is ignored.

The fix:

The main class for this is 'CoordinatesWalker'. Docxstamper only supports paragraphs at the root nodes of the document. But it also gets any tables at the root nodes and iterates through the rows and cells to see if there are any paragraphs, which it then processes (using a 'walkParagraph' method).
This essentially means that it will miss any paragraphs that aren't in the root node and aren't in tables. A text box is an example of a an element that will be skipped.
So I think that the current approach in 'CoordinatesWalker' is probably unnecessary. All we really need to do is iterate through the whole document, grab all the paragraphs elements, stick them in a list and then process them using the 'walkParagraph' method.

Luckily, docx4j already provides a method to get all paragraphs (or any other element for that matter):

public static List<P> getAllParagraphsFromObject(Object obj) {
        ClassFinder finder = new ClassFinder(P.class); // docx4j class
        new TraversalUtil(obj,finder); // docx4j class
        return getParagraphListFromObjectList(finder.results);
    }

    private static List<P> getParagraphListFromObjectList(List<Object> objectList) {
        List<P> paragraphList = new ArrayList<>();
        for (Object object: objectList) {
            if (object instanceof P) {
                paragraphList.add((P)object);
            }
        }

        return paragraphList;
    }

So all we do is get all the paragraphs and run them through the 'walkParagraph' method":

List<P> paragraphs = DocumentWrapper.getParagraphs(document);
        for (P paragraph: paragraphs) {
            //  TODO remove the index, as it's not needed
            ParagraphCoordinates coordinates = new ParagraphCoordinates(paragraph, 0);
            walkParagraph(coordinates);
        }

This works perfectly. You might notice that the index is always passed in as zero as the paragraph. coordinates. I haven't had time to refactor this just yet, but in fact, coordinates aren't actually needed at all. I think the original idea behind them was so that you might need them in order to delete elements, but you can delete elements using only the element itself because every element contains a reference to its parent. It's basically this:

((ContentAccessor)paragraphCoordinates.getParagraph().getParent()).getContent().remove(
                    paragraphCoordinates.getParagraph());

Anyway, I hope the above helps someone who encounters the same issue. I am working on a project where I don't have to much control over the input templates, so I don't really have the luxury of telling the user "just don't use text boxes", and besides, there's no good reason why we shouldn't support them.

@tarkal
Copy link

tarkal commented May 16, 2022

This is great. Do you have a copy of the modified CoordinateWalker.class for reference? It looks like the walk() method is called which in turn calls the walkContent() method which in turn calls walkParagraph(). Where exactly are you intercepting this chain with your suggested:

List<P> paragraphs = DocumentWrapper.getParagraphs(document);
        for (P paragraph: paragraphs) {
            //  TODO remove the index, as it's not needed
            ParagraphCoordinates coordinates = new ParagraphCoordinates(paragraph, 0);
            walkParagraph(coordinates);
        }

@dallanmc
Copy link
Author

dallanmc commented May 17, 2022

package org.wickedsource.docxstamper.util.walk;

import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.*;
import org.wickedsource.docxstamper.util.DocumentUtil;
import org.wickedsource.docxstamper.api.coordinates.*;

import java.util.List;

public abstract class CoordinatesWalker {

    private WordprocessingMLPackage document;

    public CoordinatesWalker(WordprocessingMLPackage document) {
        this.document = document;
    }

    public void walk() {

        List<P> paragraphs = DocumentUtil.getParagraphsFromObject(document);
        for (P paragraph: paragraphs) {
            walkParagraph(paragraph);
        }
    }
    
    private void walkParagraph(P paragraph){
    	int rowIndex = 0;
    	for (Object contentElement : paragraph.getContent()){
    		 if (XmlUtils.unwrap(contentElement) instanceof R) {
    			 R run = (R) contentElement;
    			 RunCoordinates runCoordinates = new RunCoordinates(run, rowIndex);
    			 onRun(runCoordinates, paragraph);
    		 }
    	}
    	// we run the paragraph afterwards so that the comments inside work before the whole paragraph comments
    	onParagraph(paragraph);

    }

    protected abstract void onParagraph(P paragraph);

    protected abstract void onRun(RunCoordinates runCoordinates, P paragraph);
    
    protected abstract void onTable(TableCoordinates tableCoordinates);

    protected abstract void onTableCell(TableCellCoordinates tableCellCoordinates);

    protected abstract void onTableRow(TableRowCoordinates tableRowCoordinates);

}

@tarkal
Copy link

tarkal commented May 17, 2022

Thx. OK I actually ended up with quite a few problems trying to implement the above as it required changing the signatures for quite a few classes / interfaces / abstracts which propagated through the code.

I have not fully tested this solution yet but I have taken your suggested static getAllParagraphsFromObject() method and instead of using like you have above I have just changed the walkContent(List<Object> contentElements) method like this:

private void walkContent(List<Object> contentElements) {

        List<P> paragraphs = getAllParagraphsFromObject(this.document.getMainDocumentPart().getContent());

        int j = 0;
        for (P paragraph : paragraphs) {
            ParagraphCoordinates coordinates = new ParagraphCoordinates(paragraph, j);
            j++;
            walkParagraph(coordinates);
        }

        for (int i = 0; i < contentElements.size(); i++) {
            Object contentElement = contentElements.get(i);
            Object unwrappedObject = XmlUtils.unwrap(contentElement);
            if (unwrappedObject instanceof Tbl) {
                Tbl table = (Tbl) unwrappedObject;
                TableCoordinates tableCoordinates = new TableCoordinates(table, i);
                walkTable(tableCoordinates);
            }
        }
    }

public static List<P> getAllParagraphsFromObject(Object obj) {
        ClassFinder finder = new ClassFinder(P.class); // docx4j class
        new TraversalUtil(obj,finder); // docx4j class
        return getParagraphListFromObjectList(finder.results);
}

 private static List<P> getParagraphListFromObjectList(List<Object> objectList) {
        List<P> paragraphList = new ArrayList<>();
        for (Object object: objectList) {
            if (object instanceof P) {
                paragraphList.add((P)object);
            }
        }

        return paragraphList;
}

Initial tests seem to work well. I'll update after fully testing.

@dallanmc
Copy link
Author

I actually can't remember now, but do you definitely need the bit where you determine if it's a table and then call walkTable? I think that table cells contain paragraphs, so they should be included in the getAllParagraphsFromObject.

@tarkal
Copy link

tarkal commented May 17, 2022

You could be absolutely right. I will complete my testing and report back. Thx

@t-oster
Copy link

t-oster commented Sep 21, 2022

@dallanmc would you mind creating a pull reqest (against verronpro's version https://github.com/verronpro/docx-stamper), so we can get a build on maven central supporting text boxes AND recursion?

@dallanmc
Copy link
Author

I'll try to get to this at some stage. I think moving to the getParagraphs(...) has a few knock-on implications for the rest of the code. But in my view, it's much better to go with the getParagraphs approach.

@t-oster
Copy link

t-oster commented Sep 22, 2022

see verronpro/docx-stamper#36

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants