Variables in text boxes do not get replaced #97

dallanmc · 2022-01-06T11:28:19Z

How To Reproduce:

Create a new blank document.
Create a text box.
Add a variable in the text box.

Outcome after processing: the variable (which exists in the context object) does not get replaced.

The reason:

Docxstamper only processes paragraphs and tables at the root level of the document. Text boxes are within AlternateContent tags and these elements contain paragraphs, but the paragraphs are never processed because AlternateContent is ignored.

The fix:

The main class for this is 'CoordinatesWalker'. Docxstamper only supports paragraphs at the root nodes of the document. But it also gets any tables at the root nodes and iterates through the rows and cells to see if there are any paragraphs, which it then processes (using a 'walkParagraph' method).
This essentially means that it will miss any paragraphs that aren't in the root node and aren't in tables. A text box is an example of a an element that will be skipped.
So I think that the current approach in 'CoordinatesWalker' is probably unnecessary. All we really need to do is iterate through the whole document, grab all the paragraphs elements, stick them in a list and then process them using the 'walkParagraph' method.

Luckily, docx4j already provides a method to get all paragraphs (or any other element for that matter):

public static List<P> getAllParagraphsFromObject(Object obj) {
        ClassFinder finder = new ClassFinder(P.class); // docx4j class
        new TraversalUtil(obj,finder); // docx4j class
        return getParagraphListFromObjectList(finder.results);
    }

    private static List<P> getParagraphListFromObjectList(List<Object> objectList) {
        List<P> paragraphList = new ArrayList<>();
        for (Object object: objectList) {
            if (object instanceof P) {
                paragraphList.add((P)object);
            }
        }

        return paragraphList;
    }

So all we do is get all the paragraphs and run them through the 'walkParagraph' method":

List<P> paragraphs = DocumentWrapper.getParagraphs(document);
        for (P paragraph: paragraphs) {
            //  TODO remove the index, as it's not needed
            ParagraphCoordinates coordinates = new ParagraphCoordinates(paragraph, 0);
            walkParagraph(coordinates);
        }

This works perfectly. You might notice that the index is always passed in as zero as the paragraph. coordinates. I haven't had time to refactor this just yet, but in fact, coordinates aren't actually needed at all. I think the original idea behind them was so that you might need them in order to delete elements, but you can delete elements using only the element itself because every element contains a reference to its parent. It's basically this:

((ContentAccessor)paragraphCoordinates.getParagraph().getParent()).getContent().remove(
                    paragraphCoordinates.getParagraph());

Anyway, I hope the above helps someone who encounters the same issue. I am working on a project where I don't have to much control over the input templates, so I don't really have the luxury of telling the user "just don't use text boxes", and besides, there's no good reason why we shouldn't support them.

The text was updated successfully, but these errors were encountered:

tarkal · 2022-05-16T11:17:21Z

This is great. Do you have a copy of the modified CoordinateWalker.class for reference? It looks like the walk() method is called which in turn calls the walkContent() method which in turn calls walkParagraph(). Where exactly are you intercepting this chain with your suggested:

List<P> paragraphs = DocumentWrapper.getParagraphs(document);
        for (P paragraph: paragraphs) {
            //  TODO remove the index, as it's not needed
            ParagraphCoordinates coordinates = new ParagraphCoordinates(paragraph, 0);
            walkParagraph(coordinates);
        }

dallanmc · 2022-05-17T08:58:24Z

package org.wickedsource.docxstamper.util.walk;

import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.*;
import org.wickedsource.docxstamper.util.DocumentUtil;
import org.wickedsource.docxstamper.api.coordinates.*;

import java.util.List;

public abstract class CoordinatesWalker {

    private WordprocessingMLPackage document;

    public CoordinatesWalker(WordprocessingMLPackage document) {
        this.document = document;
    }

    public void walk() {

        List<P> paragraphs = DocumentUtil.getParagraphsFromObject(document);
        for (P paragraph: paragraphs) {
            walkParagraph(paragraph);
        }
    }
    
    private void walkParagraph(P paragraph){
    	int rowIndex = 0;
    	for (Object contentElement : paragraph.getContent()){
    		 if (XmlUtils.unwrap(contentElement) instanceof R) {
    			 R run = (R) contentElement;
    			 RunCoordinates runCoordinates = new RunCoordinates(run, rowIndex);
    			 onRun(runCoordinates, paragraph);
    		 }
    	}
    	// we run the paragraph afterwards so that the comments inside work before the whole paragraph comments
    	onParagraph(paragraph);

    }

    protected abstract void onParagraph(P paragraph);

    protected abstract void onRun(RunCoordinates runCoordinates, P paragraph);
    
    protected abstract void onTable(TableCoordinates tableCoordinates);

    protected abstract void onTableCell(TableCellCoordinates tableCellCoordinates);

    protected abstract void onTableRow(TableRowCoordinates tableRowCoordinates);

}

tarkal · 2022-05-17T12:08:20Z

Thx. OK I actually ended up with quite a few problems trying to implement the above as it required changing the signatures for quite a few classes / interfaces / abstracts which propagated through the code.

I have not fully tested this solution yet but I have taken your suggested static getAllParagraphsFromObject() method and instead of using like you have above I have just changed the walkContent(List<Object> contentElements) method like this:

private void walkContent(List<Object> contentElements) {

        List<P> paragraphs = getAllParagraphsFromObject(this.document.getMainDocumentPart().getContent());

        int j = 0;
        for (P paragraph : paragraphs) {
            ParagraphCoordinates coordinates = new ParagraphCoordinates(paragraph, j);
            j++;
            walkParagraph(coordinates);
        }

        for (int i = 0; i < contentElements.size(); i++) {
            Object contentElement = contentElements.get(i);
            Object unwrappedObject = XmlUtils.unwrap(contentElement);
            if (unwrappedObject instanceof Tbl) {
                Tbl table = (Tbl) unwrappedObject;
                TableCoordinates tableCoordinates = new TableCoordinates(table, i);
                walkTable(tableCoordinates);
            }
        }
    }

public static List<P> getAllParagraphsFromObject(Object obj) {
        ClassFinder finder = new ClassFinder(P.class); // docx4j class
        new TraversalUtil(obj,finder); // docx4j class
        return getParagraphListFromObjectList(finder.results);
}

 private static List<P> getParagraphListFromObjectList(List<Object> objectList) {
        List<P> paragraphList = new ArrayList<>();
        for (Object object: objectList) {
            if (object instanceof P) {
                paragraphList.add((P)object);
            }
        }

        return paragraphList;
}

Initial tests seem to work well. I'll update after fully testing.

dallanmc · 2022-05-17T12:17:00Z

I actually can't remember now, but do you definitely need the bit where you determine if it's a table and then call walkTable? I think that table cells contain paragraphs, so they should be included in the getAllParagraphsFromObject.

tarkal · 2022-05-17T14:19:04Z

You could be absolutely right. I will complete my testing and report back. Thx

t-oster · 2022-09-21T13:01:28Z

@dallanmc would you mind creating a pull reqest (against verronpro's version https://github.com/verronpro/docx-stamper), so we can get a build on maven central supporting text boxes AND recursion?

dallanmc · 2022-09-21T15:40:40Z

I'll try to get to this at some stage. I think moving to the getParagraphs(...) has a few knock-on implications for the rest of the code. But in my view, it's much better to go with the getParagraphs approach.

t-oster · 2022-09-22T19:35:33Z

see verronpro/docx-stamper#36

t-oster mentioned this issue Sep 21, 2022

Replacing text in text box #103

Open

This was referenced Sep 22, 2022

Publish to Maven Central verronpro/docx-stamper#35

Closed

Feature: Replace text in text-boxes verronpro/docx-stamper#36

Merged

caring-coder mentioned this issue Sep 23, 2022

Variables in text boxes do not get replaced verronpro/docx-stamper#40

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Variables in text boxes do not get replaced #97

Variables in text boxes do not get replaced #97

dallanmc commented Jan 6, 2022 •

edited

Loading

tarkal commented May 16, 2022 •

edited

Loading

dallanmc commented May 17, 2022 •

edited

Loading

tarkal commented May 17, 2022 •

edited

Loading

dallanmc commented May 17, 2022

tarkal commented May 17, 2022

t-oster commented Sep 21, 2022

dallanmc commented Sep 21, 2022

t-oster commented Sep 22, 2022

Variables in text boxes do not get replaced #97

Variables in text boxes do not get replaced #97

Comments

dallanmc commented Jan 6, 2022 • edited Loading

How To Reproduce:

The reason:

The fix:

tarkal commented May 16, 2022 • edited Loading

dallanmc commented May 17, 2022 • edited Loading

tarkal commented May 17, 2022 • edited Loading

dallanmc commented May 17, 2022

tarkal commented May 17, 2022

t-oster commented Sep 21, 2022

dallanmc commented Sep 21, 2022

t-oster commented Sep 22, 2022

dallanmc commented Jan 6, 2022 •

edited

Loading

tarkal commented May 16, 2022 •

edited

Loading

dallanmc commented May 17, 2022 •

edited

Loading

tarkal commented May 17, 2022 •

edited

Loading