Account for empty cells in table extraction (xml) #633

fortyfourforty · 2024-06-27T08:43:01Z

Hi,

Another accuracy issue with table extraction in the XML output.

If a table contains one or more empty cells, the XML output simply drops them. For example, a table with three columns ends up with rows of only two cells:

```xml
<table>
  <row span="3">
    <cell>a</cell>
    <cell>b</cell>
  </row>
  <row span="3">
    <cell>f</cell>
    <cell>s</cell>
    <cell>s</cell>
  </row>
  <row>
    <cell>g</cell>
    <cell>b</cell>
  </row>
</table>
```
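To make the symptom concrete, the short diagnostic sketch below parses the reported output with the standard library and lists the rows that have fewer cells than the table is wide. The `short_rows` helper is hypothetical, and taking the largest `span` value (or the largest cell count, if higher) as the expected width is an assumption about what `span` means here.

```python
import xml.etree.ElementTree as ET

# The output reported above: the first and last rows lost a cell each.
REPORTED = """<table>
<row span="3"><cell>a</cell><cell>b</cell></row>
<row span="3"><cell>f</cell><cell>s</cell><cell>s</cell></row>
<row><cell>g</cell><cell>b</cell></row>
</table>"""


def short_rows(xml_text: str) -> list[tuple[int, int, int]]:
    """Return (row index, cells found, expected width) for each row that is too short.

    Assumption: the expected width is the largest `span` attribute or,
    if larger, the largest number of <cell> elements in any row.
    """
    table = ET.fromstring(xml_text)
    rows = table.findall("row")
    spans = [int(r.get("span")) for r in rows if r.get("span", "").isdigit()]
    counts = [len(r.findall("cell")) for r in rows]
    width = max(spans + counts, default=0)
    return [(i, n, width) for i, n in enumerate(counts) if n < width]


print(short_rows(REPORTED))
```

Here the first and third rows are flagged as having two cells where three are expected.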

It would be better to extract empty cells as an empty string or None so that the layout stays correct:

```xml
<table>
  <row span="3">
    <cell>a</cell>
    <cell></cell>
    <cell>b</cell>
  </row>
  <row span="3">
    <cell>f</cell>
    <cell>s</cell>
    <cell>s</cell>
  </row>
  <row>
    <cell>g</cell>
    <cell>b</cell>
    <cell>None</cell>
  </row>
</table>
```
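One possible approach is to keep every `<td>`/`<th>` as a `<cell>` even when it is empty, and to pad shorter rows with empty cells. The sketch below is a minimal illustration under stated assumptions, not trafilatura's actual implementation: `convert_table` and the sample `HTML` table are hypothetical, the input is assumed to be well-formed enough for `xml.etree.ElementTree` to parse, `colspan`/`rowspan` are ignored, and `span` is set to the table width on every row. It uses empty strings rather than `None`, since XML text has no null value.

```python
import xml.etree.ElementTree as ET

# Hypothetical source table consistent with the desired output above.
HTML = """<table>
<tr><td>a</td><td></td><td>b</td></tr>
<tr><td>f</td><td>s</td><td>s</td></tr>
<tr><td>g</td><td>b</td></tr>
</table>"""


def convert_table(html_table: ET.Element) -> ET.Element:
    """Hypothetical converter from an HTML <table> to <table>/<row>/<cell> XML.

    Empty <td>/<th> elements become empty <cell/> elements instead of being
    skipped, and short rows are padded with empty cells at the end.
    """
    rows = []
    for tr in html_table.iter("tr"):
        # Keep every cell, including empty ones, as a (possibly empty) string.
        rows.append(["".join(c.itertext()).strip() for c in tr if c.tag in ("td", "th")])
    width = max((len(r) for r in rows), default=0)

    table = ET.Element("table")
    for texts in rows:
        row = ET.SubElement(table, "row", span=str(width))
        # Pad short rows so every row has the same number of columns.
        for text in texts + [""] * (width - len(texts)):
            cell = ET.SubElement(row, "cell")
            cell.text = text
    return table


xml_table = convert_table(ET.fromstring(HTML))
print(ET.tostring(xml_table, encoding="unicode"))
```

Padding at the end of a row is only a guess: when the source HTML omits a cell entirely, its original position cannot be recovered, whereas an explicitly empty `<td>` keeps its place.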


adbar · 2024-06-27T10:51:12Z

It's not a bug in itself, but I agree things could be improved. Do you want to work on a PR?

fortyfourforty · 2024-06-27T11:03:41Z

I wish I could, but my limited, self-taught knowledge of Python and GitHub doesn't allow me to work on PRs. 😞

adbar · 2024-07-25T12:00:37Z

@naktinis You wrote code targeting tables, maybe you are also interested.

adbar added the `enhancement` (New feature or request) label Jun 27, 2024

adbar changed the title ~~table extraction in xml~~ Account for empty cells in table extraction (xml) Jun 27, 2024
