Account for empty cells in table extraction (xml) #633

Open
fortyfourforty opened this issue Jun 27, 2024 · 3 comments
Labels
enhancement New feature or request

Comments


fortyfourforty commented Jun 27, 2024

Hi,

Another inaccuracy issue in XML extraction for tables.

If the table contains one or more empty cells, the XML output simply ignores them. For example, a row that should have 3 cells ends up with only 2.

<table>
<row span="3">
<cell>a</cell>
<cell>b</cell>
</row>
<row span="3">
<cell>f</cell>
<cell>s</cell>
<cell>s</cell>
</row>
<row>
<cell>g</cell>
<cell>b</cell>
</row>
</table>

It would be better to extract empty cells as an empty string or None so that the layout stays correct.

<table>
<row span="3">
<cell>a</cell>
<cell></cell>
<cell>b</cell>
</row>
<row span="3">
<cell>f</cell>
<cell>s</cell>
<cell>s</cell>
</row>
<row>
<cell>g</cell>
<cell>b</cell>
<cell>None</cell>
</row>
</table>
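
As a rough illustration of the idea, here is a minimal sketch using only Python's standard library. It is not based on the project's actual code: `convert_table` is a hypothetical helper, the input is assumed to be a well-formed `<table>` with `<tr>`/`<td>`/`<th>` elements, and the `span` attribute is left out. The point is only that a `<cell>` is emitted for every source cell, including empty ones, so each row keeps its column count.

```python
import xml.etree.ElementTree as ET


def convert_table(html_table: str) -> ET.Element:
    """Hypothetical helper: map <tr>/<td>/<th> to <row>/<cell>, keeping empty cells."""
    source = ET.fromstring(html_table)  # assumes well-formed markup
    table = ET.Element("table")
    for tr in source.iter("tr"):
        row = ET.SubElement(table, "row")
        for td in tr:
            if td.tag not in ("td", "th"):
                continue
            cell = ET.SubElement(row, "cell")
            # Emit the cell even when it has no text, so column positions survive.
            cell.text = "".join(td.itertext()).strip()
    return table


html = (
    "<table>"
    "<tr><td>a</td><td></td><td>b</td></tr>"
    "<tr><td>f</td><td>s</td><td>s</td></tr>"
    "<tr><td>g</td><td>b</td><td></td></tr>"
    "</table>"
)
result = convert_table(html)
cells_per_row = [len(row) for row in result]
print(cells_per_row)  # [3, 3, 3]
```

Here empty cells are represented as an empty string; writing the literal text None instead would be a one-line change.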
@adbar adbar added the enhancement New feature or request label Jun 27, 2024
@adbar
Owner

adbar commented Jun 27, 2024

It's not a bug in itself, but I agree things could be improved. Do you want to work on a PR?

@adbar adbar changed the title table extraction in xml Account for empty cells in table extraction (xml) Jun 27, 2024
@fortyfourforty
Author

I wish I could, but my limited, self-taught knowledge of Python and GitHub isn't enough for me to take on a PR. 😞

@adbar
Owner

adbar commented Jul 25, 2024

@naktinis You wrote code targeting tables, maybe you are also interested.
