Oreo Parsing Bug #19

@braceal

Description

How did you install pdfwf?

See the README.

What version of pdfwf are you using?

0.1.4, on the oreo_debug branch.

Describe the problem.

parse raised an exception: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 374, in __getitem__
    page = self.current_doc_doc[rel_page_idx]
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/fitz/__init__.py", line 2593, in __getitem__
    return self.load_page(i)
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/fitz/__init__.py", line 4734, in load_page
    page = mupdf.fz_load_page(self.this, page_id)
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/fitz/mupdf.py", line 39348, in fz_load_page
    return _mupdf.fz_load_page(doc, number)
RuntimeError: code=2: cannot find page 17 in page tree
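
The failing frame is `self.current_doc_doc[rel_page_idx]`, where fitz is asked for a page the document's page tree does not contain. The root cause is not established in this report. As a debugging aid, a minimal sketch of a bounds check that could run before the page is loaded is shown here. `load_page_checked` is a hypothetical helper, not part of pdfwf, and it assumes only that the document object supports `len()` and integer indexing (as a fitz document does).

```python
def load_page_checked(doc, rel_page_idx):
    """Return doc[rel_page_idx], failing early with a descriptive error.

    Hypothetical helper: `doc` is assumed to support len() and integer
    indexing. Negative indices are rejected on purpose so that an
    off-by-one in the index arithmetic is not silently wrapped around.
    """
    page_count = len(doc)
    if not 0 <= rel_page_idx < page_count:
        raise IndexError(
            f"page index {rel_page_idx} out of range for a document "
            f"with {page_count} pages"
        )
    return doc[rel_page_idx]
```

If this check passes and fitz still raises `cannot find page N in page tree`, the index arithmetic is probably fine and the PDF itself (for example a damaged page tree) would be the next thing to inspect.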

Second bug (the traceback below is cut off before the final error line):

Traceback (most recent call last):
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/oreo.py", line 282, in parse
    ) = get_packed_patch_tensor(
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 2102, in get_packed_patch_tensor
    packed_patches_and_indices = [
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 2103, in <listcomp>
    get_packed_patch_list(
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 1648, in get_packed_patch_list
    merge_patches_into_row(
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 1350, in merge_patches_into_row
    [
  File "/lus/eagle/projects/FoundEpidem/braceal/projects/metric-rag/src/pdfwf/pdfwf/parsers/oreo/tensor_utils.py", line 1351, in <listcomp>
    F.pad(patch, (0, 0, row_height - patch.size()[1], 0), value=1.0)
  File "/lus/eagle/projects/CVD-Mol-AI/braceal/conda/envs/pdfwf/lib/python3.10/site-packages/torch/nn/functional.py", line 4495, in pad
    return torch._C._nn.pad(input, pad, mode, value)
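
Since the error message for this traceback is missing, the cause is unknown. The last pdfwf frame is `F.pad(patch, (0, 0, row_height - patch.size()[1], 0), value=1.0)`. For `torch.nn.functional.pad`, the tuple is read as (left, right, top, bottom) over the last two dimensions, so this call pads the top of the second-to-last dimension. A minimal sketch of a check on the padding amount, assuming each patch has shape (C, H, W), follows. `top_pad_spec` is a hypothetical helper, not part of pdfwf, and is pure Python so it can be tried without torch.

```python
def top_pad_spec(patch_shape, row_height):
    """Build the F.pad tuple that pads a (C, H, W) patch at the top.

    Hypothetical helper. Assumes patch_shape is (C, H, W), so that
    index 1 is the height, matching patch.size()[1] in the traceback.
    """
    if len(patch_shape) != 3:
        raise ValueError(f"expected a (C, H, W) shape, got {tuple(patch_shape)}")
    height = patch_shape[1]
    if height > row_height:
        # A negative amount would make F.pad crop the patch instead of padding it.
        raise ValueError(
            f"patch height {height} exceeds row height {row_height}"
        )
    # (left, right, top, bottom) over the last two dimensions
    return (0, 0, row_height - height, 0)
```

Logging `patch.size()` and `row_height` just before the failing call would show whether either assumption (three dimensions, height no larger than the row) is violated.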

Labels: bug (Something isn't working)