Images not available in markdown from URL #1205

Open
FaFre opened this issue Mar 19, 2025 · 1 comment
Labels: bug (Something isn't working), html (issue related to html backend)

Comments


FaFre commented Mar 19, 2025

Bug

It does not seem possible to capture images when converting from a URL, regardless of which `--image-export-mode` is used. The placeholder message assumes the PDF pipeline is in use, which is not the case here because the input is HTML.

# Stuttgart

aus Wikipedia, der freien Enzyklopädie

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Der Titel dieses Artikels ist mehrdeutig. Weitere Bedeutungen sind unter  aufgeführt.

Das für Öffentlichkeitsarbeit verwendete Logo der Landeshauptstadt Stuttgart

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Stuttgarter Schloßplatz am Morgen

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Arkadenhof im Alten Schloss

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Typisch hügeliges Stadtbild am Stuttgarter Talkessel: Blick auf die Karlshöhe

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Steps to reproduce

docling https://de.wikipedia.org/wiki/Stuttgart --image-export-mode embedded
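
To confirm the reproduction, the placeholder comments can be counted in the exported Markdown. This is only a minimal sketch, assuming the output has already been read into a string; `count_missing_images` is a hypothetical helper and not part of Docling, and the marker text is taken from the output shown above.

```python
import re

# Matches the HTML comment Docling emits in place of a missing image.
# The wording "Image not available." is copied from the output in this issue.
PLACEHOLDER = re.compile(r"<!--.*?Image not available\..*?-->")


def count_missing_images(markdown: str) -> int:
    """Return how many image placeholders appear in the Markdown text."""
    return len(PLACEHOLDER.findall(markdown))


sample = (
    "# Stuttgart\n\n"
    "<!-- 🖼️❌ Image not available. Please use "
    "`PdfPipelineOptions(generate_picture_images=True)` -->\n\n"
    "Stuttgarter Schloßplatz am Morgen\n"
)
print(count_missing_images(sample))
```

A non-zero count on HTML input indicates the behaviour described in this issue.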

Docling version

Docling version: 2.27.0
Docling Core version: 2.23.3
Docling IBM Models version: 3.4.1
Docling Parse version: 4.0.0
Python: cpython-313 (3.13.2)
Platform: Linux-6.12.19-1-lts-x86_64-with-glibc2.41

Python version

Python 3.13.2

@FaFre FaFre added the bug Something isn't working label Mar 19, 2025
@PeterStaar-IBM PeterStaar-IBM added the html issue related to html backend label Mar 20, 2025
PeterStaar-IBM (Contributor) commented

@FaFre Yes, this is because the HTML backend does not yet download the images (or register their URLs from the HTML). The error message here is somewhat misleading. We should:

  1. update the backend here:
  2. update the error message in the Markdown output
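
As an illustration of what "registering the URLs from the HTML" could involve, here is a rough sketch using only the Python standard library. It is not Docling's backend code: `ImageURLCollector`, `collect_image_urls`, and the example URLs are all hypothetical, and a real fix would also need to download the images and attach them to the document.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageURLCollector(HTMLParser):
    """Collect absolute URLs of <img> tags found in an HTML document."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative and protocol-relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(html: str, base_url: str) -> list[str]:
    """Return the resolved image URLs in document order."""
    parser = ImageURLCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls


# Hypothetical page fragment and base URL, for illustration only.
html = (
    '<p><img src="/static/logo.png" alt="Logo">'
    '<img src="//upload.example.org/a.jpg"></p>'
)
print(collect_image_urls(html, "https://example.org/wiki/Page"))
```

Resolving each `src` against the page URL matters for pages such as Wikipedia, where image sources are often relative or protocol-relative.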
