Images not available in markdown from URL #1205

FaFre · 2025-03-19T21:32:48Z

Bug

It seems that images cannot be captured from a URL with any `--image-export-mode`. The placeholder message assumes the PDF pipeline is in use, which is not the case here because the input is HTML.

# Stuttgart

aus Wikipedia, der freien Enzyklopädie

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Der Titel dieses Artikels ist mehrdeutig. Weitere Bedeutungen sind unter  aufgeführt.

Das für Öffentlichkeitsarbeit verwendete Logo der Landeshauptstadt Stuttgart

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Stuttgarter Schloßplatz am Morgen

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Arkadenhof im Alten Schloss

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Typisch hügeliges Stadtbild am Stuttgarter Talkessel: Blick auf die Karlshöhe

<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->

Steps to reproduce

docling https://de.wikipedia.org/wiki/Stuttgart --image-export-mode embedded

Docling version

Docling version: 2.27.0
Docling Core version: 2.23.3
Docling IBM Models version: 3.4.1
Docling Parse version: 4.0.0
Python: cpython-313 (3.13.2)
Platform: Linux-6.12.19-1-lts-x86_64-with-glibc2.41

Python version

Python 3.13.2


PeterStaar-IBM · 2025-03-20T03:42:19Z

@FaFre Yes, this is because the backend is not yet downloading the images (or registering the URLs from the HTML). The error message here is somewhat misleading. We should:

- update the backend here:
  - https://github.com/docling-project/docling/blob/main/docling/backend/html_backend.py#L509
  - https://github.com/docling-project/docling/blob/main/docling/backend/html_backend.py#L538
- update the error message in the markdown output
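
To make "registering the URLs from the HTML" concrete, here is a minimal standard-library sketch of what collecting image URLs from a page might look like. It is not docling code and does not reflect how `html_backend.py` is structured; `ImageSrcCollector` and `collect_image_urls` are hypothetical names used only for illustration. It assumes images appear as `<img src=...>` tags and resolves relative or protocol-relative `src` values against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collect absolute image URLs from <img> tags.

    Hypothetical helper for illustration; not part of docling.
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; self-closing <img/> also lands here.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative and protocol-relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(html: str, base_url: str) -> list[str]:
    """Return the absolute URLs of all <img> tags that carry a src attribute."""
    parser = ImageSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls
```

A backend could then either download each URL or store it as a reference on the picture item. Which of the two fits docling's data model is left open here.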

FaFre added the bug label (Something isn't working) Mar 19, 2025

PeterStaar-IBM added the html label (issue related to html backend) Mar 20, 2025

PeterStaar-IBM assigned cau-git and ceberam Mar 20, 2025

ceberam mentioned this issue Apr 1, 2025

Image Placeholder lost when converting html #1278

Open
