
Add fsspec support #443

@Nintorac


It looks like we can easily add fsspec support for paths with a few modifications. This would let docling save to fsspec-compatible backends (S3, GCS, Azure, etc.) via https://github.com/fsspec/universal_pathlib, which should make docling more versatile in cloud deployments.
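One possible shape for those modifications: do the file I/O through the path object's own `open()` method instead of converting it to a string and calling the builtin `open()`. `pathlib.Path` and `upath.UPath` both expose `open()`, so the same code can target the local disk or an fsspec backend. The helper below is a minimal, hypothetical sketch (`save_json` is not a docling API, and this is not docling's current implementation). It assumes callers pass either a plain string (treated as a local path, as today) or a pathlib-style object.

```python
import json
from pathlib import Path
from typing import Any, Union


def save_json(data: dict[str, Any], filename: Union[str, Path]) -> None:
    """Hypothetical helper: write JSON through the path object's own open().

    Any object with a pathlib-style .open() works here, e.g. pathlib.Path
    for local files or upath.UPath("s3://my-bucket/doc.json") for S3.
    """
    # Plain strings are still treated as local filesystem paths.
    path = Path(filename) if isinstance(filename, str) else filename
    # Delegate opening to the path object, so fsspec-backed paths are
    # handled by their own filesystem instead of the local one.
    with path.open("w", encoding="utf-8") as fp:
        json.dump(data, fp, indent=2)
```

Because nothing in the helper inspects the path's scheme, supporting a new backend only requires that universal_pathlib (and the matching fsspec implementation, such as s3fs) be installed.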

Currently, the `save_as_*` methods (`save_as_json`, `save_as_yaml`, `save_as_markdown`, `save_as_html`) and their `load_from_*` counterparts only work with local filesystem paths. For example:

Before

```python
from upath import UPath

s3_path = UPath("s3://my-bucket/doc.json")
doc.save_as_json(s3_path)  # TypeError: expected str, bytes or os.PathLike object, not S3Path
```

After

```python
from docling_core.types.doc import ImageRefMode
from upath import UPath

s3_path = UPath("s3://my-bucket/doc.json")
doc.save_as_json(s3_path)  # Works!
doc.save_as_json(
    s3_path,
    artifacts_dir=UPath("s3://my-bucket/images"),
    image_mode=ImageRefMode.REFERENCED,
)  # Images saved to S3 too!
```
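The `artifacts_dir` case could follow the same pattern: build each image path with the `/` operator and write the bytes through the path's `open()`. The sketch below is hypothetical (`save_artifact` and the file name are illustrative, not docling APIs) and assumes the caller already has the image encoded as bytes. It returns `str(target)` as the reference; for a `UPath`, `str()` keeps the scheme (e.g. `s3://my-bucket/images/...`), so a referenced image would stay resolvable outside the local machine.

```python
from pathlib import Path
from typing import Union


def save_artifact(image_bytes: bytes, artifacts_dir: Union[str, Path], name: str) -> str:
    """Hypothetical helper: store one image under artifacts_dir, return its reference.

    Only pathlib-style operations are used (/, mkdir, open), so a
    upath.UPath such as UPath("s3://my-bucket/images") is assumed to
    behave like a local pathlib.Path here.
    """
    base = Path(artifacts_dir) if isinstance(artifacts_dir, str) else artifacts_dir
    # Create the directory if needed; on object stores, "directories" are
    # usually just key prefixes, so this may not create anything.
    base.mkdir(parents=True, exist_ok=True)
    target = base / name
    # Write raw image bytes through the path's own filesystem.
    with target.open("wb") as fp:
        fp.write(image_bytes)
    # Use the full path string (including any scheme) as the reference.
    return str(target)
```

A design note: returning the path string rather than a path relative to the JSON file avoids assuming the document and its artifacts live on the same filesystem, at the cost of less portable references when both are local.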
