Add SODA-A dataset #2575

Open
wants to merge 10 commits into
base: main

Conversation

nilsleh
Collaborator

@nilsleh nilsleh commented Feb 10, 2025

This PR adds the SODA-A dataset. The dataset is rehosted on Hugging Face (HF).

Dataset features:

* 2513 images
* 872,069 annotations with oriented bounding boxes
* 9 classes

Dataset format:

* Images are three-channel .jpg files.
* Annotations are stored in JSON files.

TODOs:

  • Some annotations have more than 8 coordinates (i.e. more than 4 polygon vertices), for example this entry from Annotations/train/01874.json, which has 10 coordinates:
{
    "poly": [
        1248.874059994201,
        2785.0,
        1267.2565270181412,
        2785.0,
        1290.0001220703125,
        2783.999755859375,
        1289.2877197265625,
        2767.801025390625,
        1248.1971435546875,
        2769.608154296875
    ],
    "area": 659.4863047840048,
    "category_id": 7,
    "image_id": 1874,
    "id": 43
}
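
To get a feel for how common such entries are, one could scan the annotations and flag every polygon that does not have exactly 4 vertices. The following is a minimal sketch, assuming each annotation is a dict with a flat "poly" list and an "id" as in the entry quoted above; the helper names and the 4-vertex contrast entry are made up for illustration.

```python
def count_vertices(poly):
    """Return the number of (x, y) vertices in a flat coordinate list."""
    if len(poly) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return len(poly) // 2


def find_non_quadrilaterals(annotations):
    """Return the ids of annotations whose polygon does not have 4 vertices."""
    return [ann["id"] for ann in annotations if count_vertices(ann["poly"]) != 4]


# The entry quoted above from Annotations/train/01874.json.
example = {
    "poly": [
        1248.874059994201, 2785.0,
        1267.2565270181412, 2785.0,
        1290.0001220703125, 2783.999755859375,
        1289.2877197265625, 2767.801025390625,
        1248.1971435546875, 2769.608154296875,
    ],
    "area": 659.4863047840048,
    "category_id": 7,
    "image_id": 1874,
    "id": 43,
}

# A made-up 4-vertex entry for contrast (not taken from the dataset).
regular = {"poly": [0.0, 0.0, 10.0, 0.0, 10.0, 5.0, 0.0, 5.0], "id": 0}

flagged = find_non_quadrilaterals([regular, example])
```

Here `flagged` contains only the id of the 5-vertex entry, so the regular box passes through untouched.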

Example plot: [image: soda]

@github-actions github-actions bot added the documentation, datasets, and testing labels Feb 10, 2025
@nilsleh nilsleh added this to the 0.7.0 milestone Feb 10, 2025
@nilsleh nilsleh marked this pull request as draft February 10, 2025 21:02
@nilsleh
Collaborator Author

nilsleh commented Feb 10, 2025

@shaunyuan22 Thank you for this nice dataset and all the work. We aim to make the dataset easier to use in TorchGeo, and would appreciate any comments or corrections you may have.


if self.bbox_orientation == 'oriented':
    # TODO different keys for oriented and horizontal boxes
    sample['boxes'] = boxes
Collaborator Author

@adamjstewart what should we do for oriented bounding boxes? Kornia only has boxes_xyxy and boxes_xywh.

Collaborator

The Kornia box keys are bbox_xyxy and bbox_xywh.

@nilsleh
Collaborator Author

nilsleh commented Feb 12, 2025

Open question: how to convert the polygons into a common oriented bounding box schema.
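
One possible approach, offered only as a sketch and not as what this PR implements, is to replace each polygon by the minimum-area rectangle that encloses it, so that every annotation ends up with exactly 4 corners. The pure-Python version below builds the convex hull and tries each hull edge as a candidate rectangle side; the function names are hypothetical, and the example polygon is the 5-vertex entry from Annotations/train/01874.json.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(poly):
    """Return the 4 corners of the minimum-area rectangle enclosing poly.

    poly is a flat list [x1, y1, x2, y2, ...] with any number of vertices.
    """
    points = list(zip(poly[0::2], poly[1::2]))
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("polygon is degenerate")
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate by -theta so this hull edge is parallel to the x-axis.
        rot = [(x * c + y * s, -x * s + y * c) for x, y in hull]
        xs, ys = [p[0] for p in rot], [p[1] for p in rot]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if best is None or area < best[0]:
            best = (area, c, s, min(xs), max(xs), min(ys), max(ys))
    _, c, s, xmin, xmax, ymin, ymax = best
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    # Rotate the corners back by +theta into image coordinates.
    return [(u * c - v * s, u * s + v * c) for u, v in corners]


# The 5-vertex polygon from Annotations/train/01874.json.
example_poly = [
    1248.874059994201, 2785.0,
    1267.2565270181412, 2785.0,
    1290.0001220703125, 2783.999755859375,
    1289.2877197265625, 2767.801025390625,
    1248.1971435546875, 2769.608154296875,
]
quad = min_area_rect(example_poly)  # always 4 corners, i.e. 8 coordinates
```

The trade-off is that the rectangle can be slightly larger than the original polygon, so whether this is acceptable depends on how tightly the dataset authors intended the boxes to fit.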

@nilsleh nilsleh marked this pull request as ready for review February 12, 2025 07:43
Collaborator

@adamjstewart adamjstewart left a comment

Asked about oriented bboxes on Slack. We may need to add support to Kornia for this ourselves. Until Kornia supports it natively, I guess it doesn't matter what the format looks like. But let's use the same key names that Kornia uses.
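
As a rough illustration of what following Kornia's key names could look like for the horizontal case, the sketch below derives axis-aligned boxes from polygons and stores them under bbox_xyxy. It uses plain Python lists instead of tensors to stay self-contained, assumes xywh means top-left corner plus width and height, and uses a placeholder key for oriented boxes (ORIENTED_KEY is hypothetical, since no such key has been agreed on in this thread).

```python
# Hypothetical placeholder: no key name for oriented boxes has been decided.
ORIENTED_KEY = "bbox_oriented"


def poly_to_xyxy(poly):
    """Axis-aligned [xmin, ymin, xmax, ymax] box around a flat polygon."""
    xs, ys = poly[0::2], poly[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]


def xyxy_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to [xmin, ymin, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def build_sample(polys, bbox_orientation="horizontal"):
    """Return a sample dict keyed by box format (lists stand in for tensors)."""
    if bbox_orientation == "horizontal":
        return {"bbox_xyxy": [poly_to_xyxy(p) for p in polys]}
    # Oriented boxes keep their raw polygon coordinates under the placeholder key.
    return {ORIENTED_KEY: [list(p) for p in polys]}


polys = [[0.0, 0.0, 10.0, 0.0, 10.0, 5.0, 0.0, 5.0]]
horizontal = build_sample(polys, "horizontal")
oriented = build_sample(polys, "oriented")
```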

@@ -49,6 +49,7 @@ Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands
`SKIPP'D`_,R,"Fish-eye","CC-BY-4.0","363,375",-,64x64,-,RGB
`SkyScript`_,IC,"NAIP, orthophotos, Planet SkySat, Sentinel-2, Landsat 8--9",MIT,5.2M,-,100--1000,0.1--30,RGB
`So2Sat`_,C,Sentinel-1/2,"CC-BY-4.0","400,673",17,32x32,10,"SAR, MSI"
`SODAA`_,OD,Aerial,"CC-BY-SA","2513",9,"varying","RGB"
Collaborator

Suggested change
- `SODAA`_,OD,Aerial,"CC-BY-SA","2513",9,"varying","RGB"
+ `SODA`_,OD,Aerial,"CC-BY-SA","2513",9,"varying","RGB"

I see CC-BY-SA here, but I wish we knew which CC-BY-SA version. I usually use a size range instead of "varying".
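
A quick sanity check on rows like this one is to compare the number of fields against the header shown in the hunk. The sketch below uses Python's csv module on the two lines exactly as quoted above; as quoted, the header has nine columns while the new row has eight values, which would be consistent with the Resolution (m) value being absent. The helper name is made up for illustration.

```python
import csv

# Header and new row, copied verbatim from the diff hunk above.
header = "Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands"
row = '`SODAA`_,OD,Aerial,"CC-BY-SA","2513",9,"varying","RGB"'


def field_count(line):
    """Number of CSV fields in a single line, honouring quoted commas."""
    return len(next(csv.reader([line])))


missing = field_count(header) - field_count(row)
if missing:
    print(f"row has {missing} fewer field(s) than the header")
```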

Collaborator

@shaunyuan22 do you know?

P.S. We are adding TorchGeo data loaders for your excellent SODA-A dataset; hopefully this makes it even easier for people to use and cite your paper!

If you use this dataset in your research, please cite the following paper:

* https://ieeexplore.ieee.org/document/10168277

Collaborator

Should we mention that pyarrow is required? Is this a requirement from the dataset authors or from you?

Collaborator Author

From me

Collaborator

I'm hesitant to add extra dependencies unless they are absolutely required. Does it add a significant speed boost?

Collaborator Author

Alright, I'll change it to CSV. Parquet is more performant and is today's standard, but switching to CSV will remove the dependency.
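
For reference, a CSV-based index needs nothing beyond the standard library. The following is a minimal sketch, assuming a metadata file with one row per image; the column names and the image path are hypothetical and not taken from the PR, and an in-memory buffer stands in for the file on disk.

```python
import csv
import io

# Hypothetical column names for the metadata index.
FIELDS = ["image_path", "label_path", "split"]


def write_index(rows, fileobj):
    """Write metadata rows (dicts keyed by FIELDS) as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_index(fileobj, split):
    """Return the rows belonging to the requested split."""
    return [row for row in csv.DictReader(fileobj) if row["split"] == split]


rows = [
    # The label path appears in this PR; the image paths are made up.
    {"image_path": "Images/01874.jpg", "label_path": "Annotations/train/01874.json", "split": "train"},
    {"image_path": "Images/00001.jpg", "label_path": "Annotations/val/00001.json", "split": "val"},
]

buffer = io.StringIO()  # stands in for an open file
write_index(rows, buffer)
buffer.seek(0)
train_rows = read_index(buffer, "train")
```

Note that csv.DictReader returns every value as a string, so any numeric columns would need an explicit conversion after reading.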

Labels
datasets, documentation, testing

3 participants