Data-level annotations #737

Open
benjelloun opened this issue Sep 11, 2024 · 2 comments
@benjelloun

Add a mechanism to Croissant to define data-level annotations. Annotations are a general mechanism to attach additional information to other pieces of data. We plan to use annotations for a number of use cases, including:

  • statistics
  • labels (textual or otherwise)
  • provenance (including human annotator information)
  • ...

benjelloun converted this from a draft issue Sep 11, 2024

@benjelloun

Strawman proposal

Make annotation a first-class property, so that we can clearly represent the fact that some contents of a RecordSet are annotations. You can think of an annotation as a special kind of field that annotates its container.

Here is an example of what a field-level annotation looks like:

{"@type": "cr:RecordSet", "@id": "images",
  "field": [
    { "@type": "cr:Field", "@id": "images/image", ... ,
      "annotation": {
        "@type": "cr:Field", "@id": "images/label", 
        "dataType": ["sc:Text", "cr:Label"]
      }
    }
  ]
}

In this example, the annotation "images/label" applies to the field "images/image".
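
To make the containment relationship concrete, here is a minimal Python sketch that reads the RecordSet above as a plain dictionary and reports which annotation is attached to which field. It assumes the strawman "annotation" property from this proposal, which is not part of any released Croissant tooling; the helper names `as_list` and `field_annotations` are hypothetical, and the properties elided with "..." in the example are simply left out.

```python
import json

# Trimmed copy of the field-level example; elided properties are omitted.
RECORD_SET = json.loads("""
{
  "@type": "cr:RecordSet", "@id": "images",
  "field": [
    {
      "@type": "cr:Field", "@id": "images/image",
      "annotation": {
        "@type": "cr:Field", "@id": "images/label",
        "dataType": ["sc:Text", "cr:Label"]
      }
    }
  ]
}
""")


def as_list(value):
    """Return value as a list; JSON-LD allows a single object or an array."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def field_annotations(record_set):
    """Map each field @id to the @ids of the annotations attached to it."""
    return {
        field["@id"]: [a["@id"] for a in as_list(field.get("annotation"))]
        for field in as_list(record_set.get("field"))
    }


print(field_annotations(RECORD_SET))
# {'images/image': ['images/label']}
```

Accepting either a single object or an array for "annotation" is a design assumption made here, since the proposal only shows the single-object form.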

Annotations can also appear at the level of a RecordSet. A RecordSet-level annotation applies to the entire record. For example:

{
  "@type": "cr:RecordSet",
  "@id": "movies",
  "field": [
    { "@type": "cr:Field", "@id": "movies/movie_id", ...},
    { "@type": "cr:Field", "@id": "movies/title", ...},
    { "@type": "cr:Field", "@id": "movies/genre", ...}
  ],
  "annotation" : {
    "@type": "cr:Field", "@id": "movies/ratings",
    "subField": [
      { "@type": "cr:Field", "@id": "movies/ratings/user_id", ...},
      { "@type": "cr:Field", "@id": "movies/ratings/rating", ...}
    ]
  }
}

In this example, ratings is a structured annotation that contains a user_id and a rating.
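
As a rough illustration of how a consumer might tell RecordSet-level annotations from field-level ones, the sketch below walks a RecordSet dictionary and yields, for every annotation it finds, the node it annotates and the annotation's sub-fields. It again relies only on the strawman "annotation" and "subField" properties shown in this issue; `describe_annotations` and `as_list` are hypothetical helpers, not an existing API, and the elided "..." properties are left out.

```python
# Trimmed copy of the RecordSet-level example; elided properties are omitted.
MOVIES = {
    "@type": "cr:RecordSet",
    "@id": "movies",
    "field": [
        {"@type": "cr:Field", "@id": "movies/movie_id"},
        {"@type": "cr:Field", "@id": "movies/title"},
        {"@type": "cr:Field", "@id": "movies/genre"},
    ],
    "annotation": {
        "@type": "cr:Field",
        "@id": "movies/ratings",
        "subField": [
            {"@type": "cr:Field", "@id": "movies/ratings/user_id"},
            {"@type": "cr:Field", "@id": "movies/ratings/rating"},
        ],
    },
}


def as_list(value):
    """Return value as a list; JSON-LD allows a single object or an array."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def describe_annotations(node):
    """Yield (target_id, annotation_id, subfield_ids) for node and its fields."""
    for annotation in as_list(node.get("annotation")):
        subfields = [s["@id"] for s in as_list(annotation.get("subField"))]
        yield node["@id"], annotation["@id"], subfields
    # Recurse so that field-level annotations are reported as well.
    for field in as_list(node.get("field")):
        yield from describe_annotations(field)


for target, annotation, subfields in describe_annotations(MOVIES):
    print(f"{annotation} annotates {target}; sub-fields: {subfields}")
```

Here the only annotation found is "movies/ratings", whose target is the RecordSet "movies" itself rather than any single field.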

@omshinde

Some examples of NetCDF files for hierarchical data annotation:

Projects
Status: In Progress