Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image trait wip #1

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
127 changes: 127 additions & 0 deletions rfcs/0001-image-trait.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@

* RFC PR: [rust-cv/rfcs#N](https://github.com/rust-cv/rfcs/pull/N)
* Tracking Issue [rust-cv/rfcs#N](https://github.com/rust-cv/rfcs/issues/N)

# Summary

An `Image` trait to allow interoperability of image data.

# Motivation

Multiple existing crates implement concrete types that hold image data either
explicitly or implicitly. If each of these implemented a single, fundamental
trait that allowed specification of how to access image data across
implementations, image operations could be written generically that could
operate on any of the data types. This would enable, for example, image
processing routines to be written which would work equally well regardless of
how the data types is defined.

# Prior Art

## [`machine_vision_formats::ImageStride` v0.1.1](https://docs.rs/machine-vision-formats/0.1.1/machine_vision_formats/trait.ImageStride.html)

Here we consider
[`machine_vision_formats::ImageStride`](https://docs.rs/machine-vision-formats/0.1.1/machine_vision_formats/trait.ImageStride.html)
together with its supertraits
[`machine_vision_formats::ImageData`](https://docs.rs/machine-vision-formats/0.1.1/machine_vision_formats/trait.ImageData.html)
and
[`machine_vision_formats::Stride`](https://docs.rs/machine-vision-formats/0.1.1/machine_vision_formats/trait.Stride.html).

### What's good

A simple mostly-trait only. Provides a "view" of existing structure and enum
definitions, allowing low-cost consumption of image data from a variety of
sources. This trait be implemented by crates which provide their own data types
that could be used to hold image data.

### What's not so good
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one thing is there could (probably should) be convenience functions to access pixels at given x,y coordinates (or specific channel values (x,y,z). And something to allow for moving for moving windows such as https://docs.rs/ndarray/latest/ndarray/struct.ArrayBase.html#method.windows

These and the issues already mentioned in the "what's not so good" are the only things that would stop me from moving over the ndarray-vision algorithms to be implemented in terms of ImageStride


No colorspace definition (e.g. sRGB vs linear RGB).

The version number will have to be bumped (although just unbreaking bumps) when new pixel formats are added.

Should there be a single fundamental trait instead of three? The present design
provides the possibility to implement `ImageData` for a type, and thus provide
access to the raw image data, without also implementing `Stride`. Does the
ability to allow access to raw image data and pixel format information, but
without providing information about stride,

(Minor nit) The static pixel format types are redundant with the dynamic types. The naming of the two could be improved.

### Where is this unidiomatic or unergonomic for Rust

The
[`width()`](https://docs.rs/machine-vision-formats/0.1.1/machine_vision_formats/trait.ImageData.html#tymethod.width)
and `height()` methods return `u32` but should perhaps return `usize`. (The [`stride()`](https://docs.rs/machine-vision-formats/0.1.1/machine_vision_formats/trait.Stride.html#tymethod.stride)
method does return `usize`.)

## [`ndarray_image::NdImage` v0.3.0](https://docs.rs/ndarray-image/0.3.0/ndarray_image/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno if you want to mention it but there's always ndarray_vision::ImageBase<T, C> where C: ColorModel and T is an ndarray::Data. It simply wraps an ndarray::ArrayBase<T, Ix3> and uses Ix3 to make it easy to access different colour channels.

Then using type tags for the colour spaces ndarray-vision basically implements impl From<Rgb> for Hsv and then there's an auto-conversion so you can use From/Into to go from ImageBase<T, Rgb> -> ImageBase<T, Hsv> automatically.

Issue with that colour modelling is that it makes it hard to handle things like gamma, they could potentially be const generics when they're stable - but just how ergonomic is that API? It does let image processing algorithms specify what colour spaces they work on in the types i.e. impl Into<ImageBase<T, Gray>> which might be a plus? But I haven't quite implemented it like that currently preferring to do conversions inside the algorithms to make the API "friendlier".

Also compile time with generics may be a concern - but I think it's worth it personally for numeric stuff.

Just mentioning as it seems to be different enough to the existing things.


### What's good

### What's not so good

A struct of its own, this would require implementations of the
[`From`](https://doc.rust-lang.org/nightly/std/convert/trait.From.html) trait
for each supporting library, potentially incurring conversion overhead rather
than merely providing a view of existing data.

### Where is this unidiomatic or unergonomic for Rust


## [`nshare v0.0.0`](https://docs.rs/nshare/0.8.0/nshare/)

### What's good

### What's not so good

Not specialized for images, provides no definitions for pixel format or
colorspace.

### Where is this unidiomatic or unergonomic for Rust


# Use case

- explicit image types (e.g.
[`image::DynamicImage`](https://docs.rs/image/0.23.14/image/enum.DynamicImage.html))
- implicit image types (e.g. when using
[`nalgebra::Matrix`](https://docs.rs/nalgebra/0.29.0/nalgebra/base/struct.Matrix.html)
to hold image data).

Use in `no_std` environments.

# Design


# Impact

We would want to define the base trait extremely carefully because (breaking)
version changes will cause widespread ripple it this trait because widely
implemented.

# Unresolved Questions

## Utility for specifying batched GPU operations

## Dynamic vs. static pixel formats

When possible, it is useful to specify pixel formats statically (at compile
time) so that image processing functions can be specialized to operate on, for
example, images of a specific format (e.g. Mono8 or RGB8). However, often one
wants to process images from a dynamic source such as a file. Therefore, it
seems desirable to allow specifying pixel formats both at compile time and
dynamically (at run time).

## Colorspaces

To be maximally useful, we need to define pixel format (e.g. YUV packed, YUV
planar, RGB packed, RGB planar, Mono8, Mono10 packed, Mono 10 unpacked, Bayer
patterns) and also the colorspace (e.g. that of BT.709, sRGB, "undefined" and
perhaps "custom").

## Strides (Packed vs Planar representations)

## Hyperspectral data.

## Volume images (e.g. confocal microscopy).