Image trait wip #1

Draft: astraw wants to merge 2 commits into main
Conversation


@astraw commented Feb 2, 2022

This is a very early draft made to gain visibility.

@xd009642 (Member) left a comment:

Made some comments. I made at least one of these in the Discord when the markdown was first published, but it's good to have them here as well.

sources. This trait can be implemented by crates which provide their own data types
that could be used to hold image data.

### What's not so good
A Member left a comment:

So one thing is there could (and probably should) be convenience functions to access pixels at given (x, y) coordinates, or specific channel values at (x, y, c). And something to allow for moving windows, such as https://docs.rs/ndarray/latest/ndarray/struct.ArrayBase.html#method.windows.
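For example, a minimal sketch of what such accessors might look like, assuming a hypothetical `StridedImage` trait standing in for the draft's `ImageStride` (all names below are illustrative, not the draft's):

```rust
// Sketch only: `StridedImage` is a hypothetical stand-in for the draft's
// ImageStride trait; the accessor names are illustrative.
trait StridedImage {
    fn width(&self) -> usize;
    fn height(&self) -> usize;
    fn stride(&self) -> usize; // bytes per image row
    fn bytes_per_pixel(&self) -> usize;
    fn data(&self) -> &[u8]; // raw image bytes

    /// Convenience accessor: the bytes of the pixel at (x, y).
    fn pixel(&self, x: usize, y: usize) -> &[u8] {
        let start = y * self.stride() + x * self.bytes_per_pixel();
        &self.data()[start..start + self.bytes_per_pixel()]
    }

    /// Convenience accessor: a single channel value at (x, y, c).
    fn channel(&self, x: usize, y: usize, c: usize) -> u8 {
        self.pixel(x, y)[c]
    }
}
```

A `windows`-style iterator would need more machinery, but it could be layered on the same accessors.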

These, and the issues already mentioned in the "what's not so good" section, are the only things that would stop me from moving the ndarray-vision algorithms over to be implemented in terms of `ImageStride`.

and `height()` methods return `u32` but should perhaps return `usize`. (The [`stride()`](https://docs.rs/machine-vision-formats/0.1.1/machine_vision_formats/trait.Stride.html#tymethod.stride)
method does return `usize`.)
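Purely as an illustration of that suggestion (only the return types below come from the text above; the trait names are made up for the sketch):

```rust
// As described above (trait names illustrative): dimensions come back as
// u32 while the stride is usize, so mixed arithmetic needs casts.
trait DimsAsU32 {
    fn width(&self) -> u32;
    fn height(&self) -> u32;
    fn stride(&self) -> usize;
}

// The suggested alternative: dimensions as usize as well, so expressions
// like `y * stride + x` type-check without `as` conversions.
trait DimsAsUsize {
    fn width(&self) -> usize;
    fn height(&self) -> usize;
    fn stride(&self) -> usize;
}
```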

## [`ndarray_image::NdImage` v0.3.0](https://docs.rs/ndarray-image/0.3.0/ndarray_image/)
A Member left a comment:

I dunno if you want to mention it, but there's always `ndarray_vision::ImageBase<T, C>` where `C: ColorModel` and `T` is an `ndarray` data storage type. It simply wraps an `ndarray::ArrayBase<T, Ix3>` and uses `Ix3` to make it easy to access different colour channels.

Then, using type tags for the colour spaces, ndarray-vision basically implements `impl From<Rgb> for Hsv`, and then there's an auto-conversion so you can use `From`/`Into` to go from `ImageBase<T, Rgb>` -> `ImageBase<T, Hsv>` automatically.
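A rough sketch of that pattern, using simplified stand-ins rather than ndarray-vision's actual definitions:

```rust
use std::marker::PhantomData;

// Zero-sized type tags naming the colour model.
struct Rgb;
struct Hsv;

// An image generic over its colour-model tag `C`; the Vec here is a
// simplified stand-in for ndarray::ArrayBase<T, Ix3>.
struct ImageBase<T, C> {
    data: Vec<T>,
    model: PhantomData<C>,
}

// Defining one conversion between the tagged image types...
impl<T> From<ImageBase<T, Rgb>> for ImageBase<T, Hsv> {
    fn from(img: ImageBase<T, Rgb>) -> Self {
        // A real implementation would remap every pixel from RGB to HSV here.
        ImageBase { data: img.data, model: PhantomData }
    }
}

fn main() {
    let rgb: ImageBase<u8, Rgb> = ImageBase { data: vec![0; 12], model: PhantomData };
    // ...lets callers convert with the ordinary From/Into machinery.
    let _hsv: ImageBase<u8, Hsv> = rgb.into();
}
```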

The issue with that colour modelling is that it makes it hard to handle things like gamma; they could potentially be const generics when those are stable, but just how ergonomic is that API? It does let image processing algorithms specify what colour spaces they work on in the types, i.e. `impl Into<ImageBase<T, Gray>>`, which might be a plus? But I haven't quite implemented it like that currently, preferring to do conversions inside the algorithms to make the API "friendlier".

Also, compile time with generics may be a concern, but personally I think it's worth it for numeric stuff.

Just mentioning as it seems to be different enough to the existing things.

@vadixidav (Member) commented Feb 5, 2022

Overall, I would like to mention a few things:

- I don't think this will let us write algorithms that abstract over GPU buffers, which is possibly okay; perhaps this is just the CPU format of the image (which can then be copied to the GPU with a different abstraction).
- My biggest gripe is that we may not have just one stride. We might have an image in YUV420 or a similar format where we have entirely unique dimensions for each channel of the image. In that case we cannot say "each pixel is X bits/bytes". I would think that we should have N offsets, dimensions, and strides for both X and Y.

Let me elaborate on that last point a bit.

For instance, for RGB, I would expect something like the following:

```rust
struct Channel {
    /// Byte offset of the channel's first sample, per axis (x, y).
    offsets: [usize; 2],
    /// Byte distance between successive samples, per axis (x, y).
    strides: [usize; 2],
    /// Number of samples, per axis (x, y).
    dimensions: [usize; 2],
}

fn channels(&self) -> &[Channel];
```

I would expect to get back:

```rust
&[
    Channel { offsets: [0, 0], strides: [3, width * 3], dimensions: [width, height] }, // R
    Channel { offsets: [1, 0], strides: [3, width * 3], dimensions: [width, height] }, // G
    Channel { offsets: [2, 0], strides: [3, width * 3], dimensions: [width, height] }, // B
]
```

Where for R the X offset is 0, G the X offset is 1, and B the X offset is 2, but they are otherwise the same. These offsets are in bytes, of course. We could ALSO associate more specific information with each channel. For instance, we could say "this channel is blue" or "this channel is luminosity" by using some enum that comes back in the Channel type. The color space information could be stored in the channel or it could be stored with the image entirely separately, but this is just one approach. This system lets us store even images in YUV420 or similar, since it employs a separate "ndarray slice" mechanism for each channel.

This also allows each channel to be stored separately in memory, so long as they are in the same buffer. Some algorithms might be more efficient when operating on images in this way.
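To make the YUV420 case concrete, here is a sketch in the same style as the RGB example, assuming an I420-style planar layout (a full-resolution Y plane followed by quarter-resolution U and V planes in one buffer) and assuming each plane's starting byte is folded into its y-axis offset; the proposal above leaves that encoding choice open:

```rust
// Hypothetical channels() result for planar YUV 4:2:0 in a single buffer.
&[
    // Y: full resolution, one byte per sample, tightly packed rows.
    Channel { offsets: [0, 0], strides: [1, width], dimensions: [width, height] },
    // U: half resolution in both axes, starting right after the Y plane.
    Channel { offsets: [0, width * height], strides: [1, width / 2], dimensions: [width / 2, height / 2] },
    // V: half resolution, starting after the U plane.
    Channel { offsets: [0, width * height + (width * height) / 4], strides: [1, width / 2], dimensions: [width / 2, height / 2] },
]
```

Each channel carries its own dimensions and strides, which is exactly the information a single image-wide stride cannot express.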

@vadixidav assigned vadixidav and unassigned vadixidav Feb 5, 2022
@vadixidav self-requested a review February 5, 2022 22:55