Adding Custom Vision object detection sample #952

Open
wants to merge 11 commits into
base: main
158 changes: 158 additions & 0 deletions samples/csharp/end-to-end-apps/StopSignDetection_ONNX/README.md
@@ -0,0 +1,158 @@
# Object Detection - Console Application Sample

| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|----------------|-------------|------------|-------------|-------------|------------------|---------------|-----------------------------------|
| v1.7.1 | Dynamic API | Up-to-date | End-End app | image files | Object detection | Deep Learning | ONNX: Custom Vision |

## Problem

Object detection is one of the main applications of deep learning: it not only classifies an object in an image, but also shows where the object is by drawing a bounding box around it. For deep learning scenarios, you can either use a pre-trained model or train your own model. This sample uses an object detection model exported from [Custom Vision](https://www.customvision.ai).

## How the sample works

This sample consists of a single console application that builds an ML.NET pipeline from an ONNX model downloaded from Custom Vision, runs predictions on every image in the "test" folder, and draws the predicted bounding boxes on each image.

## ONNX

The Open Neural Network eXchange ([ONNX](http://onnx.ai/)) is an open format for representing deep learning models. With ONNX, developers can move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners, including Microsoft.

## Model input and output

In order to parse the prediction output of the ONNX model, we need to understand the format (or shape) of the input and output tensors. To do this, we'll start by using [Netron](https://netron.app/), a GUI visualizer for neural networks and machine learning models, to inspect the model.

Below is an example of what we'd see upon opening this sample's model with Netron:

![Output from inspecting the model with Netron](./assets/onnx-input.jpg)

From the output above, we can see the ONNX model has the following input/output formats:

### Input: 'image_tensor' 3x320x320

The first thing to notice is that the **input tensor's name** is **'image_tensor'**. We'll need this name later when we define the **input** parameter of the estimation pipeline.

We can also see that the **shape of the input tensor** is **3x320x320**. This tells us that the image passed into the model should be 320 pixels high by 320 pixels wide. The '3' indicates that the image should be in BGR format: the three channels are blue, green, and red, respectively.
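
To make the shape concrete, here is a small sketch of the arithmetic behind a 3x320x320 tensor. It assumes a planar (channel-first) layout, in which all values of one channel are stored before the next channel begins. The `TensorLayoutSketch` class and its members are hypothetical names used only for illustration and are not part of the sample.

```csharp
using System;

// Hypothetical helper, not part of the sample: illustrates how a
// 3x320x320 tensor maps onto a flat array, assuming a planar layout.
public static class TensorLayoutSketch
{
    public const int Channels = 3;
    public const int Height = 320;
    public const int Width = 320;

    // Total number of values the model expects for one image.
    public static int Length => Channels * Height * Width;

    // Flat index of the value for a given channel and pixel position.
    public static int IndexOf(int channel, int y, int x)
    {
        if (channel < 0 || channel >= Channels) throw new ArgumentOutOfRangeException(nameof(channel));
        if (y < 0 || y >= Height) throw new ArgumentOutOfRangeException(nameof(y));
        if (x < 0 || x >= Width) throw new ArgumentOutOfRangeException(nameof(x));

        return channel * Height * Width + y * Width + x;
    }
}
```

Under these assumptions, one image occupies 3 * 320 * 320 = 307,200 values, and the first value of the second channel sits at index 102,400.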

### Output

We can see that the ONNX model has three outputs:
- **detected_classes**: An array of indexes that corresponds to the **labels.txt** file of what classes have been detected in the image. The labels are the tags that are added when uploading images to the Custom Vision service.
- **detected_boxes**: An array of floats holding the bounding box coordinates, normalized to the input image. Each bounding box takes up a set of four consecutive items in the array.
- **detected_scores**: An array of scores for each detected class.
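
As an illustration of the flat `detected_boxes` layout, the sketch below splits such an array into one four-element array per box. `BoxLayoutSketch` and `SplitBoxes` are hypothetical names, and the left, top, right, bottom ordering is taken from how this sample's code reads each box rather than from the Custom Vision documentation.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the sample.
public static class BoxLayoutSketch
{
    // Splits a flat [left, top, right, bottom, left, top, ...] array
    // into one four-element array per bounding box.
    public static float[][] SplitBoxes(float[] flatBoxes)
    {
        if (flatBoxes.Length % 4 != 0)
        {
            throw new ArgumentException("Expected four values per bounding box.", nameof(flatBoxes));
        }

        // Enumerable.Chunk is available from .NET 6 onwards.
        return flatBoxes.Chunk(4).ToArray();
    }
}
```

For example, an array of eight floats yields two boxes, and an empty array yields no boxes.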

## Solution

## Code Walkthrough

Create a class that defines the data schema to use while loading data into an `IDataView`. ML.NET supports the `Bitmap` type for images, so we'll specify a `Bitmap` property decorated with the `ImageTypeAttribute` and pass in the height and width dimensions we got by [inspecting the model](#model-input-and-output), as shown below.

```csharp
using Microsoft.ML.Transforms.Image;
using System.Drawing;

namespace StopSignDetection_ONNX
{
    // Input dimensions expected by the ONNX model
    public struct ImageSettings
    {
        public const int imageHeight = 320;
        public const int imageWidth = 320;
    }

    public class StopSignInput
    {
        [ImageType(ImageSettings.imageHeight, ImageSettings.imageWidth)]
        public Bitmap Image { get; set; }
    }
}
```

### ML.NET: Configure the model

The first step is to create an empty `IDataView` to obtain the schema of the data to use when configuring the model.

```csharp
var data = context.Data.LoadFromEnumerable(new List<StopSignInput>());
```

Next, we use the input and output tensor names we got by [inspecting the model](#model-input-and-output) to define the **input** and **output** parameters of the ONNX model and build the estimator pipeline. Deep neural networks usually expect images in a specific format, so the code below resizes each image to the model's input size and extracts its pixel values into the `image_tensor` column. Because the model has multiple outputs, we use the overload of **ApplyOnnxModel** that accepts a string array of output column names.

```csharp
var pipeline = context.Transforms.ResizeImages(
        resizing: ImageResizingEstimator.ResizingKind.Fill,
        outputColumnName: "image_tensor",
        imageWidth: ImageSettings.imageWidth,
        imageHeight: ImageSettings.imageHeight,
        inputColumnName: nameof(StopSignInput.Image))
    .Append(context.Transforms.ExtractPixels(outputColumnName: "image_tensor"))
    .Append(context.Transforms.ApplyOnnxModel(
        outputColumnNames: new string[] { "detected_boxes", "detected_scores", "detected_classes" },
        inputColumnNames: new string[] { "image_tensor" },
        modelFile: "./Model/model.onnx"));
```

Last, create the model by fitting the `DataView`.

```csharp
var model = pipeline.Fit(data);
```

## Create a PredictionEngine

After the model is configured, create a `PredictionEngine`. We'll pass each image to the engine to detect objects using the model.

```csharp
var predictionEngine = context.Model.CreatePredictionEngine<StopSignInput, StopSignPrediction>(model);
```

## Detect objects in an image

Each prediction for an image in the `test` directory contains a `long` array in the `PredictedLabels` property, a `float` array in the `BoundingBoxes` property, and a `float` array in the `Scores` property. We open each test image with a `FileStream`, parse it into a `Bitmap` object, and pass that `Bitmap` as the input to make a prediction.

We use the `Chunk` method to split the flat `BoundingBoxes` array into one group of coordinates per predicted label, then scale each group to the original image size to draw the bounding box. To get the label text, we use each value in `PredictedLabels` as an index into the lines of the `labels.txt` file.

```csharp
var labels = File.ReadAllLines("./Model/labels.txt");

var testFiles = Directory.GetFiles("./test");

Bitmap testImage;

foreach (var image in testFiles)
{
    // Name of the annotated copy that is saved for this test image
    var predictedImage = $"{Path.GetFileName(image)}-predicted.jpg";

    // Load the test image into memory
    using (var stream = new FileStream(image, FileMode.Open))
    {
        testImage = (Bitmap)Image.FromStream(stream);
    }

    var prediction = predictionEngine.Predict(new StopSignInput { Image = testImage });

    // Split the flat coordinate array into one chunk per predicted label
    var boundingBoxes = prediction.BoundingBoxes.Chunk(prediction.BoundingBoxes.Count() / prediction.PredictedLabels.Count());

    var originalWidth = testImage.Width;
    var originalHeight = testImage.Height;

    for (int i = 0; i < boundingBoxes.Count(); i++)
    {
        var boundingBox = boundingBoxes.ElementAt(i);

        // Scale the normalized coordinates to the original image size
        var left = boundingBox[0] * originalWidth;
        var top = boundingBox[1] * originalHeight;
        var right = boundingBox[2] * originalWidth;
        var bottom = boundingBox[3] * originalHeight;

        var x = left;
        var y = top;
        var width = Math.Abs(right - left);
        var height = Math.Abs(top - bottom);

        // Look up the label text by its index in labels.txt
        var label = labels[prediction.PredictedLabels[i]];

        using var graphics = Graphics.FromImage(testImage);

        graphics.DrawRectangle(new Pen(Color.Red, 3), x, y, width, height);
        graphics.DrawString(label, new Font(FontFamily.Families[0], 32f), Brushes.Red, x + 5, y + 5);
    }

    if (File.Exists(predictedImage))
    {
        File.Delete(predictedImage);
    }

    testImage.Save(predictedImage);
}
```
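
The loop above draws every detection and never reads the `Scores` property. The following is a minimal sketch of keeping only confident detections. It assumes that the class, score, and box arrays are index-aligned (the loop above already relies on this for labels and boxes). The `Detection` record, the `DetectionFilterSketch` class, and the 0.5 default threshold are hypothetical and not part of the sample.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type, not part of the sample: one detection with normalized coordinates.
public record Detection(long ClassIndex, float Score, float Left, float Top, float Right, float Bottom);

// Hypothetical helper, not part of the sample.
public static class DetectionFilterSketch
{
    // Returns the detections whose score is at least minScore.
    // Assumes classes[i], scores[i], and flatBoxes[4*i .. 4*i+3] describe the same detection.
    public static List<Detection> Filter(long[] classes, float[] scores, float[] flatBoxes, float minScore = 0.5f)
    {
        if (classes.Length != scores.Length || flatBoxes.Length != scores.Length * 4)
        {
            throw new ArgumentException("Expected one class, one score, and four box values per detection.");
        }

        var results = new List<Detection>();

        for (int i = 0; i < scores.Length; i++)
        {
            if (scores[i] < minScore)
            {
                continue; // Skip low-confidence detections
            }

            int offset = i * 4;
            results.Add(new Detection(
                classes[i],
                scores[i],
                flatBoxes[offset],
                flatBoxes[offset + 1],
                flatBoxes[offset + 2],
                flatBoxes[offset + 3]));
        }

        return results;
    }
}
```

With the sample's prediction type, a call might look like `DetectionFilterSketch.Filter(prediction.PredictedLabels, prediction.Scores, prediction.BoundingBoxes)`, with the drawing loop then iterating over the returned list. A suitable threshold depends on the model and should be tuned against your own test images.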

## Output

For this object detection scenario, the sample saves a copy of each test image with the bounding boxes and labels drawn on it. If a predicted image with the same name already exists when the console application runs, it deletes that file before saving the new one.

![Multiple bounding boxes output](./assets/object-detection-output.jpg)
@@ -0,0 +1,25 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 17
VisualStudioVersion = 17.1.32228.430
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StopSignDetection_ONNX", "StopSignDetection_ONNX\StopSignDetection_ONNX.csproj", "{37A33ADD-47A7-4B09-B323-CB9BCBC86851}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{37A33ADD-47A7-4B09-B323-CB9BCBC86851}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{37A33ADD-47A7-4B09-B323-CB9BCBC86851}.Debug|Any CPU.Build.0 = Debug|Any CPU
{37A33ADD-47A7-4B09-B323-CB9BCBC86851}.Release|Any CPU.ActiveCfg = Release|Any CPU
{37A33ADD-47A7-4B09-B323-CB9BCBC86851}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {0FCF9329-4869-4595-94F9-56E4055DA8D4}
EndGlobalSection
EndGlobal
@@ -0,0 +1 @@
stop-sign
Binary file not shown.
@@ -0,0 +1,80 @@
using Microsoft.ML;
using Microsoft.ML.Transforms.Image;
using StopSignDetection_ONNX;
using System.Drawing;

var context = new MLContext();

var data = context.Data.LoadFromEnumerable(new List<StopSignInput>());
var root = new FileInfo(typeof(Program).Assembly.Location);
var assemblyFolderPath = root.Directory.FullName;

// Create pipeline
var pipeline = context.Transforms.ResizeImages(
        resizing: ImageResizingEstimator.ResizingKind.Fill,
        outputColumnName: "image_tensor",
        imageWidth: ImageSettings.imageWidth,
        imageHeight: ImageSettings.imageHeight,
        inputColumnName: nameof(StopSignInput.Image))
    .Append(context.Transforms.ExtractPixels(outputColumnName: "image_tensor"))
    .Append(context.Transforms.ApplyOnnxModel(
        outputColumnNames: new string[] { "detected_boxes", "detected_scores", "detected_classes" },
        inputColumnNames: new string[] { "image_tensor" },
        modelFile: "./Model/model.onnx"));

// Fit and create prediction engine
var model = pipeline.Fit(data);

var predictionEngine = context.Model.CreatePredictionEngine<StopSignInput, StopSignPrediction>(model);

var labels = File.ReadAllLines("./Model/labels.txt");

var testFiles = Directory.GetFiles("./test");

Bitmap testImage;

foreach (var image in testFiles)
{
    // Load test image into memory
    var predictedImage = $"{Path.GetFileName(image)}-predicted.jpg";

    using (var stream = new FileStream(image, FileMode.Open))
    {
        testImage = (Bitmap)Image.FromStream(stream);
    }

    // Predict on test image
    var prediction = predictionEngine.Predict(new StopSignInput { Image = testImage });

    // Calculate how many sets of bounding boxes we get from the prediction
    var boundingBoxes = prediction.BoundingBoxes.Chunk(prediction.BoundingBoxes.Count() / prediction.PredictedLabels.Count());

    var originalWidth = testImage.Width;
    var originalHeight = testImage.Height;

    // Draw boxes and predicted label
    for (int i = 0; i < boundingBoxes.Count(); i++)
    {
        var boundingBox = boundingBoxes.ElementAt(i);

        var left = boundingBox[0] * originalWidth;
        var top = boundingBox[1] * originalHeight;
        var right = boundingBox[2] * originalWidth;
        var bottom = boundingBox[3] * originalHeight;

        var x = left;
        var y = top;
        var width = Math.Abs(right - left);
        var height = Math.Abs(top - bottom);

        // Get predicted label from labels file
        var label = labels[prediction.PredictedLabels[i]];

        // Draw bounding box and add label to image
        using var graphics = Graphics.FromImage(testImage);

        graphics.DrawRectangle(new Pen(Color.NavajoWhite, 8), x, y, width, height);
        graphics.DrawString(label, new Font(FontFamily.Families[0], 18f), Brushes.NavajoWhite, x + 5, y + 5);
    }

    // Save the prediction image, but delete it if it already exists before saving
    if (File.Exists(predictedImage))
    {
        File.Delete(predictedImage);
    }

    testImage.Save(Path.Combine(assemblyFolderPath, predictedImage));
}
@@ -0,0 +1,35 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <None Remove="Model\a889f48fdc1c45b5af840c5df4210a04.ONNX.zip" />
  </ItemGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.ML" Version="1.7.1" />
    <PackageReference Include="Microsoft.ML.ImageAnalytics" Version="1.7.1" />
    <PackageReference Include="Microsoft.ML.OnnxTransformer" Version="1.7.1" />
  </ItemGroup>

  <ItemGroup>
    <None Update="Model\labels.txt">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
    <None Update="Model\model.onnx">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
    <None Update="test\stop-sign-multiple-test.jpg">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
    <None Update="test\stop-sign-test.jpg">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
  </ItemGroup>

</Project>
@@ -0,0 +1,17 @@
using Microsoft.ML.Transforms.Image;
using System.Drawing;

namespace StopSignDetection_ONNX
{
    public struct ImageSettings
    {
        public const int imageHeight = 320;
        public const int imageWidth = 320;
    }

    public class StopSignInput
    {
        [ImageType(ImageSettings.imageHeight, ImageSettings.imageWidth)]
        public Bitmap Image { get; set; }
    }
}
@@ -0,0 +1,16 @@
using Microsoft.ML.Data;

namespace StopSignDetection_ONNX
{
    public class StopSignPrediction
    {
        [ColumnName("detected_classes")]
        public long[] PredictedLabels { get; set; }

        [ColumnName("detected_boxes")]
        public float[] BoundingBoxes { get; set; }

        [ColumnName("detected_scores")]
        public float[] Scores { get; set; }
    }
}