Tutorial on active learning with Nvidia Train, Adapt, and Optimize (TAO).
Disclaimer: In order to run Nvidia TAO, you need to set up Nvidia NGC and TAO. If you haven't done this already, we will walk you through it in section 1.2.
In this tutorial, we will show you how you can do active learning for object detection with Nvidia TAO. The task will be object detection of apples in a plantation setting. Accurately detecting and counting fruits is a critical step towards automating harvesting processes. Furthermore, fruit counting can be used to project expected yield and hence to detect low yield years early on.
The goal of the tutorial is to train an object detection model to good accuracy with as little labeling as possible. For this, we do active learning with the Lightly Worker and use the Nvidia TAO framework as it's optimized for fast transfer learning on small datasets.
The tutorial consists of two parts: first, setting up Lightly, Nvidia TAO, and the dataset; second, running the active learning loop (initial selection, training and inference, and the active learning query).
To get started, clone this repository to your machine and change into the directory.
git clone https://github.com/lightly-ai/NvidiaTAOActiveLearning.git
cd NvidiaTAOActiveLearning
For this tutorial, you require Python `>= 3.6.9` and `<= 3.9`.
To set up Lightly for active learning, head to the Lightly Platform and create a free account. Make sure to get your token by clicking on your e-mail address and selecting "Preferences". You will need the token for the rest of this tutorial, so let's store it in an environment variable:
export LIGHTLY_TOKEN="YOUR_TOKEN"
Then, install the Lightly API Client and pull the latest Lightly Worker Docker image:
pip3 install lightly
docker pull lightly/worker:latest
For a full set of instructions, check out the docs. Finally, register the Lightly Worker in the Lightly Platform by running the following script (note that the Lightly token is fetched from the environment variable):
python3 register_worker.py
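In case you prefer to write your own registration script, it boils down to a few lines with the Lightly Python client (the worker name below is illustrative):

import os
from lightly.api import ApiWorkflowClient

# The token is read from the environment variable set above.
client = ApiWorkflowClient(token=os.environ["LIGHTLY_TOKEN"])

# Register a new Lightly Worker and print its id.
worker_id = client.register_compute_worker(name="tao-tutorial-worker")
print(worker_id)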
Store the worker id from the output in an environment variable:
export LIGHTLY_WORKER_ID="YOUR_WORKER_ID"
To install Nvidia TAO, follow these instructions. If you want to use custom scripts for training and inference, you can skip this part.
Setting up Nvidia TAO can be done in a few minutes and consists of the following steps:
- Install Docker.
- Install Nvidia GPU driver v455.xx or above.
- Install nvidia docker2.
- Get an NGC account and API key. The API key can be found in the settings (top right), under "Setup" and "Generate API Key".
- Install the `ngc` command-line tool.
- Install the Nvidia TAO launcher.
Check that your installation is correct:
which ngc
which tao
Make sure to keep the Nvidia API key in a safe location as we're going to need it later:
export NVIDIA_API_KEY="YOUR_NVIDIA_API_KEY"
To make all relevant directories accessible to Nvidia TAO, you need to mount the current working directory and the `yolo_v4/specs` directory into the Nvidia TAO docker container. You can do so with the provided `mount.py` script.
python3 mount.py
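Under the hood, the TAO launcher reads its drive mappings from `~/.tao_mounts.json`. A minimal sketch of what `mount.py` writes (the destinations are those used by the `tao` commands later in this tutorial):

import json
import os

mounts = {
    "Mounts": [
        # current working directory -> the experiment directory inside the container
        {"source": os.getcwd(), "destination": "/workspace/tao-experiments"},
        # training spec files
        {
            "source": os.path.join(os.getcwd(), "yolo_v4", "specs"),
            "destination": "/workspace/tao-experiments/yolo_v4/specs",
        },
    ]
}

with open(os.path.expanduser("~/.tao_mounts.json"), "w") as f:
    json.dump(mounts, f, indent=4)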
Next, you need to specify all training configurations. Nvidia TAO expects all training configurations in a `.txt` file stored in the `yolo_v4/specs/` directory. For the purpose of this tutorial, we provide an example in `yolo_v4_minneapple.txt`. The most important differences to the example script provided by Nvidia are:
- Anchor Shapes: We made the anchor boxes smaller since the largest bounding boxes in our dataset are only approximately 50 pixels wide.
- Augmentation Config: We set the output width and height of the augmentations to 704 and 1280 respectively. This corresponds to the shape of our images.
- Target Class Mapping: For transfer learning, we made a target class mapping from `car` to `apple`. This means that every time the model would predict a car, it now predicts an apple instead.
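Taken together, the relevant parts of the spec file look roughly like this (field names follow the TAO YOLOv4 spec, anchor values are placeholders; see `yolo_v4/specs/yolo_v4_minneapple.txt` for the exact settings):

yolov4_config {
  # anchors shrunk to match apples of roughly 50 pixels or less
  small_anchor_shape: "[(10.0, 10.0), (16.0, 16.0), (24.0, 24.0)]"
  ...
}
augmentation_config {
  output_width: 704
  output_height: 1280
  ...
}
dataset_config {
  target_class_mapping {
    key: "car"       # class the pre-trained model knows
    value: "apple"   # class we actually want to detect
  }
  ...
}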
In this tutorial, we will use the MinneApple fruit detection dataset. It consists of 670 training images of apple trees, annotated for detection and segmentation. The dataset contains images of trees with red and green apples.
Note: Nvidia TAO expects the data and labels in the KITTI format. This means they expect one folder containing the images and one folder containing the annotations. The name of an image and its corresponding annotation file must be the same apart from the file extension. You can find the MinneApple dataset converted to this format attached to the first release of this tutorial. Alternatively, you can download the files from the official link and convert the labels yourself.
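If you decide to convert the labels yourself, a rough sketch could look like the following. It assumes the per-instance masks shipped with the MinneApple detection set, where every apple has a unique integer id in the mask image; the paths are illustrative:

import os
import numpy as np
from PIL import Image

mask_dir = "detection/train/masks"   # MinneApple instance masks
label_dir = "data/raw/labels"        # KITTI labels expected by TAO
os.makedirs(label_dir, exist_ok=True)

for fname in os.listdir(mask_dir):
    mask = np.array(Image.open(os.path.join(mask_dir, fname)))
    lines = []
    for instance_id in np.unique(mask):
        if instance_id == 0:  # 0 is background
            continue
        ys, xs = np.where(mask == instance_id)
        xmin, ymin, xmax, ymax = xs.min(), ys.min(), xs.max(), ys.max()
        # KITTI: class truncated occluded alpha bbox(4) dims(3) loc(3) rot_y
        lines.append(
            f"Car 0. 0 0. {xmin}.0 {ymin}.0 {xmax}.0 {ymax}.0 "
            "0. 0. 0. 0. 0. 0. 0."
        )
    with open(os.path.join(label_dir, os.path.splitext(fname)[0] + ".txt"), "w") as f:
        f.write("\n".join(lines))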
Create a `data/` directory, move the downloaded `minneapple.zip` file there, and unzip it:
mkdir data
cd data
wget "https://github.com/lightly-ai/NvidiaTLTActiveLearning/releases/download/v1.0-alpha/minneapple.zip"
unzip minneapple.zip
rm minneapple.zip
cd ..
tree -d data/
The output of the `tree` command should be:
data/
└── raw
├── images
└── labels
Here's an example of what the converted labels look like. Note how we use the label `car` instead of `apple` because of the target class mapping we defined in section 1.2. Each line describes one bounding box in the KITTI format: class, truncation, occlusion, and alpha, followed by the box corners (xmin, ymin, xmax, ymax); the remaining 3D fields are unused here and set to zero.
Car 0. 0 0. 1.0 228.0 6.0 241.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 5.0 228.0 28.0 249.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 30.0 238.0 46.0 256.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 37.0 214.0 58.0 234.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 82.0 261.0 104.0 281.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 65.0 283.0 82.0 301.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 82.0 284.0 116.0 317.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 111.0 274.0 142.0 306.0 0. 0. 0. 0. 0. 0. 0.
Car 0. 0 0. 113.0 308.0 131.0 331.0 0. 0. 0. 0. 0. 0. 0.
In order for Lightly to be able to access the images, they need to be stored in cloud storage. For the purposes of this tutorial, we'll use S3. Create a new S3 bucket with a directory `minneapple`.
Next, make sure you have the AWS CLI tool installed:
aws --version
Sync the raw data:
aws s3 sync data/ s3://YOUR_BUCKET_HERE/minneapple
Next, you need to create a place where Lightly can store outputs and read predictions from. Create a new directory called `minneapple_out` in your S3 bucket. Then, run the following commands:
mkdir infer_labels
python3 tao_to_lightly.py --input_dir infer_labels
aws s3 sync .lightly/ s3://YOUR_BUCKET_HERE/minneapple_out/.lightly
The `tao_to_lightly.py` script performs the following steps:
- It lists all predictions in the directory `infer_labels` (currently none!).
- If there are predictions, it converts them to the Lightly format and stores them in `.lightly/predictions`.
- It creates the required `tasks.json` and `schema.json` files under `.lightly/predictions` (visit the Lightly documentation for more information).
If you want to write your own script, you can take ours as a reference. Note that we will run this exact script again once we have our first set of predictions.
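As a rough sketch, the core of the conversion looks something like this. It assumes that TAO appends the confidence score as the last field of each KITTI prediction line and that bounding boxes in the Lightly prediction format are given as `[x, y, width, height]`; the creation of `tasks.json` and `schema.json` is left to the full script, and the filenames are illustrative:

import json
import os

def kitti_to_lightly(kitti_path, image_name):
    """Convert one KITTI prediction file into a Lightly prediction dictionary."""
    predictions = []
    with open(kitti_path) as f:
        for line in f:
            fields = line.split()
            xmin, ymin, xmax, ymax = map(float, fields[4:8])  # box corners
            predictions.append({
                "category_id": 0,  # 0 = apple
                "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
                "score": float(fields[-1]),  # confidence appended by TAO
            })
    return {"file_name": image_name, "predictions": predictions}

prediction = kitti_to_lightly("infer_labels/example.txt", "raw/images/example.png")
out_path = ".lightly/predictions/minneapple/raw/images/example.json"
os.makedirs(os.path.dirname(out_path), exist_ok=True)
with open(out_path, "w") as f:
    json.dump(prediction, f)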
The output directory in your S3 bucket should have the following structure now:
minneapple_out/
└── .lightly/
└── predictions
├── minneapple
│ ├── raw
│ │ └── images
│ └── schema.json
└── tasks.json
What you just did is prepare the output directory to be filled with predictions when you start doing active learning.
Now, all that's left is to create credentials such that Lightly can access the data. For S3 buckets, we recommend using delegated access. Follow the instructions here to set up `list` and `read` permissions for the input folder and `list`, `read`, `write`, and `delete` permissions for the output folder. Store the credentials in environment variables:
export S3_REGION="YOUR_S3_REGION"
export S3_INPUT_PATH="s3://YOUR_BUCKET_HERE/minneapple"
export S3_INPUT_ROLE_ARN="YOUR_INPUT_ROLE_ARN"
export S3_INPUT_EXTERNAL_ID="YOUR_INPUT_EXTERNAL_ID"
export S3_LIGHTLY_PATH="s3://YOUR_BUCKET_HERE/minneapple_out"
export S3_LIGHTLY_ROLE_ARN="YOUR_LIGHTLY_ROLE_ARN"
export S3_LIGHTLY_EXTERNAL_ID="YOUR_LIGHTLY_EXTERNAL_ID"
Congrats! You're ready to start doing active learning with Lightly and Nvidia TAO.
Now that the setup is complete, you can start the active learning loop. In general, the active learning loop will consist of the following steps:
- Initial selection: Get an initial set of images to annotate and train on.
- Training and inference: Train on the labeled data and make predictions on all data.
- Active learning query: Use the predictions to get the next set of images to annotate, then go back to step 2.
We will walk you through all three steps in this tutorial.
To do the initial selection, you first need to start up the Lightly Worker. Open up a new terminal, and run the following commands:
export LIGHTLY_TOKEN="YOUR_TOKEN"
export LIGHTLY_WORKER_ID="YOUR_WORKER_ID"
docker run --shm-size="1024m" --gpus all --rm -it \
-e LIGHTLY_TOKEN=$LIGHTLY_TOKEN \
lightly/worker:latest \
worker.worker_id=$LIGHTLY_WORKER_ID
To schedule a selection job, switch to your first terminal and run
python3 schedule.py \
--dataset-name minneapple \
--s3-region $S3_REGION \
--s3-input-path $S3_INPUT_PATH \
--s3-input-role-arn $S3_INPUT_ROLE_ARN \
--s3-input-external-id $S3_INPUT_EXTERNAL_ID \
--s3-lightly-path $S3_LIGHTLY_PATH \
--s3-lightly-role-arn $S3_LIGHTLY_ROLE_ARN \
--s3-lightly-external-id $S3_LIGHTLY_EXTERNAL_ID
The above script roughly performs the following steps:
- It creates a new dataset in the Lightly Platform named after the `--dataset-name` argument.
- If a dataset with the same name already exists, it uses that one.
- It schedules a job to select images based on diversity and prediction uncertainty if predictions exist.
You can use it as a reference to write your own script for scheduling jobs.
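A minimal sketch of the scheduling part (simplified; the S3 datasource configuration is omitted and the selection parameters are illustrative — see `schedule.py` for the full, working version):

import os
from lightly.api import ApiWorkflowClient

client = ApiWorkflowClient(token=os.environ["LIGHTLY_TOKEN"])

# Create the dataset (schedule.py reuses an existing dataset with the same name).
client.create_dataset("minneapple")

# Schedule a Lightly Worker run: select diverse images and, once predictions
# exist, weight the selection by prediction uncertainty.
client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 100,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}},
            {
                "input": {"type": "SCORES", "task": "minneapple", "score": "uncertainty_entropy"},
                "strategy": {"type": "WEIGHTS"},
            },
        ],
    },
)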
The job should be picked up and processed by the Lightly Worker after a few seconds. Once the upload has finished, you can visually explore your dataset in the Lightly Platform.
Before training your machine learning model, you first need to annotate the selected images. In the real world, you would use one of Lightly's export features to get the selected images into a labeling tool. Here, you can simply simulate this by running
python3 annotate.py \
--dataset-name minneapple \
--input-dir data/
The above script copies images and labels from `data/raw` to `data/train`. In real life, you would have to do the labeling yourself or outsource it (a sketch of how to fetch the selected filenames follows the list below):
- Export the images and load them into an annotation tool.
- Annotate the images in the annotation tool.
- Export the labels in the KITTI format (expected by TAO).
- Add the annotated images to `data/train/images` and the labels to `data/train/labels`.
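For reference, fetching the filenames of the selected images from the Lightly Platform could look roughly like this (the tag name `initial-tag` is the tag the Lightly Worker typically creates for the selected images; adjust it to your setup):

import os
import shutil
from lightly.api import ApiWorkflowClient

client = ApiWorkflowClient(token=os.environ["LIGHTLY_TOKEN"])
client.set_dataset_id_by_name("minneapple")

# Newline-separated filenames of the images selected so far.
filenames = client.export_filenames_by_tag_name("initial-tag").splitlines()

os.makedirs("data/train/images", exist_ok=True)
os.makedirs("data/train/labels", exist_ok=True)

for filename in filenames:  # e.g. "raw/images/some_image.png"
    basename = os.path.basename(filename)
    label = os.path.splitext(basename)[0] + ".txt"
    shutil.copy(os.path.join("data/raw/images", basename), "data/train/images")
    shutil.copy(os.path.join("data/raw/labels", label), "data/train/labels")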
You can verify that the number of annotated images is correct like this:
ls data/train/images | wc -l
ls data/train/labels | wc -l
The expected output is:
100
100
Now that you have your annotated training data, let's train an object detection model on it and see how well it works! Use Nvidia TAO to train a YOLOv4 object detector from the command line. The cool thing about transfer learning is that you don't have to train a model from scratch and therefore require fewer annotated images to get good results.
Start by downloading a pre-trained object detection model from the Nvidia registry.
mkdir -p ./yolo_v4/pretrained_resnet18
ngc registry model download-version nvidia/tao/pretrained_object_detection:resnet18 \
--dest yolo_v4/pretrained_resnet18/
Finetuning the object detector on the sampled training data is as simple as the following command.
If you get an out-of-memory error, you can change the size of the input images and the batch size in the `yolo_v4/specs/yolo_v4_minneapple.txt` file. Change `output_width`/`output_height` or `batch_size_per_gpu`, respectively.
mkdir -p $PWD/yolo_v4/experiment_dir_unpruned
tao yolo_v4 train \
-e /workspace/tao-experiments/yolo_v4/specs/yolo_v4_minneapple.txt \
-r /workspace/tao-experiments/yolo_v4/experiment_dir_unpruned \
--gpus 1 \
-k $NVIDIA_API_KEY
After 50 epochs, the mAP should be around `0.45`:
Epoch 50/50
25/25 [==============================] - 19s 779ms/step - loss: 3211.5103
Producing predictions: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:19<00:00, 1.48s/it]
Start to calculate AP for each class
*******************************
apple AP 0.47147
mAP 0.47147
*******************************
Now that you have finetuned the object detector on your dataset, you can do inference to see how well it works.
Doing inference on the whole dataset has the advantage that you can easily figure out for which images the model performs poorly or is highly uncertain.
tao yolo_v4 inference \
-i /workspace/tao-experiments/data/raw/images \
-e /workspace/tao-experiments/yolo_v4/specs/yolo_v4_minneapple.txt \
-o /workspace/tao-experiments/infer_images \
-l /workspace/tao-experiments/infer_labels \
-m /workspace/tao-experiments/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_050.tlt \
--gpus 1 \
-k $NVIDIA_API_KEY
Below you can see two example images after training. It's evident that the model does not perform well on the unlabeled image. Therefore, it makes sense to add more samples to the training dataset.
You can use the inferences from the previous step to determine which images cause the model problems. With Lightly, you can easily select these images while at the same time making sure that your training dataset is not flooded with duplicates.
First, convert the predictions from the KITTI format to the Lightly prediction format. You can use the following script for this:
python3 tao_to_lightly.py --input_dir infer_labels
Then, the predictions need to be synced to the S3 bucket such that the Lightly Worker can access them:
aws s3 sync .lightly/ $S3_LIGHTLY_PATH/.lightly/
Now, you can simply run the same `schedule.py` and `annotate.py` commands as above again to add more images to the dataset:
python3 schedule.py \
--dataset-name minneapple \
--s3-region $S3_REGION \
--s3-input-path $S3_INPUT_PATH \
--s3-input-role-arn $S3_INPUT_ROLE_ARN \
--s3-input-external-id $S3_INPUT_EXTERNAL_ID \
--s3-lightly-path $S3_LIGHTLY_PATH \
--s3-lightly-role-arn $S3_LIGHTLY_ROLE_ARN \
--s3-lightly-external-id $S3_LIGHTLY_EXTERNAL_ID
python3 annotate.py \
--dataset-name minneapple \
--input-dir data/
As before, we can check the number of images in the training set:
ls data/train/images | wc -l
ls data/train/labels | wc -l
The expected output is:
200
200
You can re-train the object detector on the new dataset to get an even better model, using the same command as before. If you want to continue training from the last checkpoint, make sure to replace the `pretrain_model_path` in the specs file with a `resume_model_path`.
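Assuming the checkpoint path from the inference step above, the relevant part of the spec could then look like this (field placement follows the TAO YOLOv4 training config):

training_config {
  # replace pretrain_model_path with resume_model_path to continue training
  resume_model_path: "/workspace/tao-experiments/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_050.tlt"
  ...
}

With the spec updated, the training command stays the same: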
tao yolo_v4 train \
-e /workspace/tao-experiments/yolo_v4/specs/yolo_v4_minneapple.txt \
-r /workspace/tao-experiments/yolo_v4/experiment_dir_unpruned \
--gpus 1 \
-k $NVIDIA_API_KEY
If you're still unhappy with the performance after re-training the model, you can repeat steps 2.2 and 2.3 and then re-train the model again.