
Starting from scratch, how to embed computer vision techniques into your project #3


Part 3 — Model training

Open your book at chapter 1: what is a screwdriver?

We have now nearly reached the training phase. To start this phase, the only missing piece is a machine learning environment.

Each machine learning model comes with its own logic for running a training and for using a trained model for inference. So to run the training, we have to refer to the model documentation to understand:

Also, model training will require a lot of resources:

The training environment’s purpose is to combine and manage all these data, software, and hardware resources for the training.

A training environment can be complex to set up, maintain and scale. This is why we need abstraction:

This is where AWS SageMaker and its SDK will help us. SageMaker is a one-stop shop for AI projects, covering every step from annotation to training to a trained-model endpoint for inference.

An exciting aspect of SageMaker is the ability to either use built-in models or bring your own model.

This is exactly what we want to do in this test: we want to train a custom model (Yolo v5) and deploy this trained model as a service endpoint to integrate into our legacy software.

To bring our model into SageMaker, we have to use an abstraction layer that comes with the SageMaker SDK and cloud services:

We will now detail this SageMaker training process with our data set and Yolo v5.

Preparing the data

We have a data set available as a GroundTruth output, and we also know that this data set contains a lot more screwdrivers (4x) than other objects.

To start the model training, we first need to adapt this data set format to something usable by Yolo and compatible with the SageMaker environment. We also need to shape a subset of this data set with a comparable number of samples of each object.

Important note for anybody who wants to investigate AI:

– Python skills for AI are mandatory

– For this project: you will need a computer with Python 3, Conda and/or Pip, and JupyterLab

All the scripts and notebooks developed for this project are available in this repository:

GitHub – smileinnovation/visual-search-yolov5-sagemaker

The GroundTruth data set metadata are stored in one file (output.manifest) using the JSON Lines format: there is one line per image file, each line being a JSON object that contains all the object label information (position, label id).
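
To make this concrete, here is a minimal sketch of how such a manifest can be read in Python. The label attribute name ("my-labeling-job") depends on your GroundTruth labeling job and is only a placeholder here.

import json

# Assumed label attribute name: in a real manifest it matches the labeling job name.
LABEL_ATTRIBUTE = "my-labeling-job"

with open("output.manifest") as manifest:
    for line in manifest:
        record = json.loads(line)           # one JSON object per image
        image_uri = record["source-ref"]    # S3 URI of the image file
        for box in record[LABEL_ATTRIBUTE]["annotations"]:
            # class id plus the bounding box in pixels
            print(image_uri, box["class_id"],
                  box["left"], box["top"], box["width"], box["height"])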

Yolo v5, for training, expects two parameters:

On top of that, we need to filter the original data set to get a more balanced number of labels per object.

To produce this new data set, we developed some scripts, available as a notebook (1-dataprep/ground-truth-to-panda):
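
The heart of this preparation is converting each GroundTruth bounding box (pixel coordinates) into the normalized format Yolo expects. Here is a minimal sketch of that conversion, independent of the actual notebook code:

def ground_truth_box_to_yolo(box, image_width, image_height):
    """Turn one GroundTruth box (pixels, top-left origin) into a Yolo label
    line: "class x_center y_center width height", all normalized to [0, 1]."""
    x_center = (box["left"] + box["width"] / 2) / image_width
    y_center = (box["top"] + box["height"] / 2) / image_height
    width = box["width"] / image_width
    height = box["height"] / image_height
    return f'{box["class_id"]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}'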

Training time

We now have all the required data (images and labels) correctly prepared; we can start the model training.

Here again, we will use a notebook (2-train/training-job) to script the required actions for this training.

As already discussed, we want to leverage transfer learning. This means we need a pre-trained model as an input for this domain-specific training. We will also need the Yolo v5 algorithm to run the training, as we will use its specific code to compute the new model.

In the Yolo v5 repository, we can find a model pre-trained on the Coco data set. We will use it during this process.

We also saw that the SageMaker abstraction is based on Docker images when bringing your own model. So we will package the Yolo algorithm with all its dependencies inside a Docker image to run the training.

So in this training notebook, we implemented the following steps:

In steps 3 and 5, several settings relate to data set access (images and labels) and to the training job behavior:
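
As an illustration of these settings, a training job running a custom Docker image is configured roughly like this with the SageMaker SDK. The image URI, bucket paths, channel names and hyperparameters below are placeholders, not the exact values used in the notebook:

import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()

estimator = Estimator(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/yolov5-training:latest",
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",                # the single-GPU instance used here
    output_path="s3://my-bucket/yolov5/results",  # where the model tarball is saved
    hyperparameters={"epochs": 50, "batch-size": 16, "img-size": 640},
    sagemaker_session=session,
)

# Each named channel is mounted under /opt/ml/input/data/<channel> in the container.
estimator.fit({
    "images": "s3://my-bucket/yolov5/images",
    "labels": "s3://my-bucket/yolov5/labels",
})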

Important note for Step 6:

Using a “local” instance, you can run the training locally on the computer running JupyterLab. This is very useful for debugging purposes, and it avoids wasting time and money on AWS while fixing script or Docker image issues.
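
For example, switching to local mode is only a matter of changing the instance type and pointing to local data. The image tag and role ARN below are placeholders:

from sagemaker.estimator import Estimator

local_estimator = Estimator(
    image_uri="yolov5-training:latest",                   # locally built image tag
    role="arn:aws:iam::<account>:role/<sagemaker-role>",  # placeholder ARN
    instance_count=1,
    instance_type="local",                                # run in Docker on this machine
    hyperparameters={"epochs": 1, "batch-size": 2, "img-size": 640},
)
local_estimator.fit({
    "images": "file://./data/images",
    "labels": "file://./data/labels",
})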

The process is relatively straightforward, but some key parameters require explanation: how long must a training run last to produce something efficient? When should we stop it?

To answer this question, we need to understand some neural network principles (especially backpropagation):

So to know when to stop training, we need to monitor these two indicators following a few rules:
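
In short, we watch the training loss and the validation loss. A common stopping rule, shown here as an illustrative sketch rather than what Yolo v5 actually implements, is to stop when the validation loss has not improved for a number of epochs:

def should_stop(validation_losses, patience=10):
    """Stop when the validation loss has not improved for `patience` epochs,
    the usual sign that the model is starting to overfit the training set."""
    if len(validation_losses) <= patience:
        return False
    best_before = min(validation_losses[:-patience])
    return min(validation_losses[-patience:]) >= best_before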

I will stop the explanation here and let you read these articles that explain all these theoretical principles in detail.

During this project, we will also follow the best practices described by the Yolo team.

Tips for Best Training Results · ultralytics/yolov5 Wiki

Following these best practices, we trained our model for 50 epochs. This training lasted 4 hours and 30 minutes on an “ml.p3.2xlarge” SageMaker instance (entry-level model with one dedicated 16 GB GPU).

GPUs are important for training as they drastically accelerate computation. Their memory size (GPU memory) is also essential: the more memory you have, the more training data you can process at once, and the faster each “epoch” completes.

And here is the “loss” monitoring of this training:

50 epochs training results

We can observe that the validation loss follows the training loss; both decrease quickly until about 30 epochs and then stabilize. The validation loss trend flattens out and does not rise above the training loss.

So 50 epochs is fine for now, but we could have run the training longer, as the validation loss seems to keep decreasing.

We decided to stop the training at 50 epochs: again, we are not trying to get a fully optimized model right now, we want to test the entire process.

At the end of the training, SageMaker saves the Yolo training outcome in the “results” folder of the training inputs. This outcome is saved as a tarball archive and contains the trained model and other information about the training process.
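
Once downloaded from S3, this archive can be unpacked locally; the file names below are indicative, as they depend on the training script (for Yolo v5 the weights are typically a .pt file such as best.pt):

import tarfile

# Unpack the training outcome saved by SageMaker (usually model.tar.gz).
with tarfile.open("model.tar.gz") as archive:
    archive.extractall("trained-model")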

“I can see tools.”

Our trained model is now ready. We can now test it in real life.

This model can be used anywhere the Yolo algorithm can run. This means you can run it on-premises or in SageMaker. Again, a good abstraction layer is a Docker image.

For our use case, we want to deploy an API that the ElasticSuite product will use.

As a first step, we want to rely on a SageMaker endpoint. This endpoint is an AWS-managed server that will run the docker image. The lifecycle of this endpoint is controlled and monitored by the SageMaker service.

When this endpoint is deployed, we can use the SageMaker SDK to send images and get the result in a response.

To do so, we will again use a notebook to set up this endpoint (3-predict/deploy):

The instance type used for inference requires far fewer resources than training: a GPU is not needed, and response time has to be tested to adjust the resources to your needs. SageMaker also allows setting up auto-scaling rules for the endpoint.
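
Here is a hedged sketch of this deployment and of a first invocation; the inference image URI, model location, role, endpoint name and content type are placeholders that depend on the inference code packaged in the image:

import boto3
from sagemaker.model import Model

model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/yolov5-inference:latest",
    model_data="s3://my-bucket/yolov5/results/model.tar.gz",
    role="arn:aws:iam::<account>:role/<sagemaker-role>",
)
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",     # CPU instance: no GPU needed for this inference
    endpoint_name="yolov5-endpoint",
)

# Send an image and read back the detections.
runtime = boto3.client("sagemaker-runtime")
with open("workbench.jpg", "rb") as image:
    response = runtime.invoke_endpoint(
        EndpointName="yolov5-endpoint",
        ContentType="application/x-image",
        Body=image.read(),
    )
print(response["Body"].read().decode())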

Important note for Step 4 :

Using a “local” instance, you can run the inference locally on your computer. This is helpful for debugging purposes, and a simple curl request against the local endpoint is enough to test it.
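
In local mode the container listens on port 8080, so the same test can also be done from Python (image path and content type are placeholders):

import requests

# Equivalent of the curl test: POST an image to the local /invocations route.
with open("workbench.jpg", "rb") as image:
    response = requests.post(
        "http://localhost:8080/invocations",
        data=image.read(),
        headers={"Content-Type": "application/x-image"},
    )
print(response.text)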

Now it is time to check the results of this long work. To test our model, I just wanted to use real-life images. So I took some photos of tools I have on my workbench at home.

Paintbrush as a toothbrush with Coco pre-trained model

On this first photo of a small paintbrush, I tested the behavior of the Yolo model pre-trained on the Coco data set. No surprise here: the Coco data set does not contain a paintbrush label, but it does contain a toothbrush one. So the model recognizes the object as a toothbrush, with a low probability (29%).

Now the same photo, but with our model trained by transfer learning on our data set: we get something better, a paintbrush at 65%.

Paintbrush with our model

We tested our model on two other example photos: one with a broad set of objects and one with a driller box.

Tools on the workbench
Driller box

It is not perfect: more training would undoubtedly increase the accuracy (lower the loss), and the object photos are perhaps too homogeneous (i.e., not enough kinds of screwdrivers).

But it works.

Coming next:

Smile is the proud publisher of ElasticSuite, a great Magento open-source extension with more than a million downloads on GitHub, trusted by more than 1,500 top retailers worldwide. It’s the leading solution for intelligent search and merchandising on Magento.

As part of this product road map, we have to test and experiment with new features.

That’s all, folks!
Did you enjoy it? If so, don’t hesitate to 👏 our article or subscribe to our Innovation watch newsletter! You can follow Smile on Facebook, Twitter, and YouTube.


Starting from scratch, how to embed computer vision techniques into your project #3 was originally published in Smile Innovation on Medium.
