Detecting and Fixing Data Drift in Computer Vision
- ML Monitoring
- Whylogs
- Image Data
- WhyLabs
Jan 26, 2023
Introduction
If you have been working in data science and ML for a while, you know that the models you have trained can behave very unexpectedly in a production environment. This is because production data tends to differ from the fixed training dataset you used to create the model. Moreover, production data keeps changing over time, so even models that initially perform well will degrade.
What has been described above is known as data drift, and it is very common in ML.
There have been plenty of articles explaining the concept in depth, but in this tutorial we will focus on the practical side: detecting data drift and fixing it in a computer vision example.
Here we give an overall explanation of the data drift monitoring system we have created; for more details and the full code, you can check this Colab notebook.
Problem statement
In this case study, we will monitor a computer vision classifier that has been trained to distinguish between cats and dogs. The model we used was trained with Keras and is just a simple CNN; a rough sketch of what such a model can look like is shown below.
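The exact architecture is in the Colab notebook; the snippet below is only an illustrative sketch of a small binary Keras CNN, with made-up layer sizes and input shape rather than the ones we actually used.

```python
# Illustrative only: a minimal binary cat/dog CNN in Keras.
# Layer sizes and the 128x128 input shape are assumptions, not the notebook's exact model.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),  # normalize pixel values
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of "dog" vs. "cat"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```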
The data that we will monitor simulates one week of production traffic, so we have divided the images into day batches (day_1, day_2, …, day_7).
Additionally, we simulated data drift in the later days, so in the first batches you will see only normal images of cats and dogs…
In the later days, however, the camera developed some problems: pixels got corrupted and the images look more like this…
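If you want to reproduce this kind of degradation yourself, one simple way (and only a sketch, not necessarily the exact corruption used in the notebook) is to zero out a random fraction of pixels:

```python
# Sketch: simulate a failing camera by blacking out a random fraction of pixels.
# The corruption method and the 30% fraction are illustrative assumptions.
import numpy as np
from PIL import Image

def corrupt_pixels(image: Image.Image, fraction: float = 0.3) -> Image.Image:
    arr = np.array(image)
    # Boolean mask over (height, width); masked pixels are set to black in all channels.
    mask = np.random.rand(arr.shape[0], arr.shape[1]) < fraction
    arr[mask] = 0
    return Image.fromarray(arr)

# Example usage on one image from a "drifted" day batch:
# drifted = corrupt_pixels(Image.open("day_5/dog_001.jpg"))
```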
Let’s see if we are able to detect it with our drift monitoring system.
How to detect data drift and get early alerts
We’ll use the open-source library whylogs to profile our data and send the profiles to WhyLabs for data drift and ML performance monitoring.
Profiles created with whylogs are efficient, mergeable, and contain only essential summary statistics. This allows them to be used in almost any application, whether it processes data in batches or in streams, and even in sensitive industries like healthcare and finance. Read more about whylogs on GitHub.
WhyLabs makes it easy to store and monitor profiles created with whylogs to detect anomalies in machine learning pipelines such as data drift, data quality issues, and model bias.
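As a rough sketch of what this looks like in code, the snippet below profiles one day batch of images with whylogs' image metric and writes the profiles to WhyLabs. The org ID, model ID, API key, folder layout, and timestamp are placeholders; the full, working version is in the Colab notebook.

```python
# Sketch: profile a day batch of images and send it to WhyLabs.
# Requires whylogs with the image and whylabs extras installed.
import os
from datetime import datetime, timezone
from glob import glob

from PIL import Image
from whylogs.extras.image_metric import log_image

# Placeholder credentials -- use your own org, model, and API key.
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "org-0"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"
os.environ["WHYLABS_API_KEY"] = "YOUR_API_KEY"

batch_timestamp = datetime(2023, 1, 16, tzinfo=timezone.utc)  # the "day" this batch belongs to

for path in glob("day_1/*.jpg"):  # placeholder folder layout
    results = log_image(Image.open(path))
    # Backdate the profile so every image in this batch lands on the same day;
    # profiles that share a dataset timestamp are merged into one batch in WhyLabs.
    results.profile().set_dataset_timestamp(batch_timestamp)
    results.writer("whylabs").write()
```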
We covered the details of monitoring our computer vision data and model in the Colab notebook for this post, but below you can see a preview of the WhyLabs dashboard.
On its left-hand side, you can see the different features being monitored (brightness, pixel height, pixel width, saturation, and more). Several of them have a red exclamation mark next to them, which suggests that our data has drifted in those dimensions. You can explore each feature in detail in the center of the dashboard. The screenshot above shows the values for brightness, and you can see that they started to drift on the fourth day of monitoring and continued to do so on the fifth, sixth, and seventh days.
That makes sense: remember that we corrupted the pixels of the images sent on the later days.
Model drift detection with human annotation
We have already been informed that some of the data from the later days looks a bit different from the data our model was trained on. But is the model's performance affected?
To check this, we will need human annotation. We could do it ourselves, but that is highly impractical and does not scale, so we will use the Toloka crowdsourcing platform.
To do so, we first need to set up a labeling project in which we specify instructions for labelers and design an interface. The latter can be done using the Template Builder in Toloka, as shown below.
Once we have set up the project, we can upload all the photos that need annotation. We can do this programmatically using Python and the Toloka-Kit library; a sketch is shown below, and remember that the full code needed to set up this system is in this notebook.
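The snippet below is a minimal sketch of that upload step, assuming a pool already exists and the images are hosted at publicly reachable URLs; the OAuth token, pool ID, image URLs, and the `image` input field name are placeholders, not values from the notebook.

```python
# Sketch: upload one day batch of images as Toloka tasks with Toloka-Kit.
import toloka.client as toloka

toloka_client = toloka.TolokaClient("YOUR_OAUTH_TOKEN", "PRODUCTION")  # placeholder token

# Placeholder URLs -- in practice these would point to the hosted day batch.
image_urls = [
    "https://example.com/day_4/cat_001.jpg",
    "https://example.com/day_4/dog_017.jpg",
]

tasks = [
    toloka.Task(input_values={"image": url}, pool_id="YOUR_POOL_ID")
    for url in image_urls
]
# allow_defaults=True applies the pool's default overlap to each new task.
toloka_client.create_tasks(tasks, allow_defaults=True)
```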
Once your day batches are uploaded, you should see them organized by day, as shown below.
Before labeling starts, we need to choose the performers who will take part in the project. Because it is a simple task, we do not need complex filtering; we will simply allow annotators who speak English (the language in which the instructions are written) to participate.
Additionally, we set up some quality control rules such as CAPTCHAs and control tasks. This is important: for crowdsourcing to be an effective tool, it needs rigid quality control. We also send each image to three different annotators so we can be more confident about each label. A sketch of such a pool configuration follows below.
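The snippet below sketches how such a pool could be configured with Toloka-Kit: an English language filter, an overlap of three annotators per task, and a control-task rule that suspends low-accuracy performers. The thresholds, reward, expiry, and restriction duration are illustrative assumptions rather than the notebook's exact settings, and the CAPTCHA rule is omitted for brevity.

```python
# Sketch: a Toloka pool with a language filter, overlap of 3, and a control-task rule.
# All numeric settings here are illustrative assumptions.
import datetime
import toloka.client as toloka

pool = toloka.Pool(
    project_id="YOUR_PROJECT_ID",  # placeholder
    private_name="Cats vs dogs: day batches",
    may_contain_adult_content=False,
    will_expire=datetime.datetime.utcnow() + datetime.timedelta(days=30),
    reward_per_assignment=0.01,
    assignment_max_duration_seconds=600,
    # Each task is labeled by three different annotators.
    defaults=toloka.Pool.Defaults(default_overlap_for_new_task_suites=3),
    # Only English-speaking annotators, since the instructions are in English.
    filter=toloka.filter.Languages.in_("EN"),
)

# Control tasks: temporarily suspend annotators with low accuracy on golden-set tasks.
pool.quality_control.add_action(
    collector=toloka.collectors.GoldenSet(history_size=10),
    conditions=[toloka.conditions.GoldenSetCorrectAnswersRate < 80],
    action=toloka.actions.RestrictionV2(
        scope="POOL",
        duration=1,
        duration_unit="DAYS",
        private_comment="Low accuracy on control tasks",
    ),
)

toloka_client = toloka.TolokaClient("YOUR_OAUTH_TOKEN", "PRODUCTION")  # placeholder token
pool = toloka_client.create_pool(pool)
```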
When all annotators have finished their work, we can send this data (our gold standard labels) together with the model predictions and analyze them further with WhyLabs.
Comparing model predictions with human annotations
We can now use WhyLabs to monitor machine learning performance metrics by comparing the model's predictions with the ground truth. Monitors can also be configured to alert when these metrics change. A sketch of how this data can be logged is shown below.
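As a rough illustration of that logging step, the snippet below builds a small DataFrame of aggregated Toloka labels, model predictions, and confidence scores, and logs it with whylogs' classification metrics API. The column names, values, and timestamp are made up, and it assumes the WhyLabs environment variables from the earlier sketch are already set.

```python
# Sketch: log ground truth vs. model predictions for one day to WhyLabs.
# The data here is made up; in practice it comes from Toloka and the model.
from datetime import datetime, timezone

import pandas as pd
import whylogs as why

df = pd.DataFrame({
    "ground_truth": ["cat", "dog", "dog"],  # aggregated human labels from Toloka
    "prediction":   ["cat", "dog", "cat"],  # the CNN's predictions for the same images
    "score":        [0.91, 0.84, 0.55],     # the CNN's confidence scores
})

results = why.log_classification_metrics(
    df,
    target_column="ground_truth",
    prediction_column="prediction",
    score_column="score",
)
# Backdate to the day the batch belongs to, then send to WhyLabs.
results.profile().set_dataset_timestamp(datetime(2023, 1, 19, tzinfo=timezone.utc))
results.writer("whylabs").write()
```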
Below you can see a dashboard with accuracy, precision, recall, F-score, and a confusion matrix for our case study. Note that the model's accuracy has fallen from around 80% to 40% on the later days. It looks like the initial alerts we got from the input features at the beginning of this case study were right and are now confirmed by the ground truth. Our model has drifted!
Once we have discovered model drift, the usual procedure is to retrain the model on new examples to account for the changes in the environment. Since we have already annotated the new data with Toloka, we could use it to retrain the model right away. Or, if we think the new sample is too small, we could trigger a larger data labeling pipeline to gather more training examples.
Summary
This tutorial taught you how to set up an ML model drift monitoring system for a computer vision project. We used a simple cats-vs-dogs classification example, but this case study can easily be extended to more complex projects.
Remember that you can find the full code in this notebook. We also ran a live workshop where we explained, step by step, what happens at each stage of this pipeline. It is a long recording, but worth a look if you are interested in a more detailed explanation.
I would also like to thank Sage Elliott, who is the co-author of this article, and Daniil Fedulov, who co-authored the initial notebook with us.
“Detecting and fixing data drift in Computer Vision” was originally published by Towards Data Science.