Detecting and Fixing Data Drift in Computer Vision


If you have been working in data science and ML for a while, you know that the models you have trained can behave unexpectedly in a production environment. This is because production data tends to differ from the closed training dataset you used to create the model. Production data also keeps changing, so even models that initially perform well will degrade over time.

Image by the Author

This phenomenon is known as data drift, and it is very common in ML.

There have been plenty of articles explaining the concept in depth, but in this tutorial we will focus on the practical side: detecting data drift and fixing it in a computer vision example.

Here we will give an overall explanation of the data drift monitoring system we have created; for more details and the full code, you can check this Colab notebook.

Problem statement

Cat or Dog? — Image by the Author

In this case study, we will monitor a computer vision classifier that has been trained to distinguish between cats and dogs. The model we used is a simple CNN trained with Keras.

The data that we will monitor simulates one week of production performance, so we have divided the images into daily batches (day_1, day_2, …, day_7).

Additionally, we simulated data drift in the later days, so in the first batches you will see just normal images of cats and dogs…

In later days, however, the camera had some problems: some pixels got corrupted, and the images look more like this…

Camera problems — Image by the Author
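
This kind of corruption can be sketched in a few lines. The snippet below is only an illustration in plain Python, not the code from the notebook: the helper name corrupt_pixels, the fraction parameter, and the representation of an image as a nested list of grayscale values are all assumptions made for this example.

```python
import random

def corrupt_pixels(image, fraction=0.3, seed=0):
    """Return a copy of `image` with `fraction` of its pixel positions overwritten.

    `image` is a list of rows, each row a list of grayscale values in 0..255.
    Hypothetical helper: the notebook may corrupt images differently.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    corrupted = [row[:] for row in image]  # copy so the original stays intact
    n_corrupt = int(fraction * height * width)
    # Pick distinct pixel positions and overwrite each with a random value
    for index in rng.sample(range(height * width), n_corrupt):
        y, x = divmod(index, width)
        corrupted[y][x] = rng.randint(0, 255)
    return corrupted

# A flat gray 8x8 "image" stands in for a real photo
clean = [[100] * 8 for _ in range(8)]
noisy = corrupt_pixels(clean, fraction=0.25)
```

Applying a helper like this only to the batches for later days would reproduce the setup described above: early days stay clean, later days drift.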

Let’s see if we are able to detect it with our drift monitoring system.

How to detect data drift and get early alerts

We’ll use the open-source library whylogs to profile our data and send the profiles to WhyLabs for data drift and ML performance monitoring.

Profiles created with whylogs are efficient, mergeable, and only contain essential summary statistics. This allows them to be used in almost any application, whether it requires processing data in batches or streams, or in sensitive industries like healthcare and finance. Read more about whylogs on GitHub.

WhyLabs makes it easy to store and monitor profiles created with whylogs to detect anomalies in machine learning pipelines such as data drift, data quality issues, and model bias.

We cover the details of monitoring our computer vision data and model in the Colab notebook for this post, but below you can see a preview of the WhyLabs dashboard.

Detecting Data Drift in WhyLabs — Image by Author

On the left-hand side, you can see the different features that are being monitored (brightness, pixel height, pixel width, saturation, and more). Several of them have a red exclamation mark next to them, which suggests that our data has drifted in those dimensions. You can explore each feature in detail in the center of the dashboard. The screenshot above shows the values for brightness: they started to drift on the fourth day of monitoring and continued to do so on the fifth, sixth, and seventh days.

That makes sense: remember that we applied some pixelation to the images sent on later days.
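
To make the idea of a drifting feature concrete, here is a simplified sketch of a brightness-based check. It is not how whylogs or WhyLabs compute drift. The helper names, the toy brightness values, the choice of the first three days as a baseline, and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def mean_brightness(image):
    """Average pixel value of a grayscale image given as a list of rows."""
    pixels = [value for row in image for value in row]
    return mean(pixels)

def drifted_days(daily_brightness, baseline_days=3, threshold=3.0):
    """Flag days whose mean brightness deviates strongly from the baseline days.

    `daily_brightness` maps a day name to a list of per-image brightness values.
    The first `baseline_days` entries (in insertion order) form the reference.
    Assumes the baseline values are not all identical (non-zero spread).
    """
    days = list(daily_brightness)
    baseline = [b for day in days[:baseline_days] for b in daily_brightness[day]]
    base_mean, base_std = mean(baseline), stdev(baseline)
    flagged = []
    for day in days[baseline_days:]:
        # Crude z-score of the day's mean against the baseline distribution
        z = abs(mean(daily_brightness[day]) - base_mean) / base_std
        if z > threshold:
            flagged.append(day)
    return flagged

# Toy per-image brightness values, made up for this example
daily = {
    "day_1": [118, 122, 120, 121],
    "day_2": [119, 121, 123, 117],
    "day_3": [120, 118, 122, 121],
    "day_4": [140, 138, 142, 139],
    "day_5": [150, 148, 151, 149],
}
print(drifted_days(daily))
```

A production monitor would compare full distributions rather than daily means, but the principle is the same: establish a reference from data the model is known to handle, then alert when new batches move away from it.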

Model drift detection with human annotation

We have already been alerted that some of the data from later days looks a bit different from the data on which our model was trained. But is the model’s performance affected?

To check, we will need human annotation. We could do it ourselves, but that is highly impractical and not scalable, so we will use the Toloka crowdsourcing platform.

To do so, we first need to set up a labeling project in which we specify instructions for labelers and design an interface. The latter can be done using the Template Builder in Toloka, as shown below.

Task interface for classification project — Image by the Author

Once we have set up a project, we can upload all the photos that need annotation. We can do this programmatically using Python and the Toloka-Kit library. Remember that the full code needed to set up this system is in this notebook.

Once your daily batches are uploaded, you should see them organized by day, as shown below.

Image by the Author

Before the labeling starts, we need to choose the performers who will take part in the project. Because it is a simple task, we do not need complex filtering. We will allow annotators who speak English (the language in which the instructions are written) to participate.

Additionally, we have set up some quality control rules such as CAPTCHAs and control tasks. This is important because crowdsourcing needs strict quality control to be an effective tool. We also sent each image to three different annotators to have more confidence in each label.
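
With three labels per image, the overlapping answers have to be merged into a single label. The sketch below uses a plain majority vote, which is the simplest option and not necessarily the aggregation used in the notebook. The function name, the image ids, and the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label and the share of annotators who chose it."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# Hypothetical raw annotations: image id -> labels from three annotators
annotations = {
    "img_001.jpg": ["cat", "cat", "dog"],
    "img_002.jpg": ["dog", "dog", "dog"],
}

# Keep only the winning label for each image
ground_truth = {
    image: majority_vote(labels)[0] for image, labels in annotations.items()
}
```

The agreement share returned alongside the label can be useful too: images on which annotators disagree are natural candidates for a second look.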

Once all annotators have finished their work, we can send this data (our gold standard) together with the model predictions to WhyLabs for further analysis.

Comparing model predictions with human annotations

We can now use WhyLabs to monitor machine learning performance metrics by comparing model prediction values and ground truth. Monitors can also be configured to detect if these metrics change.

Below you can see a dashboard with accuracy, precision, recall, f-score, and confusion matrix for our case study. Note that the accuracy of the model has fallen from around 80% to 40% on later days. It looks like the initial alerts we had from input features at the beginning of this case study were right and are now confirmed by the ground truth. Our model has drifted!
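
For intuition, the metrics on the dashboard can be reproduced by hand from predictions and ground truth. The sketch below is an assumption-laden illustration: the function name is hypothetical, "dog" is arbitrarily treated as the positive class, and the two tiny batches are made-up data that merely mirror the kind of drop described above.

```python
def binary_metrics(y_true, y_pred, positive="dog"):
    """Compute accuracy, precision, and recall for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted or present
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy batches: human labels (truth) versus model predictions
truth = ["cat", "dog", "cat", "dog", "dog"]
predictions_by_day = {
    "day_1": ["cat", "dog", "cat", "dog", "cat"],
    "day_7": ["dog", "dog", "cat", "cat", "cat"],
}

metrics_by_day = {
    day: binary_metrics(truth, preds) for day, preds in predictions_by_day.items()
}
```

Tracking these numbers per daily batch is what turns a one-off evaluation into monitoring: a drop between early and late days shows up directly in the series.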

Monitoring Machine Learning Performance Metrics in WhyLabs — Image by Authors

Once we have discovered model drift, the usual procedure is to retrain the model on new examples to account for the changes in the environment. As we have already annotated the new data with Toloka, we can now use it to retrain the model. If we think the new sample is too small, we can trigger a larger data labeling pipeline to gather more training examples.


This tutorial taught you how to set up an ML model drift monitoring system for a computer vision project. We used a simple classification example of cats and dogs but this case study can be easily extended to more complex projects.

Remember that you can check the full code in this notebook. We also ran a live workshop where we explained step by step what happens at each stage of this pipeline. It is a long recording, but worth a look if you are interested in a more detailed explanation.

I would also like to thank Sage Elliott, who is the co-author of this article, and Daniil Fedulov, who co-authored the initial notebook with us.

Detecting and fixing data drift in Computer Vision was originally published by Towards Data Science.
