WhyLabs Team

Jan 26, 2023

Detecting and Fixing Data Drift in Computer Vision

ML Monitoring
Whylogs
Image Data
WhyLabs

Introduction

If you have been working in data science and ML for a while, you know that trained models can behave very unexpectedly in a production environment. This is because production data tends to differ from the closed training dataset used to create the model. Additionally, production data keeps changing, so even models that initially perform well will degrade over time.

This phenomenon is known as data drift, and it is very common in ML.

Plenty of articles explain the concept in depth, so in this tutorial we focus on the practical side: detecting data drift and fixing it, using a computer vision example.

Here we give an overall explanation of the data drift monitoring system we created; for more details and the full code, you can check this Colab notebook.

Problem statement

In this case study, we will monitor a computer vision classifier that has been trained to distinguish between cats and dogs. The actual model that we have used has been trained using Keras and is just a simple CNN.

The data that we will monitor simulates one week of production performance, so we have divided the images into daily batches (day_1, day_2, …, day_7).

Additionally, we simulated data drift in the later days, so the first batches contain just normal images of cats and dogs…

…whereas the later days had some camera problems: some pixels got corrupted, and the images look more like this…
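
To make the simulated camera problem concrete, here is a minimal sketch of how pixel corruption could be applied to an image. It is not the code from the notebook: the function name `corrupt_pixels`, the `fraction` parameter, and the choice to overwrite pixels with random values are assumptions made for illustration, and the image is represented as a plain nested list of grayscale values rather than a real image file.

```python
import random

def corrupt_pixels(image, fraction=0.3, seed=0):
    """Return a copy of a grayscale image (list of rows of 0-255 ints)
    in which roughly `fraction` of the pixels are replaced by random values.
    Hypothetical helper, used here only to illustrate simulated drift."""
    rng = random.Random(seed)
    corrupted = [row[:] for row in image]  # copy so the input stays untouched
    for row in corrupted:
        for x in range(len(row)):
            if rng.random() < fraction:
                row[x] = rng.randint(0, 255)
    return corrupted

# A flat gray 8x8 "image" stands in for a real photo.
clean = [[128] * 8 for _ in range(8)]
noisy = corrupt_pixels(clean, fraction=0.3)

# Count how many pixels differ from the clean image.
changed = sum(a != b for r1, r2 in zip(clean, noisy) for a, b in zip(r1, r2))
print(f"{changed} of 64 pixels changed")
```

In this setup, the early batches would be left untouched and a function like this would be applied only to the later batches.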

Let’s see if we are able to detect it with our drift monitoring system.

How to detect data drift and get early alerts

We’ll use the open-source library whylogs to profile our data and send the profiles to WhyLabs for data drift and ML performance monitoring.

Profiles created with whylogs are efficient, mergeable, and only contain essential summary statistics. This allows them to be used in almost any application, whether it requires processing data in batches or streams, or in sensitive industries like healthcare and finance. Read more about whylogs on GitHub.
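
To illustrate what "mergeable summary statistics" means, here is a toy sketch. It is deliberately not the whylogs implementation or API: `ToyProfile` and its fields are hypothetical, and real whylogs profiles track far richer statistics. The point is only that two summaries can be combined without revisiting the raw data.

```python
from dataclasses import dataclass

@dataclass
class ToyProfile:
    """Hypothetical stand-in for a profile: keeps only summary statistics."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def track(self, value):
        # Update the summary; the raw value itself is never stored.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        # Combine two summaries into one, as if all values had been tracked together.
        return ToyProfile(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.count if self.count else float("nan")

# Two partial batches (made-up brightness values) profiled separately...
morning, evening = ToyProfile(), ToyProfile()
for v in [0.40, 0.50, 0.60]:
    morning.track(v)
for v in [0.20, 0.30]:
    evening.track(v)

# ...and merged into a single daily summary.
day = morning.merge(evening)
print(day.count, round(day.mean, 3), day.minimum, day.maximum)
```

Because merging needs only the summaries, the same idea works for batches and streams alike, and the raw data never has to leave the environment where it was profiled.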

WhyLabs makes it easy to store and monitor profiles created with whylogs to detect anomalies in machine learning pipelines such as data drift, data quality issues, and model bias.

We cover the details of monitoring our computer vision data and model in the Colab notebook for this post, but below you can see a preview of the WhyLabs dashboard.

On the left-hand side of the dashboard, you can see the different features that are being monitored (brightness, pixel height, pixel width, saturation, and more). Several of them have a red exclamation mark next to them, which suggests that our data has drifted in those dimensions. You can explore each feature in detail in the center of the dashboard. The screenshot above shows the values for brightness: they started to drift on the fourth day of monitoring and continued to do so on the fifth, sixth, and seventh days.
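
The idea behind such an alert can be sketched with a very simple check: compare each day's mean brightness with a baseline built from the first days. This is a simplified illustration and not how WhyLabs monitors are implemented; the function `drifted_days`, the z-score style threshold, and all brightness values below are made up for the example.

```python
from statistics import mean, stdev

def drifted_days(daily_values, baseline_days, threshold=3.0):
    """Flag days whose mean deviates from the baseline mean by more than
    `threshold` baseline standard deviations (hypothetical heuristic)."""
    baseline = [v for d in baseline_days for v in daily_values[d]]
    mu, sigma = mean(baseline), stdev(baseline)
    flagged = []
    for day, values in daily_values.items():
        if day in baseline_days:
            continue
        if abs(mean(values) - mu) / sigma > threshold:
            flagged.append(day)
    return flagged

# Made-up per-image brightness values, four images per day.
daily_brightness = {
    "day_1": [0.48, 0.52, 0.50, 0.49],
    "day_2": [0.51, 0.50, 0.47, 0.53],
    "day_3": [0.49, 0.52, 0.50, 0.51],
    "day_4": [0.62, 0.66, 0.60, 0.64],
    "day_5": [0.70, 0.68, 0.72, 0.71],
    "day_6": [0.74, 0.77, 0.73, 0.75],
    "day_7": [0.80, 0.78, 0.82, 0.79],
}

print(drifted_days(daily_brightness, ["day_1", "day_2", "day_3"]))
```

With these invented numbers, the check flags day_4 through day_7, mirroring the pattern visible in the dashboard.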

That makes sense: remember that we applied some pixelation to the images sent on later days.

Model drift detection with human annotation

We have already been alerted that some of the data on later days looks a bit different from the data on which our model was trained. But is the model’s performance affected?

To check, we need human annotation. We could do it ourselves, but that is highly impractical and not scalable, so we will use the Toloka crowdsourcing platform.

To do so, we first need to set up a labeling project, where we specify instructions for labelers and design an interface. The latter can be done using the Template Builder in Toloka, as shown below.

Once we have set up a project, we can upload all the photos that need annotation. We can do it programmatically using Python and the Toloka-Kit library. Remember that the full code needed to set up this system is in this notebook.

Once your daily batches are uploaded, you should have them organized by day, as seen below.

Before the labeling starts, we need to choose the performers who will take part in the project. Because it is a simple task, we do not need complex filtering. We will allow annotators who speak English (as this is the language in which the instructions are written) to participate.

Additionally, we have set up some quality control rules, such as CAPTCHAs and control tasks. This is important because crowdsourcing needs rigid quality control rules to be an effective tool. We also sent each image to three different annotators to have more confidence in each label.
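
One simple way to turn three annotations per image into a single label is a majority vote. The sketch below assumes that approach; it is not necessarily the aggregation used in the notebook, and the function `majority_vote` and the file names are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label and the share of annotators who chose it."""
    winner, votes = Counter(labels).most_common(1)[0]
    return winner, votes / len(labels)

# Made-up annotations: three labels per image.
annotations = {
    "img_001.jpg": ["cat", "cat", "dog"],
    "img_002.jpg": ["dog", "dog", "dog"],
}

gold = {name: majority_vote(labels) for name, labels in annotations.items()}
for name, (label, agreement) in gold.items():
    print(name, label, f"{agreement:.2f}")
```

With three annotators and two classes a tie is impossible, and the agreement share gives a rough sense of how confident we can be in each label.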

When all annotators finish the work we can send this data (our gold standards) together with model predictions and analyze them further with WhyLabs.

Comparing model predictions with human annotations

We can now use WhyLabs to monitor machine learning performance metrics by comparing model prediction values and ground truth. Monitors can also be configured to detect if these metrics change.

Below you can see a dashboard with accuracy, precision, recall, f-score, and confusion matrix for our case study. Note that the accuracy of the model has fallen from around 80% to 40% on later days. It looks like the initial alerts we had from input features at the beginning of this case study were right and are now confirmed by the ground truth. Our model has drifted!
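
For reference, the metrics on that dashboard can be computed from predictions and ground truth as in the following sketch. WhyLabs calculates them for you; this standalone version only shows the arithmetic. The function `binary_metrics`, the choice of "dog" as the positive class, and the ten example labels are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive="dog"):
    """Compute confusion-matrix counts and the derived metrics for two classes."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up ground truth (human annotations) and model predictions.
y_true = ["dog", "dog", "dog", "dog", "cat", "cat", "cat", "cat", "cat", "cat"]
y_pred = ["dog", "dog", "dog", "cat", "cat", "cat", "cat", "cat", "dog", "dog"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Computing these per daily batch and comparing the values across days is what reveals a drop like the one seen in our case study.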

Monitoring Machine Learning Performance Metrics in WhyLabs — Image by Authors

Once we have discovered model drift, the usual procedure is to retrain the model on new examples to account for the changes in the environment. As we have already annotated the new data with Toloka, we could now use it to retrain the model. If we think that the new sample is too small, we could instead trigger a larger data labeling pipeline to gather more training examples.

Summary

This tutorial taught you how to set up an ML model drift monitoring system for a computer vision project. We used a simple classification example of cats and dogs but this case study can be easily extended to more complex projects.

Remember that you can check the full code in this notebook. We also ran a live workshop where we explained, step by step, what happens at each stage of this pipeline. It is a long recording but worth a look if you are interested in a more detailed explanation.

I would also like to thank Sage Elliott, who is the co-author of this article, and Daniil Fedulov, who co-authored the initial notebook with us.

“Detecting and fixing data drift in Computer Vision” was originally published by Towards Data Science.
