
Monitoring Image Data with whylogs v1

When operating computer vision systems, data quality and data drift issues always pose the risk of model performance degradation. Even in the most highly controlled environments, subtle issues can lead to a model underperforming and may go undetected for weeks or even longer. The whylogs library and WhyLabs platform provide ML Engineers and Data Scientists with a simple yet highly customizable solution for always maintaining observability into their data, allowing them to detect issues and take action sooner. With native image profiling support re-introduced to whylogs v1.1, computer vision practitioners can now take full advantage of the simplified API, performance improvements, and visualization capabilities which debuted with whylogs 1.0.

The whylogs/WhyLabs solution

In a previous post, the various physical, procedural, and pipeline-based sources of data drift and data quality issues were discussed. A few of these are outlined below:

These issues can be monitored by regularly collecting metrics calculated from raw images, along with any Exif data or other metadata. These include metrics such as:

  • Image brightness (mean, standard deviation)
  • Hue (mean, standard deviation)
  • Saturation (mean, standard deviation)
  • Height/Width
  • Colorspace
  • Exif data

These metrics and more are automatically captured by whylogs. A powerful property of these metrics is that they are highly versatile when applied to image monitoring. Despite the wide variety of issues contributing to model performance degradation, many of these issues will manifest themselves as anomalies in the metrics listed above. An end-to-end monitoring solution can be achieved by continuously comparing these metrics generated from inference data in production against the same metrics generated for some baseline (e.g. a training set, validation set, trailing window, etc.).
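To make the idea of comparing production metrics against a baseline concrete, here is a minimal sketch that uses only the Python standard library. It is not how whylogs computes or compares these metrics internally; the helper names (`image_stats`, `relative_change`) and the tiny synthetic "images" (lists of RGB tuples) are assumptions made purely for illustration.

```python
import colorsys
import statistics


def image_stats(pixels):
    """Mean and standard deviation of hue, saturation, and brightness.

    `pixels` is a hypothetical list of (r, g, b) tuples with values in 0..255.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    stats = {}
    for idx, name in enumerate(("hue", "saturation", "brightness")):
        channel = [p[idx] for p in hsv]
        stats[f"{name}.mean"] = statistics.fmean(channel)
        stats[f"{name}.stddev"] = statistics.pstdev(channel)
    return stats


def relative_change(baseline, current, key):
    """Relative change of one metric against its baseline value."""
    return abs(current[key] - baseline[key]) / max(abs(baseline[key]), 1e-9)


# Two tiny synthetic "images": a strongly colored one and a washed-out one.
baseline_img = [(120, 40, 160), (130, 50, 170), (240, 230, 245), (235, 225, 240)]
washed_out_img = [(190, 160, 210), (195, 165, 215), (240, 230, 245), (235, 225, 240)]

baseline = image_stats(baseline_img)
current = image_stats(washed_out_img)
saturation_shift = relative_change(baseline, current, "saturation.mean")
print(f"relative change in mean saturation: {saturation_shift:.2f}")
```

A real monitor would compare distributions of these values across many images rather than a single pair, which is what the drift reports later in this post do.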

In this blog post, we’ll visit several realistic examples of data drift and data quality issues and demonstrate how whylogs and WhyLabs provide a simple yet powerful solution for monitoring image data. We will then explore how whylogs can be easily extended to capture custom metrics for highly specific use cases.

Healthcare example: blood cell classification

Consider an image classification model tasked with identifying the subtype of white blood cells. In practice, white blood cells are dyed for better visibility, but this process doesn’t always yield consistent results. In the example below we see a noticeable difference in the training and inference data. This may have been caused by an over-diluted dye being used, or a different application method. In any case, the differences can lead to the model struggling to recognize the patterns necessary for accurate predictions.

To begin profiling these images with whylogs, users must first install the whylogs library along with the image and viz extras.

pip install "whylogs[image,viz]"

With the following, users can profile a single image and view the resulting profile as a dataframe.

from PIL import Image
from whylogs.extras.image_metric import log_image

# read in image
img = Image.open('/path/to/image.jpg')

# profile image
profile = log_image(img).profile()

# generate profile view
profile_view = profile.view()

# view profile as dataframe
profile_view.to_pandas()

This represents just a fraction of the telemetry captured by the whylogs profile. This can be extended to capture a single profile for an entire collection of images, in this case the training dataset for the cell classification model.

from PIL import Image
from whylogs.extras.image_metric import log_image
import os

training_path = '/path/to/training_set'

# initialize reference_profile
reference_profile = None

# loop over each image in training folder
for filename in os.listdir(training_path):
    file_path = os.path.join(training_path, filename)
    img = Image.open(file_path)
    profile = log_image(img).profile()
    profile_view = profile.view()

    # merge each profile while looping
    if reference_profile is None:
        reference_profile = profile_view
    else:
        reference_profile = reference_profile.merge(profile_view)

Similarly, a profile can be generated from the batch of inference data for which cells were stained with the diluted dye. Using the whylogs viz module, we can quickly perform out-of-the-box drift detection by generating a Summary Drift Report.

from whylogs.viz import NotebookProfileVisualizer

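# set target and reference profiles
# (inference_profile is an assumed name: a profile view of the inference
# batch, built the same way as reference_profile above)
visualization = NotebookProfileVisualizer()
visualization.set_profiles(target_profile_view=inference_profile,
                           reference_profile_view=reference_profile)

# generate report
visualization.summary_drift_report()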

The generated report lets us view the distribution of each tracked metric side by side for our reference and inference profiles. The p-values shown here result from a K-S test on each pair of distributions. The low p-values for the three metrics below suggest possible data drift.
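For intuition about what a K-S test measures, the sketch below computes the two-sample K-S statistic (the largest gap between two empirical CDFs) and an asymptotic p-value in pure Python. It works on raw values, whereas whylogs operates on approximate distribution sketches, so this illustrates the test itself and not the library's implementation. The sample values are synthetic integers in arbitrary units.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x so ties are handled together.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_p_value(d, n_a, n_b, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov series approximation."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.1:
        return 1.0  # the series converges poorly here; the true value is ~1
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Synthetic per-image metric values for 50 reference and 50 shifted images.
reference = [200 + 2 * k for k in range(50)]
shifted = [260 + 2 * k for k in range(50)]

d = ks_statistic(reference, shifted)
p = ks_p_value(d, len(reference), len(shifted))
print(f"K-S statistic: {d:.2f}, p-value: {p:.2e}")
```

A small p-value means the two samples are unlikely to come from the same distribution, which is why low p-values in the report are read as evidence of drift.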

In this case, the largest discrepancy is for the standard deviation of image saturation. The histograms seen here are constructed using the standard deviation of image saturation for each of the 50 individual images in the training/inference set.

Now, consider a new situation in which the white blood cells are suddenly appearing larger in the image. This may result from a different microscope magnification, a change in the image pre-processing, or even an automatic update to the imaging software itself.

A well known strength of CNNs is their tolerance to object translations within an image, but CNNs are generally sensitive to changes in the object size. Such a change in data could prove problematic for a model.

Upon comparing the profiles generated from our training set and inference set, we find evidence of data drift in several of the tracked metrics, including the mean hue. Running this simple check could have provided an early warning about this sudden variation. We can view overlaid distributions from each dataset using the following. The shift is obvious.


Extending to other data issues

We will now look at two very different types of data issues. In the following example, a news corporation gathers photos from various sources which have been taken throughout their community. They use a model to recognize various weather events from images which they share with their followers across multiple social media platforms.

Their model was originally trained to recognize events such as snow, rain, lightning, fog, rainbows, and sandstorms.

Over time, they began reporting in more locations across the country, but continued using the same model. Due to several new locations in the southwest, the image feed is now receiving a disproportionately high number of sandstorm photos.

Considering the model was trained on a dataset with a smaller proportion of sandstorm images, this data drift incident is likely to contribute to worsening model performance. By monitoring the images flowing through the model in production using whylogs/WhyLabs, the news corporation would have immediately detected evidence of drift in several different features tracked by whylogs.

In this next example, a company is attempting to build smarter traffic lights for more efficient traffic direction. At the heart of their solution is a model responsible for counting vehicles on the road. Since the model operates on edge devices integrated with traffic lights, the company wants to minimize the required computational power by downsizing their input to 64x64 images, which they do in conjunction with applying a Gaussian blur.

This worked well on the training set, but the machine learning engineers failed to realize that some of the devices in production already capture the raw images at the desired size. Applying Gaussian blur to these raw images resulted in poor image quality, leading to poor model performance. Worse yet, this was only noticed in the form of inefficient traffic direction weeks after deploying the update to hundreds of devices.

As we’ll see, this harmful result could have been detected shortly after the software update using whylogs. Below, we see that the inference dataset generally has a lower standard deviation of image brightness across a given image. This is not surprising since the standard deviation of brightness is often used as a basic measure of image sharpness.
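The link between blur and a lower standard deviation of brightness can be checked on a toy example. The sketch below is a stand-in written in pure Python: it applies a 3x3 binomial kernel (a small approximation of a Gaussian blur, not the company's actual preprocessing) to a synthetic striped image and compares the brightness spread before and after. The function names are hypothetical.

```python
import statistics


def blur_3x3(grid):
    """Apply a 3x3 binomial kernel, a small approximation of a Gaussian blur.

    Border pixels reuse the nearest valid neighbor (clamped edges).
    """
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    height, width = len(grid), len(grid[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += kernel[dy + 1][dx + 1] * grid[yy][xx]
            out[y][x] = acc / 16.0
    return out


def brightness_stddev(grid):
    """Population standard deviation of all brightness values in the grid."""
    return statistics.pstdev([value for row in grid for value in row])


# Synthetic 8x8 grayscale image with sharp vertical stripes (2 pixels wide).
sharp = [[255 if (x // 2) % 2 == 0 else 0 for x in range(8)] for _ in range(8)]
blurred = blur_3x3(sharp)

sharp_std = brightness_stddev(sharp)
blurred_std = brightness_stddev(blurred)
print(f"sharp: {sharp_std:.1f}, blurred: {blurred_std:.1f}")
```

Blurring pulls neighboring pixel values toward each other, so the spread of brightness values shrinks. A sudden drop in this metric across a fleet of devices is the kind of signal that would have flagged the double-blurred images.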

End-to-end monitoring with WhyLabs

While profiling with whylogs offers a quick way to visualize and compare dataset profiles, the WhyLabs Platform allows users to extract the full value from these profiles with highly customizable anomaly detection, automatic notifications, and insightful visualizations. Best of all, users can get started with monitoring their first model for free!

whylogs was used to upload profiles for the blood cell image dataset discussed above. Profiles for the images corresponding to the diluted dye were uploaded on September 7th and profiles for the increased magnification on September 9th. For other dates, the images were consistent with the training dataset.

After enabling the 1-click preset drift monitor, WhyLabs automatically detected significant data drift occurring on these dates for the standard deviation of the image saturation.

In this case, the blue line spiking on the 7th and 9th represents the drift distance. This drift distance quantifies the degree of data drift against some baseline using the Hellinger distance. For this example, the training dataset was used as the baseline, but users may also choose to compare against a trailing window or a particular date range instead. Visit our documentation to learn more!
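As a rough illustration of the Hellinger distance, the sketch below computes it for discrete histograms in pure Python. The bin counts are made up, and how WhyLabs bins the underlying profile data is not covered here; the function name is an assumption.

```python
import math


def hellinger_distance(p_counts, q_counts):
    """Hellinger distance between two histograms over the same bins.

    Returns 0.0 for identical distributions and 1.0 for non-overlapping ones.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same bins")
    p_total, q_total = sum(p_counts), sum(q_counts)
    squared_gap = sum(
        (math.sqrt(p / p_total) - math.sqrt(q / q_total)) ** 2
        for p, q in zip(p_counts, q_counts)
    )
    return math.sqrt(squared_gap / 2.0)


# Hypothetical histograms of one metric: baseline, a normal day, a drifted day.
baseline_hist = [5, 20, 20, 5]
normal_day_hist = [6, 19, 21, 4]
drifted_day_hist = [0, 2, 18, 30]

normal_distance = hellinger_distance(baseline_hist, normal_day_hist)
drifted_distance = hellinger_distance(baseline_hist, drifted_day_hist)
print(f"normal day: {normal_distance:.3f}, drifted day: {drifted_distance:.3f}")
```

Because the distance is bounded between 0 and 1, a single threshold can be applied across features with very different scales.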

In order to upload profiles to WhyLabs, be sure to install the whylabs extra.

pip install "whylogs[whylabs]"

You will need an API key, your org ID, and the model ID. The steps for retrieving these are outlined in the Onboarding to the Platform documentation page. Users can then upload profiles using the following.

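from whylogs.api.writer.whylabs import WhyLabsWriter
from PIL import Image
from whylogs.extras.image_metric import log_image
import datetime
import os

# set environment vars to route uploaded profile
os.environ["WHYLABS_API_KEY"] = 'YOUR-API-KEY'
os.environ["WHYLABS_DEFAULT_ORG_ID"] = 'YOUR-ORG-ID'
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = 'YOUR-MODEL-ID'

# read in image and profile
file_path = 'path/to/file.png'
img = Image.open(file_path)
profile = log_image(img).profile()

# optionally set profile timestamp
profile.set_dataset_timestamp(datetime.datetime.now(datetime.timezone.utc))

profile_view = profile.view()

# merge other profiles if logging multiple images
# (same pattern as the training loop above)
# profile_view = profile_view.merge(other_profile_view)

# write profile to WhyLabs
writer = WhyLabsWriter()
writer.write(file=profile_view)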
Custom Metrics & Exif Data

Some use cases are highly specific, and users may wish to monitor metrics beyond those which whylogs provides out of the box. whylogs was designed with this sort of flexibility as a priority. Suppose a user is concerned about the amount of glare in their images. They want to monitor the percentage of pixel values of 245 or greater in images such as this one.

A custom function like this can be introduced into the image profiling process as shown below.

from typing import Dict
import whylogs as why
from whylogs.core.datatypes import DataType
from whylogs.core.metrics import Metric, MetricConfig
from whylogs.core.resolvers import StandardResolver
from whylogs.core.schema import DatasetSchema, ColumnSchema
from whylogs.extras.image_metric import ImageMetric
from PIL import Image
import numpy as np
import pandas as pd

class ImageResolver(StandardResolver):
  def resolve(self, name: str, why_type: DataType, column_schema: ColumnSchema) -> Dict[str, Metric]:
    if "image" in name:
      return {ImageMetric.get_namespace(MetricConfig()): ImageMetric.zero(MetricConfig())}
    return super(ImageResolver, self).resolve(name, why_type, column_schema)

def glare_percent(img):
    img_array = np.array(img)
    saturated_pixels = np.sum(img_array >= 245)
    total_pixels = img_array.size
    return 100.0*saturated_pixels/total_pixels

schema = DatasetSchema(resolvers=ImageResolver())

img = Image.open('path/to/image.jpg')

# log image along with custom metric
results = why.log(row={"glare_percent": glare_percent(img), "images": img}, schema=schema)
profile = results.profile()
profile_view = profile.view()

# extract desired stats for custom metric
profile_dict = profile_view.get_column('glare_percent').to_summary_dict()
columns = ['counts/n', 'distribution/mean', 'distribution/stddev']
pd.DataFrame(profile_dict, index=['glare_percent'])[columns]

We see that our profile was generated from a single image with a glare metric of 1.69 and a standard deviation of 0 (which we would expect for a single record).
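One design detail worth noting: for a color image, `glare_percent` as written counts bright channel values, not bright pixels, because `img_array.size` includes every channel. The sketch below contrasts that rule with a hypothetical per-pixel variant on a synthetic NumPy array. `glare_percent_per_pixel` is an illustrative assumption, not part of whylogs, and which rule is preferable depends on the use case.

```python
import numpy as np


def glare_percent_per_pixel(img, threshold=245):
    """Percent of pixels whose every channel is at or above `threshold`."""
    arr = np.asarray(img)
    if arr.ndim == 2:
        # Grayscale: one value per pixel.
        bright = arr >= threshold
    else:
        # Color: a pixel counts only if all of its channels are bright.
        bright = (arr >= threshold).all(axis=-1)
    return 100.0 * bright.sum() / bright.size


# Synthetic 10x10 RGB image: 5 white pixels and 5 pure-red pixels on black.
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[0, :5] = (255, 255, 255)
img[1, :5] = (255, 0, 0)

# Same rule as glare_percent above: bright channel values over all values.
per_channel = 100.0 * np.sum(img >= 245) / img.size
per_pixel = glare_percent_per_pixel(img)
print(f"per channel: {per_channel:.2f}%, per pixel: {per_pixel:.2f}%")
```

The per-channel rule treats a saturated red pixel as partially "glared", while the per-pixel rule only counts pixels that are close to white. Either value can be logged as a custom column in the same way.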

As a last example, consider a smart photography application which gives aspiring photographers tips and guidance based on photos they upload.

Customers use a wide variety of devices including mobile phones, professional cameras, and more. In this case, advice may be tailored to the device being used. Therefore, it’s important to keep an eye on data drift in the make, model, and software version associated with uploaded images.

whylogs will automatically capture Exif data associated with an image in the form of TIFF metadata readable by Pillow’s TiffTags module. The available tags can be checked with the following.

from PIL import Image
from PIL.TiffTags import TAGS

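img = Image.open('path/to/image.jpg')

# map each Exif tag ID to its name (fall back to the raw ID for unknown tags)
exif_tag_list = [TAGS.get(k, k) for k in img.getexif()]
print(exif_tag_list)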


Example output:


This metadata will also be automatically available for monitoring upon uploading profiles to WhyLabs. In the case of the photography application, users can quickly start monitoring for data drift among the most frequent device make, model, and software version.
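To give a feel for what drift in a categorical field such as device make looks like, here is a small sketch using exact counts from the standard library. whylogs tracks frequent items with approximate sketches, so this is only an analogy; the make names, batch sizes, and function names are invented for the example.

```python
from collections import Counter


def value_shares(values):
    """Fraction of records taken by each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def largest_share_change(baseline_values, current_values):
    """Return the value whose share moved the most, with its signed change."""
    base = value_shares(baseline_values)
    cur = value_shares(current_values)
    changes = {
        value: cur.get(value, 0.0) - base.get(value, 0.0)
        for value in set(base) | set(cur)
    }
    top = max(changes, key=lambda value: abs(changes[value]))
    return top, changes[top]


# Hypothetical device makes read from Exif data in two batches of uploads.
baseline_makes = ["CameraCo"] * 50 + ["PhoneCo"] * 40 + ["OtherCo"] * 10
current_makes = ["CameraCo"] * 25 + ["PhoneCo"] * 70 + ["OtherCo"] * 5

make, change = largest_share_change(baseline_makes, current_makes)
print(f"{make}: share changed by {change:+.2f}")
```

In this made-up batch, uploads shift from dedicated cameras toward phones, which is the kind of change that would matter for device-specific advice.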


Regardless of your computer vision use case, whylogs offers a lightweight and customizable solution for extracting the important telemetry needed to detect data quality and data drift issues early and keep your models performing well. With WhyLabs, customizable anomaly detection, automatic notifications, and insightful visualizations allow engineers and data scientists to spend less time troubleshooting data issues and more time building the models which make our lives better.
