
Why You Need ML Monitoring

You are a data scientist or machine learning engineer. You’ve spent months building a model to satisfy some business requirement. You scoped the problem, gathered the data, prepared it, trained the model, iterated on a few versions until you hit the business’s performance requirements, and even managed to get the model into production. Now you’re done, right?

Unfortunately not. The truth is, the work has just begun. Now that your model is in production, you need to be able to ensure that it’s still meeting those performance requirements. If you simply deploy your model and walk away to your next project, its performance can degrade (or it can fail entirely) and you won’t have any way of knowing.

Machine learning models are increasingly becoming key to businesses of all shapes and sizes, performing myriad functions. As businesses come to depend more and more on machine learning, the need to ensure that models keep performing well grows too. If a model is providing value to a business, any unnoticed drop in its performance erodes that value.

You may be asking yourself, “But why would my model performance degrade? What would cause my model to fail? It worked in my development environment, so why wouldn’t it work in production?”

The answer is simple, and well stated by Cassie Kozyrkov, Chief Decision Scientist at Google:

“The world represented by your training data is the only world you can expect to succeed in.”

If the data being passed to your machine learning model for inference differs from the data that the model was trained on, your model won’t perform as expected. There are two categories of issues that cause your production data to be different from your training data: data quality issues and data change issues.

Data quality issues

Data quality issues occur when instrumentation around data collection, processing, or storage breaks down. The most obvious signs of data quality issues are lots of null or missing values, but there are other, more insidious problems with data quality that can arise as well.
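
As a rough illustration of the most obvious sign, here is a minimal sketch that compares the fraction of missing values in a production batch against a training baseline. The function names, the `zip` column, and the 5% tolerance are all hypothetical choices made for this example, not part of any particular monitoring tool.

```python
from typing import Any, Dict, List


def missing_rate(rows: List[Dict[str, Any]], column: str) -> float:
    """Fraction of rows where `column` is absent or None (0.0 for an empty batch)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def missing_rate_alert(
    baseline_rows: List[Dict[str, Any]],
    production_rows: List[Dict[str, Any]],
    column: str,
    tolerance: float = 0.05,  # hypothetical threshold; tune per feature
) -> bool:
    """True when production has a noticeably higher missing rate than the baseline."""
    increase = missing_rate(production_rows, column) - missing_rate(baseline_rows, column)
    return increase > tolerance


# Toy data: the training rows are complete, half of the production rows are not.
training = [{"zip": "98101"}, {"zip": "02134"}, {"zip": "10001"}, {"zip": "60601"}]
production = [{"zip": "98101"}, {"zip": None}, {}, {"zip": "60601"}]

print(missing_rate(production, "zip"))                  # 0.5
print(missing_rate_alert(training, production, "zip"))  # True
```

A real pipeline would track this per feature and per batch, but the comparison against a known-good baseline is the core idea.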

One example of a data quality issue is type mismatch. If American ZIP codes (five digits) are encoded as strings when a model is trained but some upstream process causes them to be encoded as integers when the model is in production, the model will likely be unable to interpret them correctly. If you are lucky, the model will break loudly and you can rectify the problem swiftly. Otherwise, the model will fail quietly, at most logging a warning while it keeps making sub-par predictions.
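
The ZIP code case can be caught with a simple check before the data reaches the model. The sketch below assumes ZIP codes should always arrive as five-digit strings; `find_bad_zip_codes` is a hypothetical helper written for this post. Note that an integer encoding also drops leading zeros, so a code like 02134 would arrive as 2134.

```python
import re
from typing import Any, Iterable, List, Tuple

# Assumption for this sketch: a valid ZIP code is a string of exactly five digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}")


def find_bad_zip_codes(values: Iterable[Any]) -> List[Tuple[int, Any]]:
    """Return (index, value) pairs for entries that are not five-digit strings."""
    bad = []
    for index, value in enumerate(values):
        if not isinstance(value, str) or ZIP_PATTERN.fullmatch(value) is None:
            bad.append((index, value))
    return bad


# Toy batch: one integer (leading zero already lost) and one truncated string.
batch = ["98101", 2134, "10001", "6060"]
print(find_bad_zip_codes(batch))  # [(1, 2134), (3, '6060')]
```

Failing loudly on a non-empty result turns a silent failure into one you can fix quickly.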

Data quality issues can occur in both batch and streaming data pipelines, and only by monitoring the data being passed to the model can a machine learning engineer ensure that their model is relying on high quality data.

Data change issues

Training-serving skew, concept drift, data drift, covariate shift, oh my…

The breadth of data change issues is wide, and an exhaustive examination deserves its own blog post. By and large, data change issues can be summarized by the statement: “the real-world processes generating the data have changed between the time the training data was captured and the time the model is serving predictions in production.”

This change might be a covariate shift (a change in the distribution of the independent input variables being fed to the model), a prior probability shift (a change in the distribution of the dependent target variable being predicted by the model), or a concept shift (a change in the relationship between the independent and dependent variables). In all likelihood, some combination of these three issues will occur with your model, though over what time period and to what extent is highly variable.
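
To make covariate shift concrete, here is one possible way to measure it for a single numeric feature: the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the empirical distributions of the training values and the production values. This is just one of several common drift measures, and the 0.2 threshold below is an arbitrary value chosen for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    largest_gap = 0.0
    # Both empirical CDFs only change at observed values, so checking those is enough.
    for x in set(ref_sorted) | set(cur_sorted):
        ref_cdf = bisect_right(ref_sorted, x) / len(ref_sorted)
        cur_cdf = bisect_right(cur_sorted, x) / len(cur_sorted)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


DRIFT_THRESHOLD = 0.2  # hypothetical; a real threshold depends on sample size and feature

training_feature = [1.0, 2.0, 3.0, 4.0, 5.0]
production_feature = [3.0, 4.0, 5.0, 6.0, 7.0]  # same shape, shifted upward

print(ks_statistic(training_feature, production_feature) > DRIFT_THRESHOLD)  # True
```

The same statistic applied to a feature whose distribution has not moved stays near zero, which is what makes it usable as an alerting signal.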

It’s important to note that all data change issues can cause model performance degradation. Fortunately, by monitoring the input data and the predictions made by the model, it’s possible to be alerted when any of these data change issues arise.
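
Monitoring predictions can follow the same pattern as monitoring inputs. As a sketch, assuming a classifier with discrete labels, the code below compares the mix of predicted labels in production against a reference window using total variation distance. The label names and the 0.1 alert threshold are made up for this example.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def label_proportions(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of each label in the sample."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def total_variation_distance(
    reference: Sequence[Hashable], current: Sequence[Hashable]
) -> float:
    """Half the summed absolute difference in label shares (0 = same mix, 1 = disjoint)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = label_proportions(reference)
    cur = label_proportions(current)
    all_labels = set(ref) | set(cur)
    return 0.5 * sum(abs(ref.get(l, 0.0) - cur.get(l, 0.0)) for l in all_labels)


ALERT_THRESHOLD = 0.1  # hypothetical; choose based on how stable the label mix normally is

reference_predictions = ["approve"] * 8 + ["deny"] * 2  # 80% / 20%
production_predictions = ["approve"] * 5 + ["deny"] * 5  # 50% / 50%

distance = total_variation_distance(reference_predictions, production_predictions)
print(distance > ALERT_THRESHOLD)  # True
```

A shift in the prediction mix does not say which kind of data change occurred, but it is a cheap signal that something upstream deserves a closer look.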

But how?

Hopefully, you come away from this post with a thorough understanding of the problems that can arise if you don’t monitor your ML models. But, now that we’ve answered the question “Why should I monitor my machine learning models?”, we also need to answer “How do I monitor my machine learning models?”

Monitoring machine learning models in production can be daunting, but WhyLabs makes it easy. With our fully self-serve signup flow and zero configuration setup, you can start monitoring your models right away. Better still, the Starter tier, which gives you all of the features of the platform for a single model for free, allows you to trial the system without even entering your credit card info.
