WhyLabs Admin

Dec 2, 2021

Why You Need ML Monitoring

AI Observability
ML Monitoring
Data Quality
WhyLabs

You are a data scientist or machine learning engineer. You’ve spent months putting together a model to satisfy some business requirement. You scoped the problem, gathered the data, prepared it, trained the model, iterated on a few versions until you hit the business’s performance requirements, and even managed to get the model into production. Now you’re done, right?

Unfortunately not. The truth is, the work has just begun. Now that your model is in production, you need to be able to ensure that it’s still meeting those performance requirements. If you simply deploy your model and walk away to your next project, its performance can degrade (or the model can fail entirely) and you won’t have any way of knowing.

Machine learning models are increasingly becoming key to businesses of all shapes and sizes, performing myriad functions. As businesses come to depend more and more on machine learning, the need to ensure that models are still performant grows as well. If a machine learning model is providing value to a business, it’s essential that the model remains performant.

You may be asking yourself, “But why would my model performance degrade? What would cause my model to fail? It worked in my development environment, so why wouldn’t it work in production?”

The answer is simple, and well stated by Cassie Kozyrkov, Chief Decision Scientist at Google:

“The world represented by your training data is the only world you can expect to succeed in.”

If the data being passed to your machine learning model for inference differs from the data that the model was trained on, your model won’t perform as expected. There are two categories of issues that cause your production data to be different from your training data: data quality issues and data change issues.

Data quality issues

Data quality issues occur when instrumentation around data collection, processing, or storage breaks down. The most obvious sign of a data quality issue is a high rate of null or missing values, but other, more insidious problems with data quality can arise as well.
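As a rough illustration of watching for null or missing values, here is a minimal sketch in plain Python that compares the missing-value rate of each column in a production batch against a training batch. The batches, the column names, and the `MAX_INCREASE` tolerance are all made up for the example; this is not how WhyLabs computes anything.

```python
import math
from typing import Any, Dict, List


def is_missing(value: Any) -> bool:
    """Treat None, NaN, and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    return False


def missing_fraction(records: List[Dict[str, Any]], column: str) -> float:
    """Fraction of records whose `column` is absent or missing."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if is_missing(record.get(column)))
    return missing / len(records)


# Hypothetical batches: production has lost values in both columns.
training_batch = [
    {"zip_code": "98101", "income": 52000.0},
    {"zip_code": "02139", "income": 61000.0},
    {"zip_code": "10001", "income": 48000.0},
    {"zip_code": "60614", "income": 75000.0},
]
production_batch = [
    {"zip_code": "98101", "income": 51000.0},
    {"zip_code": None, "income": 58000.0},
    {"income": 47000.0},  # column dropped upstream
    {"zip_code": "60614", "income": float("nan")},
]

MAX_INCREASE = 0.10  # assumed tolerance; tune per column in practice

alerts = []
for column in ("zip_code", "income"):
    baseline = missing_fraction(training_batch, column)
    observed = missing_fraction(production_batch, column)
    if observed - baseline > MAX_INCREASE:
        alerts.append(column)

print(alerts)
```

Comparing against the training baseline, rather than against zero, avoids alerting on columns that were always sparse.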

One example of a data quality issue is type mismatch. If American ZIP codes (five digits) are encoded as strings when a model is trained but some upstream process causes them to be encoded as integers when the model is in production, the model will likely be unable to make heads or tails of them. The integer form also drops leading zeros, so a ZIP code such as 02139 arrives as 2139. If you are lucky, the model will break loudly and you can rectify the problem swiftly. Otherwise, the model will fail silently, at most logging a warning while it keeps returning sub-par predictions.
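The ZIP code scenario can be sketched in a few lines of Python. Everything here is hypothetical: `zip_vocabulary` stands in for whatever categorical encoding a model might use, and `validate_zip` is one possible guard, not a prescribed fix.

```python
# Hypothetical categorical encoding learned at training time (string keys).
UNKNOWN_INDEX = -1
zip_vocabulary = {"02139": 0, "98101": 1}


def lookup_zip(zip_code: object) -> int:
    """Silent path: anything not in the vocabulary maps to UNKNOWN_INDEX."""
    return zip_vocabulary.get(zip_code, UNKNOWN_INDEX)


def validate_zip(zip_code: object) -> str:
    """Loud path: reject values that are not five-digit ASCII strings."""
    if not isinstance(zip_code, str):
        raise TypeError(
            f"expected ZIP code as str, got {type(zip_code).__name__}: {zip_code!r}"
        )
    if len(zip_code) != 5 or not (zip_code.isascii() and zip_code.isdigit()):
        raise ValueError(f"expected five digits, got {zip_code!r}")
    return zip_code


training_value = "02139"
production_value = int(training_value)  # upstream cast drops the leading zero

# Silent failure: the lookup quietly falls back to the "unknown" bucket.
silent_result = lookup_zip(production_value)

# Loud failure: the validator raises before the value reaches the model.
try:
    validate_zip(production_value)
    loud_failure = None
except TypeError as error:
    loud_failure = str(error)

print(production_value, silent_result, loud_failure)
```

Note that converting the integer back with `str()` would not repair the value, because the leading zero is already gone.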

Data quality issues can occur in both batch and streaming data pipelines, and only by monitoring the data being passed to the model can a machine learning engineer ensure that their model is relying on high quality data.

Data change issues

Training-serving skew, concept drift, data drift, covariate shift, oh my…

The breadth of data change issues is wide, and an exhaustive examination deserves its own blog post. By and large, data change issues can be summarized by the statement: “the real-world processes generating the data have changed between when the training data was captured and now that the model is in production.”

This change might be a covariate shift (a change in the distribution of the independent input variables being fed to the model), a prior probability shift (a change in the distribution of the dependent target variable being predicted by the model), or a concept shift (a change in the relationship between the independent and dependent variables). In all likelihood, some combination of these three issues will affect your model, though over what time period and to what extent varies widely.
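One common way to formalize these three shifts is in terms of probability distributions. The notation below is an assumption added for illustration and does not come from the post: X stands for the model inputs, Y for the target, and the subscripts mark the training and production distributions. Definitions vary somewhat between authors, so treat this as a sketch.

```latex
\begin{align*}
\text{Covariate shift:} \quad
  & P_{\text{train}}(X) \neq P_{\text{prod}}(X),
  && P_{\text{train}}(Y \mid X) = P_{\text{prod}}(Y \mid X) \\
\text{Prior probability shift:} \quad
  & P_{\text{train}}(Y) \neq P_{\text{prod}}(Y),
  && P_{\text{train}}(X \mid Y) = P_{\text{prod}}(X \mid Y) \\
\text{Concept shift:} \quad
  & P_{\text{train}}(Y \mid X) \neq P_{\text{prod}}(Y \mid X)
\end{align*}
```

All three are changes to the joint distribution, which factors as P(X, Y) = P(Y | X) P(X) = P(X | Y) P(Y); they differ in which factor moves.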

It’s important to note that all data change issues can cause model performance degradation. Fortunately, by monitoring the input data and the predictions made by the model, it’s possible to be alerted when any of these data change issues arise.
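To make the idea of monitoring inputs and predictions concrete, here is a minimal sketch that compares a reference (training-time) sample with a current (production) sample for each monitored column, using the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical distribution functions. The choice of statistic, the column names, the data, and the 0.4 threshold are all assumptions made for the example, not the method WhyLabs uses.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs (0.0 to 1.0)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def drifted_columns(
    reference: Dict[str, Sequence[float]],
    current: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Names of columns whose KS statistic exceeds `threshold`, sorted."""
    return sorted(
        name
        for name in reference
        if name in current
        and ks_statistic(reference[name], current[name]) > threshold
    )


# Hypothetical samples: one input feature and the model's output scores.
reference = {
    "age": [21, 25, 30, 35, 40, 45],
    "prediction": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
}
current = {
    "age": [22, 26, 29, 36, 41, 44],
    "prediction": [0.7, 0.8, 0.8, 0.9, 0.9, 0.95],
}

flagged = drifted_columns(reference, current, threshold=0.4)  # assumed threshold
print(flagged)
```

Here the input feature looks stable while the prediction scores have moved entirely above their reference range, so only the prediction column is flagged. In practice the threshold would depend on sample size and on how much change the business can tolerate.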

But how?

Hopefully, you come away from this post with a thorough understanding of the problems that can arise if you don’t monitor your ML models. But, now that we’ve answered the question “Why should I monitor my machine learning models?”, we also need to answer “How do I monitor my machine learning models?”

Monitoring machine learning models in production can be daunting, but WhyLabs makes it easy. With our fully self-serve signup flow and zero configuration setup, you can start monitoring your models right away. Better still, the Starter tier, which gives you all of the features of the platform for a single model for free, allows you to trial the system without even entering your credit card info.
