
Mind Your Models: 5 Ways to Implement ML Monitoring in Production

Machine learning (ML) models are the backbone of many modern business operations, enabling automation and optimization at scale. But here's the catch: deploying ML models is just the beginning of the journey. Monitoring their performance in production is essential to ensure they continue to deliver the expected outcomes. In this blog post, we discuss five ways to monitor your ML models in production.

What is ML monitoring?

Machine learning (ML) monitoring is the continual oversight and evaluation of ML model performance over time. It is critical because model performance can degrade as data or the environment changes, a phenomenon known as "model drift." ML monitoring surfaces these issues by providing insights into the model's performance metrics, data quality, and overall application health.

Note: All the ML monitoring techniques discussed in this post can be implemented with the open-source ML monitoring library, whylogs, or the WhyLabs AI observability platform.

An example of ML monitoring within an AI application

ML monitoring for data drift

Data drift occurs when the input data to an ML model changes over time. The incoming data from production may no longer be similar to the data distribution used to train the model. As a result, the model's performance can degrade, leading to incorrect predictions.

One way to monitor data drift is to track the distribution of the input data and compare it to the data used to train the model. If the distributions differ significantly, it may be necessary to retrain the ML model.
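As a minimal, library-agnostic sketch of that comparison, the example below runs a two-sample Kolmogorov–Smirnov test per numeric feature between a training reference set and a production batch. The dataframe names, file paths, and p-value threshold are illustrative assumptions; whylogs and WhyLabs provide this kind of drift detection out of the box.

```python
# Minimal data drift check: compare each numeric feature's production
# distribution to the training (reference) distribution with a KS test.
import pandas as pd
from scipy.stats import ks_2samp

def detect_drift(reference: pd.DataFrame, production: pd.DataFrame,
                 p_threshold: float = 0.05) -> dict:
    """Return the numeric columns whose distributions appear to have drifted."""
    drifted = {}
    for column in reference.select_dtypes(include="number").columns:
        statistic, p_value = ks_2samp(reference[column].dropna(),
                                      production[column].dropna())
        if p_value < p_threshold:  # reject "same distribution" at the threshold
            drifted[column] = {"ks_statistic": statistic, "p_value": p_value}
    return drifted

# Hypothetical usage:
# train_df = pd.read_parquet("training_data.parquet")
# prod_df = pd.read_parquet("todays_production_batch.parquet")
# print(detect_drift(train_df, prod_df))
```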

Detecting data drift in the WhyLabs platform

Learn more about how to detect data drift with whylogs, our open-source data and ML monitoring library, or in WhyLabs.

Monitoring models for concept drift and performance

Concept drift occurs when the relationship between a model's inputs and the target it predicts changes over time. As a result, model performance can degrade even when there is no significant data drift.

To monitor for concept drift, you can compare the model's predictions to actual outcomes, such as sales or customer satisfaction scores. If the model's predictions deviate from actual results, it may be necessary to retrain the model.

If you don’t have ground truth data for comparison, you can try using performance estimation.
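When ground truth is available, a minimal sketch of this check could look like the following; the metric, baseline accuracy, and alert threshold are assumed values you would replace with your own.

```python
# Performance check once ground truth arrives: score each production batch
# and flag batches whose accuracy falls well below the validation baseline.
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.92   # accuracy on the validation set (assumed value)
ALERT_THRESHOLD = 0.05     # alert if accuracy drops by more than 5 points

def check_batch_performance(y_true, y_pred) -> bool:
    """Return True if the batch degraded enough to warrant an alert."""
    batch_accuracy = accuracy_score(y_true, y_pred)
    degraded = (BASELINE_ACCURACY - batch_accuracy) > ALERT_THRESHOLD
    if degraded:
        print(f"Possible concept drift: batch accuracy {batch_accuracy:.3f} "
              f"vs baseline {BASELINE_ACCURACY:.3f}")
    return degraded

# Hypothetical usage:
# check_batch_performance(labels_from_outcomes, model_predictions)
```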

Monitoring ML performance metrics over time

Learn how to monitor ML performance metrics in WhyLabs.

Monitoring ML pipelines for data quality

Bad data can result from errors in data collection, sensor malfunctions, or any number of pipeline bugs. Data quality can have a significant impact on the performance of ML models.

One way to monitor for bad data is to validate that incoming data matches the expected format and range using a set of defined constraints, for example, requiring that a value is always numeric and greater than 0.
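Here is a rough sketch of that idea using whylogs constraints (whylogs v1 API); the columns, constraint factories, and thresholds are illustrative assumptions, so check the whylogs documentation for the constraint factories available in your version.

```python
# Data quality validation sketch with whylogs constraints (whylogs v1 API).
import pandas as pd
import whylogs as why
from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import greater_than_number, no_missing_values

batch = pd.DataFrame({"price": [12.5, 30.0, 7.25], "quantity": [1, 4, 2]})

# Profile the batch, then attach constraints to the resulting profile view.
profile_view = why.log(batch).view()
builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(greater_than_number(column_name="price", number=0))
builder.add_constraint(no_missing_values(column_name="quantity"))
constraints = builder.build()

print(constraints.generate_constraints_report())  # per-constraint pass/fail
assert constraints.validate(), "Data quality constraints failed for this batch"
```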

Creating data quality validation tests with constraints in whylogs

Learn how to perform data quality validation for ML monitoring with whylogs.

Monitoring ML models for bias and fairness

Bias can occur when an ML model is trained on a dataset that is not representative of the population it makes predictions about.

To monitor for model bias in production, you can compare how the model performs across specific segments or demographics.
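A minimal sketch of such a segment-level comparison follows, assuming a scored dataframe with hypothetical label, prediction, and demographic columns; WhyLabs performance tracing provides this kind of segmented view natively.

```python
# Fairness check: compare a performance metric across demographic segments.
import pandas as pd
from sklearn.metrics import accuracy_score

def accuracy_by_segment(scored: pd.DataFrame, segment_col: str,
                        label_col: str = "label",
                        pred_col: str = "prediction") -> pd.Series:
    """Return model accuracy for each value of the segment column."""
    return scored.groupby(segment_col).apply(
        lambda group: accuracy_score(group[label_col], group[pred_col])
    )

# Hypothetical usage:
# scored = pd.read_parquet("scored_production_data.parquet")
# per_segment = accuracy_by_segment(scored, segment_col="age_group")
# print(per_segment)
# print("Accuracy gap across segments:", per_segment.max() - per_segment.min())
```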

Using WhyLabs to inspect model performance metrics for bias

Learn more about detecting bias and fairness with performance tracing in WhyLabs.

Monitoring AI explainability

AI explainability methods can help you understand why complex machine learning models make the predictions they do. One way to monitor explainability is to use libraries like SHAP to extract the global feature importance of a model.

These values can be logged and used in combination with the other metrics to obtain deep insights into model behavior.
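Below is a minimal sketch of extracting global feature importance with SHAP, using a bundled scikit-learn dataset and a tree model purely as stand-ins for your own model and data.

```python
# Global feature importance with SHAP: mean absolute SHAP value per feature.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # shape: (n_samples, n_features)

# Average |SHAP value| per feature acts as a global importance score; these
# values can be logged alongside other monitoring metrics over time.
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(X.columns, importance), key=lambda p: -p[1]):
    print(f"{name}: {score:.4f}")
```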

Using ML explainability values to inspect input data by feature importance

Learn how to monitor global feature importance in WhyLabs.

Key takeaways for ML monitoring

Monitoring ML models in production is essential to ensure they continue to deliver the expected results. By monitoring for data drift, concept drift, data quality, bias, and explainability, businesses can identify issues and take action to maintain the accuracy and performance of their ML models. Implementing a robust monitoring system can help businesses optimize their operations, reduce costs, and mitigate risks, ultimately leading to better outcomes for both businesses and their customers.

If you’re looking to get started with data and ML monitoring, we’re here to help! Here are 5 ways to take the next step in your model monitoring journey!

  1. Get started with whylogs - our open-source data logging and monitoring tool
  2. Start using the WhyLabs AI Observatory for free
  3. Request a demo and consultation with a solutions engineer
  4. Join an upcoming live event for more hands-on experience
  5. Ask questions in the Robust & Responsible AI Slack group




