ML Monitoring in Under 5 Minutes
- ML Monitoring
- Data Logging
Nov 15, 2022
It only takes a few minutes and a few lines of code to monitor your ML models and data pipelines.
Data validation and ML model monitoring are foundational steps to building reliable pipelines and responsible machine learning applications.
In this short post, I will show you how to use an open source data logging library and an AI observatory platform to monitor common issues with your ML models, such as data drift, concept drift, data quality, and performance.
Data logging and ML monitoring setup
First, we’ll install whylogs, an open-source data logging library that captures key statistical properties of data. We’ll also include dependencies for writing to the WhyLabs AI observatory for ML monitoring.
pip install "whylogs[whylabs]"
Next, we’ll import the `whylogs`, `pandas`, and `os` libraries into our Python project. We’ll also load the dataset we want to profile into a dataframe.
import whylogs as why
import pandas as pd
import os

# create dataframe with dataset
dataset = pd.read_csv("https://whylabs-public.s3.us-west-2.amazonaws.com/datasets/tour/current.csv")
The data profiles created with whylogs can be used on their own for data validation and data drift visualization, but in this example, we’re going to write profiles to the WhyLabs Observatory to perform ML monitoring.
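To make "key statistical properties" a little more concrete, here is a minimal plain-Python sketch of the kind of per-column summary a profile holds. It is only an illustration: the `summarize_column` helper and the sample `ages` values are hypothetical, and a real whylogs profile tracks considerably more than this.

```python
import math
from statistics import mean


def summarize_column(values):
    """Return a few summary statistics for one column of raw values.

    None and NaN are counted as missing and excluded from the numeric stats.
    """
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    summary = {
        "count": len(values),
        "null_count": len(values) - len(present),
    }
    if present:
        summary.update(min=min(present), max=max(present), mean=mean(present))
    return summary


# Hypothetical sample column with two missing values
ages = [34, 51, None, 28, float("nan"), 45]
print(summarize_column(ages))
```

Because a summary like this is small and contains no raw rows, it can be stored or shipped to a monitoring service far more cheaply than the dataset itself.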
To write profiles to WhyLabs, we’ll create an account, grab our `Org-ID`, `Access token`, and `Project-ID`, and set them as environment variables in our project.
# Set WhyLabs access keys
os.environ["WHYLABS_DEFAULT_ORG_ID"] = 'YOURORGID'
os.environ["WHYLABS_API_KEY"] = 'YOURACCESSTOKEN'
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = 'PROJECTID'
Create a free WhyLabs account here, no credit card required.
Create a new project and get the ID:
Create Project > Set up model > Create Project
Get organization ID and access token:
Menu > Settings > Access Tokens > Create Access Token
That’s it for setting up. We can now write data profiles to WhyLabs.
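Before writing anything, it can help to confirm that all three variables are actually set, since a missing key is an easy mistake to make. The check below is a small sketch using only the standard library; the `missing_whylabs_vars` helper is hypothetical and not part of whylogs.

```python
import os

# The three environment variables set earlier in this post
REQUIRED_VARS = (
    "WHYLABS_DEFAULT_ORG_ID",
    "WHYLABS_API_KEY",
    "WHYLABS_DEFAULT_DATASET_ID",
)


def missing_whylabs_vars(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_whylabs_vars()
if missing:
    print("Set these before writing profiles:", ", ".join(missing))
```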
Write profiles to WhyLabs for ML monitoring
Once the access keys are set up, we can create a profile of our dataset and write it to WhyLabs. This allows us to monitor input data and model predictions with just a few lines of code!
from whylogs.api.writer.whylabs import WhyLabsWriter

# Initialize the WhyLabs writer, create a whylogs profile, and write the profile to WhyLabs
writer = WhyLabsWriter()
profile = why.log(dataset)
writer.write(file=profile.view())
Profiles can be created at any stage of a pipeline, allowing you to monitor data at every step.
By default, the timestamp will be the time of the profile upload, but it can be overridden to log data from different collection times and to backfill profiles.
You can see an example of writing and backfilling data in this notebook.
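When backfilling, the first thing we need is one timestamp per historical batch. The sketch below only generates those timestamps with the standard library; the `backfill_timestamps` helper is hypothetical, and the whylogs call that stamps a profile with a timestamp is left to the notebook rather than guessed at here.

```python
from datetime import datetime, timedelta, timezone


def backfill_timestamps(days, end=None):
    """Return one timezone-aware UTC timestamp per day, oldest first, ending at `end`."""
    end = end or datetime.now(timezone.utc)
    return [end - timedelta(days=offset) for offset in range(days - 1, -1, -1)]


for ts in backfill_timestamps(3):
    # Assumption: each daily batch is profiled separately and its profile is
    # stamped with `ts` before being written (see the notebook for the exact call).
    print(ts.isoformat())
```

Using timezone-aware UTC datetimes avoids ambiguity when the pipeline and the monitoring platform run in different time zones.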
Once profiles are written to WhyLabs they can be inspected, compared, and monitored for data quality and data drift.
Now we can enable a pre-configured monitor with just one click (or create a custom one) to detect anomalies in our data profiles. This makes it easy to set up common monitoring tasks, such as detecting data drift, data quality issues, and changes in model performance.
Once a monitor is configured, it can be previewed while inspecting an input feature.
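For intuition about what a drift monitor compares, here is a minimal sketch that measures how far a current histogram has moved from a baseline using the Hellinger distance. This is one common choice of distance, not necessarily what the pre-configured monitor computes. The bin counts and the `DRIFT_THRESHOLD` value are made up for illustration.

```python
import math


def hellinger_distance(baseline_counts, current_counts):
    """Hellinger distance between two histograms over the same bins.

    Returns 0.0 for identical distributions and 1.0 for non-overlapping ones.
    """
    if len(baseline_counts) != len(current_counts):
        raise ValueError("histograms must use the same bins")
    b_total, c_total = sum(baseline_counts), sum(current_counts)
    if b_total == 0 or c_total == 0:
        raise ValueError("histograms must not be empty")
    squared = sum(
        (math.sqrt(b / b_total) - math.sqrt(c / c_total)) ** 2
        for b, c in zip(baseline_counts, current_counts)
    )
    return math.sqrt(squared / 2)


DRIFT_THRESHOLD = 0.3  # hypothetical cutoff, tune per feature

baseline = [50, 30, 20]  # bin counts from a reference period
current = [5, 15, 80]    # bin counts from the latest batch
distance = hellinger_distance(baseline, current)
print(f"distance={distance:.3f}, drift={distance > DRIFT_THRESHOLD}")
```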
When anomalies are detected, notifications can be sent via email, Slack, or PagerDuty. Set notification preferences in Settings > Notifications & Digest Settings.
That’s it! We have gone through all the steps needed to ingest data from anywhere in ML pipelines and get notified if anomalies occur.
Separating model input and outputs
It can be useful to separate model inputs and outputs, especially if you have a lot of features in your input data. Any features with names that contain the word “output” will appear in the outputs tab.
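The naming rule is easy to check before logging. The sketch below partitions column names the same way, assuming a plain case-sensitive substring match on "output". The exact matching rules on the platform may differ, and the `split_inputs_outputs` helper and column names are hypothetical.

```python
def split_inputs_outputs(column_names, marker="output"):
    """Partition column names by whether they contain `marker`."""
    inputs = [name for name in column_names if marker not in name]
    outputs = [name for name in column_names if marker in name]
    return inputs, outputs


columns = ["age", "income", "cls_output", "prob_output"]
inputs, outputs = split_inputs_outputs(columns)
print("inputs:", inputs)
print("outputs:", outputs)
```

If a prediction column has a name like `prediction`, renaming it to something like `prediction_output` before profiling should place it with the other outputs.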
Monitoring model performance metrics
So far we’ve seen how to monitor model input and output data, but we can also monitor performance metrics such as accuracy, precision, etc. by logging ground truth with our prediction results.
To log performance metrics for monitoring, use `why.log_classification_metrics` or `why.log_regression_metrics` and pass in a dataframe containing the ground truth and our model output results.
# df is a dataframe that holds both the ground truth and the model outputs
results = why.log_classification_metrics(
    df,
    target_column="ground_truth",
    prediction_column="cls_output",
    score_column="prob_output",
)
profile = results.profile()
results.writer("whylabs").write()
Note: Make sure your project is configured as a classification or regression model in the settings.
Just like the input data, performance metrics get uploaded with the current timestamp unless overwritten. See an example of backfilling data for performance monitoring in the example notebooks below.
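As a reminder of what these metrics measure, here is a small plain-Python sketch that computes accuracy, precision, and recall for a binary classifier from ground truth and predictions. It is only an illustration of the definitions, not how whylogs computes them, and the `classification_metrics` helper and sample labels are made up.

```python
def classification_metrics(ground_truth, predictions, positive_label=1):
    """Compute accuracy, precision, and recall for binary labels."""
    if len(ground_truth) != len(predictions):
        raise ValueError("ground truth and predictions must have the same length")
    if not ground_truth:
        raise ValueError("at least one example is required")

    pairs = list(zip(ground_truth, predictions))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    correct = sum(1 for t, p in pairs if t == p)

    return {
        "accuracy": correct / len(pairs),
        # Fall back to 0.0 when there are no positive predictions or labels
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


ground_truth = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 0, 1, 1, 1]
print(classification_metrics(ground_truth, predictions))
```

All three numbers depend on ground truth, which is why it has to be logged alongside the predictions.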
Again we can select a pre-configured monitor to detect any change in performance.
Recap on ML monitoring
We covered how to quickly set up data and ML monitoring that can be used at any point in your pipeline! With the right tools, ML monitoring can take only a few minutes and a few lines of code.
Example notebooks mentioned in this post:
- Writing Profiles to WhyLabs
- Monitoring Classification Model Performance Metrics
- Monitoring Regression Model Performance Metrics