
Streamlining data monitoring with whylogs and MLflow

Everyone who has worked with machine learning models in production is familiar with the complexity of deploying them and managing their lifecycle. The task becomes particularly grueling given today’s fragmented tooling ecosystem.

Model lifecycle chart. Source: Google

There is a smörgåsbord of tools supporting each individual stage of this cycle, which forces data science teams to build custom pipelines for the various ML frameworks (or even individual algorithms) that they use. Something as seemingly simple as training a scikit-learn linear regression model and deploying it to SageMaker can take weeks or months to productionize if your team doesn’t already have a pipeline built for those specific frameworks.

In a world where it’s no longer enough to train a model once, deploy, and forget about it, organizations are facing ever-mounting challenges with developing in-house MLOps frameworks and processes.

Enter MLflow, an open source framework created by Databricks to unify model lifecycle management, used by Facebook, Zillow, Microsoft, and a host of other AI-first companies. MLflow seeks to directly address the key problems of model tracking and interoperability between different ML tools. MLflow works with any ML library, framework, or language, removing barriers to rapid prototyping and leading to quicker turnaround times for solving business problems in production.

For the perspicacious ML engineer, MLflow provides an intuitive and straightforward approach to model deployment while solving many of the common problems, such as tracking model metadata and persisting models in a registry, within the framework itself. Instead of building complicated in-house infrastructure for keeping track of your models (and their performance), MLflow provides simple, powerful tools to manage your models from inception to serving happy customers.

Monitoring in MLflow

One of the key features in MLflow is the ability to capture detailed metrics for your models. The framework is not opinionated about what you should log. Instead, it provides a simple API for recording whatever data you might find useful. This is accomplished via calls to mlflow.log_metric or mlflow.log_metrics in your MLflow runs, and you can find additional examples in the MLflow documentation.

These metrics can later be visualized via the MLflow server interface, which is super handy for tracking model metrics across different iterations of a model, or over time.

MLflow metrics visualization. Source: MLflow

Finally, MLflow has autologging integrations with many commonly used ML frameworks, providing a straightforward way to log performance metrics.

However, the logging solutions native to these frameworks generally focus on model performance itself, such as its accuracy and loss, and do not adequately capture information about the context and environment in which your models operate. Figuring out why your models may be underperforming is a nigh impossible task if you don’t capture this context.

Monitoring Data Quality with whylogs

Luckily, MLflow makes it easy to integrate third-party libraries, so that additional metrics can be collected both during training and once the model is live. By integrating whylogs into the MLflow runtime, we can add data quality monitoring to the model pipeline. whylogs is an open source, lightweight, high-performance statistical data logging library that enables a fire-and-forget approach to logging data quality: it profiles the data during training and as it flows through the model once deployed.

Profiling data with whylogs allows engineers and data scientists to catch data quality issues during training as well as detect data drift after deployment, which ultimately enables a more informed analysis of the model’s performance over time. Rapid response to issues in production is also made possible, as data quality degradation can be uncovered in near real-time.
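
To make the idea concrete, here is a minimal sketch of the kind of comparison a profile enables. It is not the whylogs API: summarize, drift_report, and the thresholds are hypothetical names and values chosen purely for illustration. It reduces a column from training and from a production batch to a few statistics, then flags a rise in missing values or a large shift in the mean.

```python
import math
from statistics import mean, pstdev


def summarize(values):
    """Reduce a column to a few summary statistics (None marks a missing value)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_fraction": 1 - len(present) / len(values) if values else 0.0,
        "mean": mean(present) if present else math.nan,
        "stddev": pstdev(present) if present else math.nan,
    }


def drift_report(baseline, live, max_null_increase=0.05, max_mean_shift=2.0):
    """Compare two summaries and return a list of human-readable warnings.

    The thresholds are illustrative assumptions, not recommended defaults.
    """
    warnings = []
    null_increase = live["null_fraction"] - baseline["null_fraction"]
    if null_increase > max_null_increase:
        warnings.append(f"null fraction rose by {null_increase:.2f}")
    if baseline["stddev"] > 0:
        # Express the change in mean in units of the baseline spread
        shift = abs(live["mean"] - baseline["mean"]) / baseline["stddev"]
        if shift > max_mean_shift:
            warnings.append(f"mean shifted by {shift:.1f} baseline standard deviations")
    return warnings


# Hypothetical data: a feature as seen in training and in a production batch
training = summarize([10.0, 12.0, 11.0, 9.0, 13.0, 10.5])
production = summarize([19.0, None, 21.0, None, 20.0, 22.0])
for warning in drift_report(training, production):
    print(warning)
```

Because only the summaries are compared, the raw training data never needs to be kept around for the check, which is the practical appeal of logging profiles rather than samples.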

Why use whylogs over an in-house solution or another library? Great question!

  • whylogs is entirely open source. No hidden rocks, undocumented interactions, or unsolvable data governance concerns.
  • It profiles data in an extremely efficient manner, with a constant memory footprint and low CPU overhead, letting it easily scale from megabytes to terabytes of incoming data. Save those GPU cycles for your models!
whylogs Java performance metrics. Source: whylogs Java
  • It works with both structured and unstructured data. The general approach can be applied to any type of data. Profile your images with just two lines of code:
with whylogs.get_or_create_session() as session:

See the full image logging notebook for more information.

  • whylogs is platform-agnostic. Use it with MLflow, SageMaker, and in your Spark pipelines. The more you log, the more transparency you gain and the earlier you can catch model failures, before their costs accumulate. Profile your Spark data in just six lines of code:
val df = spark.read.csv("fire_dept.csv")  // assumes an active SparkSession named spark

val profiles = df
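
The constant-memory claim in the list above is easiest to see in a small sketch. The class below is an illustration only, not how whylogs is implemented (whylogs tracks richer statistics), and RunningStats is a hypothetical name. It uses Welford's online algorithm to keep a fixed handful of numbers per column no matter how many values pass through, plus a merge step so that profiles built on separate partitions can be combined.

```python
import math


class RunningStats:
    """Streaming summary of one numeric column using O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, value):
        """Fold a single value into the summary (Welford's algorithm)."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        """Return a new summary equivalent to having seen both streams."""
        merged = RunningStats()
        merged.count = self.count + other.count
        if merged.count == 0:
            return merged
        delta = other.mean - self.mean
        merged.mean = self.mean + delta * other.count / merged.count
        merged._m2 = (
            self._m2
            + other._m2
            + delta ** 2 * self.count * other.count / merged.count
        )
        merged.minimum = min(self.minimum, other.minimum)
        merged.maximum = max(self.maximum, other.maximum)
        return merged

    @property
    def variance(self):
        """Population variance of all values seen so far."""
        return self._m2 / self.count if self.count else math.nan


# Two partitions profiled independently, then combined
left, right = RunningStats(), RunningStats()
for value in [1.0, 2.0, 3.0]:
    left.update(value)
for value in [4.0, 5.0]:
    right.update(value)
combined = left.merge(right)
print(combined.count, combined.mean, combined.variance)
```

The merge step is what makes this style of profiling a natural fit for distributed engines such as Spark: each partition is summarized locally and only the small summaries are shuffled.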


Using whylogs with MLflow

The whylogs library seamlessly integrates with MLflow by patching its runtime:

import mlflow
import whylogs

# Patch the MLflow runtime so whylogs logging is available inside runs
whylogs.enable_mlflow()

After enabling the integration, whylogs can be used to log data metrics when running MLflow jobs:

with mlflow.start_run(run_name="whylogs demo"):
  predicted_output = model.predict(batch)

  mae = mean_absolute_error(actuals, predicted_output)

  mlflow.log_metric("mae", mae)

  # whylogs profiles are collected in one line,
  # similar to other MLflow Tracking APIs
  mlflow.whylogs.log_pandas(batch)

Once whylogs profiles have been generated, they are stored by MLflow along with all the other artifacts from the run. They can be retrieved from the MLflow backend and explored further:

from whylogs.viz import ProfileVisualizer

mlflow_profiles = whylogs.mlflow.get_experiment_profiles("experiment_1")
viz = ProfileVisualizer()
# Hand the retrieved profiles to the visualizer before plotting
viz.set_profiles(mlflow_profiles)
viz.plot_distribution("free sulfur dioxide", ts_format="%d-%b-%y %H:%M:%S")
Distribution plot for one of the columns in the model input, collected at inference time

For a more complete (and hands-on!) overview of the whylogs integration with MLflow, check out our notebook.
