WhyLabs Admin

Feb 8, 2021

Streamlining data monitoring with whylogs and MLflow

Whylogs
Open Source
Integrations
Data Quality
ML Monitoring

Anyone who has worked with machine learning models in production knows how complex their deployment and lifecycle management can be, from storing trained models to deploying them and monitoring their data. The current fragmented tooling ecosystem makes the task particularly grueling.

Enter MLflow: an open source framework created by Databricks to unify model lifecycle management, used by Facebook, Zillow, Microsoft, and a host of other AI-first companies. MLflow seeks to directly address the key problems of model tracking and interoperability between different ML tools. It works with any ML library, framework, or language, removing barriers to rapid prototyping and shortening the turnaround time for solving business problems in production.

For the perspicacious ML engineer, MLflow provides an intuitive and straightforward approach to model deployment while solving many of the common problems, such as tracking model metadata and persisting models in a registry, within the framework itself. Instead of building complicated in-house infrastructure for keeping track of your models (and their performance), MLflow provides simple, powerful tools to manage your models from inception to serving happy customers.

Model monitoring in MLflow

One of the key features in MLflow is the ability to capture detailed metrics for your models. The framework is not opinionated about what you should log. Instead, it provides a simple API for recording whatever data you might find useful: calls to mlflow.log_metric (for a single value) or mlflow.log_metrics (for a dictionary of values) inside your MLflow runs. The MLflow Tracking documentation has additional examples.

These metrics can later be visualized via the MLflow server interface, which is super handy for tracking model metrics across different iterations of a model, or over time.

MLflow metrics visualization. Source: MLflow

Finally, MLflow offers autologging integrations with the most commonly used ML frameworks, providing a straightforward way to log performance metrics.

However, the logging solutions native to these frameworks generally focus on the model's performance itself, such as its precision and recall, along with the hyperparameters used to reach those numbers. They do not adequately capture the context and environment in which your models operate, and without that context, figuring out why a model is underperforming is nigh impossible.

Data quality monitoring with whylogs

Luckily, MLflow makes it easy to add integration with third party libraries, so that various additional metrics can be collected both during the training process and once the model is live. By integrating whylogs into the MLflow runtime, we can add data quality monitoring to the model pipeline. whylogs is an open source, lightweight, and high performance statistical data logging library that enables a fire-and-forget approach to logging data quality by profiling the data during training and as it flows through the model once it has been deployed.

Profiling data with whylogs allows engineers and data scientists to catch data quality issues during training as well as detect data drift after deployment, which ultimately enables a more informed analysis of the model’s performance over time. Rapid response to issues in production is also made possible, as data quality degradation can be uncovered in near real-time.
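To make "data drift" concrete, here is a toy check that compares a reference sample against a live sample using the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between their empirical CDFs. It illustrates the idea on raw values and is not a description of how whylogs computes drift (whylogs works from profiles rather than raw rows); the function name and sample data are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    for x in set(ref) | set(cur):
        # Fraction of each sample that is <= x.
        ref_cdf = bisect_right(ref, x) / len(ref)
        cur_cdf = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(ref_cdf - cur_cdf))
    return gap


# Hypothetical data: ages seen at training time vs. ages arriving in production.
training_ages = [34, 51, 41, 28, 45, 39, 30, 48]
live_ages = [62, 58, 71, 49, 66, 55, 60, 68]

print(ks_statistic(training_ages, training_ages))  # 0.0
print(ks_statistic(training_ages, live_ages))      # 0.875
```

A statistic near 0 means the distributions match; values near 1 mean they barely overlap. Where to set an alerting threshold is a judgment call for your data.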

Why use whylogs over an in-house solution or another library? Great question!

whylogs is entirely open source. No hidden surprises, undocumented interactions, or unsolvable data governance concerns.
It profiles data in an extremely efficient manner, with a constant memory footprint and low CPU overhead, letting it easily scale from megabytes to terabytes of incoming data. Save those GPU cycles for your models!

whylogs Java performance metrics. Source: whylogs Java
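To show why a profile can keep a constant memory footprint, here is a toy, pure-Python column summary. It keeps only a handful of running values (count, nulls, min, max, and a Welford running mean and variance) no matter how many rows it sees, and two partial summaries, for example one per Spark partition, can be merged without revisiting any rows. ColumnProfile is a hypothetical class, not the whylogs API; whylogs itself relies on richer data sketches (such as approximate quantiles and cardinality estimates).

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Toy constant-memory summary of one numeric column (hypothetical, not the whylogs API)."""

    count: int = 0
    nulls: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean (Welford)
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value):
        """Update the summary with one value; memory use does not grow with row count."""
        if value is None:
            self.nulls += 1
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def stddev(self):
        """Sample standard deviation (0.0 until at least two values are seen)."""
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0

    def merge(self, other):
        """Combine two partial summaries (parallel-variance formula)."""
        n = self.count + other.count
        if n == 0:
            return ColumnProfile(nulls=self.nulls + other.nulls)
        delta = other.mean - self.mean
        return ColumnProfile(
            count=n,
            nulls=self.nulls + other.nulls,
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


# Two "partitions" profiled independently, then merged.
left, right = ColumnProfile(), ColumnProfile()
for v in [2.0, 4.0, 4.0, None]:
    left.track(v)
for v in [4.0, 5.0, 5.0, 7.0, 9.0]:
    right.track(v)

combined = left.merge(right)
print(combined.count, combined.nulls, combined.mean, combined.stddev)
```

Mergeability is the design choice that matters here: each worker profiles its own slice of the data, and only the small summaries travel over the network.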

It works with both structured and unstructured data. The general approach can be applied to any type of data. To profile data, simply add this line of code to your existing training pipeline:

import whylogs as why
why.log(training_df)

See the full image logging notebook for more information.

whylogs is platform-agnostic. Use it with MLflow, SageMaker, and in your Spark pipelines: the more you log, the more transparency you enable, and the more proactive you can be about catching model failures before their costs accumulate. To profile a PySpark job, you can use the experimental PySpark API, as the following code shows:

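from pyspark.sql import SparkSession
from whylogs.api.pyspark.experimental import collect_dataset_profile_view

spark = SparkSession.builder.appName("whylogs-profiling").getOrCreate()
# header/inferSchema make Spark use the CSV's column names and numeric types
spark_df = spark.read.csv("fire_dept.csv", header=True, inferSchema=True)

# Profile the DataFrame that was actually loaded above
profile = collect_dataset_profile_view(spark_df)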

Data logging with whylogs and MLflow

whylogs can be used to log data metrics when running MLflow jobs:

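import mlflow
import whylogs as why
from sklearn.metrics import mean_absolute_error

# model, batch (a pandas DataFrame of model inputs), actuals and model_params
# are assumed to come from your own pipeline
with mlflow.start_run(run_name="whylogs demo"):
    predicted_output = model.predict(batch)

    mae = mean_absolute_error(actuals, predicted_output)

    mlflow.log_params(model_params)
    mlflow.log_metric("mae", mae)

    # whylogs profiles are collected in one line,
    # similar to other MLflow Tracking APIs
    profile_results = why.log(batch)
    profile_results.writer("mlflow").write()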

Once whylogs profiles have been generated, they are stored by MLflow along with all the other artifacts from the run. They can be retrieved from the MLflow backend and explored further:

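import os
import whylogs as why
from mlflow.tracking import MlflowClient
from whylogs.viz import NotebookProfileVisualizer

client = MlflowClient()
# run_id/local_dir come from your setup; "whylogs" is assumed to be the writer's artifact folder
local_path = client.download_artifacts(run_id, "whylogs", local_dir)
profile_file = os.path.join(local_path, os.listdir(local_path)[0])
profile_view = why.read(profile_file).view()

viz = NotebookProfileVisualizer()
viz.set_profiles(target_profile_view=profile_view)
viz.profile_summary()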

Image: Distribution plot for the reference profile we created when training our model.

For a more complete (and hands-on!) overview of the whylogs integration with MLflow, check out our example notebook.

Resources

whylogs github
MLflow integration documentation for whylogs
WhyLabs free sign-up
Towards Data Science: Sampling isn’t enough, profile your ML data instead
whylogs Python
whylogs Java
Join the Rsqrd AI Slack community to discuss ideas and share feedback on data logging and AI observability
