Show your love for whylogs with a star!

The open standard for data logging

Get started in seconds:

pip install whylogs

Then, run this code:

import whylogs as why
results = why.log(pandas_df)


Available in Python and Java


whylogs lets you:

Track data for ML experiments

Enable data auditing and governance

Detect data drift and resultant ML model performance degradation

Validate data quality

Perform exploratory data analysis

whylogs is an open source library for any kind of data logging.

With whylogs, you can generate summaries of your datasets, called whylogs profiles.
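To make the idea of a profile concrete, here is a minimal, library-free sketch of the kind of per-column summary a profile records. It is not the whylogs implementation or API: the function name `summarize` and the chosen statistics are illustrative assumptions, and whylogs also tracks richer statistics such as histograms and frequent values.

```python
from typing import Any, Dict, List


def summarize(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Toy 'profile': one pass over the rows, keeping a few statistics per column."""
    summary: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        for column, value in row.items():
            stats = summary.setdefault(
                column, {"count": 0, "nulls": 0, "min": None, "max": None, "sum": 0.0}
            )
            stats["count"] += 1
            if value is None:  # missing values are counted, not aggregated
                stats["nulls"] += 1
                continue
            stats["min"] = value if stats["min"] is None else min(stats["min"], value)
            stats["max"] = value if stats["max"] is None else max(stats["max"], value)
            stats["sum"] += value
    # Derive the mean from the running sum and the number of non-null values.
    for stats in summary.values():
        non_null = stats["count"] - stats["nulls"]
        stats["mean"] = stats["sum"] / non_null if non_null else None
    return summary


rows = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 29, "income": 47000.0},
]
profile = summarize(rows)
print(profile["age"])  # {'count': 3, 'nulls': 1, 'min': 29, 'max': 34, 'sum': 63.0, 'mean': 31.5}
```

The raw rows can be discarded once the summary exists; only the small per-column dictionaries need to be kept.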

“ML engineers need better tools to ensure high-quality data through all stages of an ML project’s lifecycle… [whylogs] makes it easy for developers to maintain real time logs and monitor ML deployments.”

Andrew Ng

Founder and CEO of Landing AI; Founder of DeepLearning.AI

whylogs profiles are descriptive, lightweight, and mergeable



whylogs profiles efficiently describe the datasets they represent. This high-fidelity representation is what makes whylogs profiles effective snapshots of the data. Profiles capture the characteristics of a dataset better than a sample would, as discussed in our Data Logging: Sampling versus Profiling blog post, and they are very compact.
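The sampling trade-off can be illustrated with a small, library-free toy (not a benchmark, and not how whylogs is implemented): a full pass that keeps running extremes sees a rare outlier that a 1% systematic sample skips. The data and the every-100th-row sampling rule are assumptions chosen for illustration.

```python
from typing import List, Tuple


def exact_extremes(values: List[float]) -> Tuple[float, float]:
    """Profile-style pass: every value updates the summary, so min/max are exact."""
    lo = hi = values[0]
    for v in values[1:]:
        lo, hi = min(lo, v), max(hi, v)
    return lo, hi


# 1,000 well-behaved readings plus a single extreme outlier.
readings = [float(i % 50) for i in range(1000)]
readings[537] = 1_000_000.0

# A 1% systematic sample (every 100th row) never visits row 537.
sample = readings[::100]

profile_extremes = exact_extremes(readings)
sample_extremes = (min(sample), max(sample))
print(profile_extremes)  # (0.0, 1000000.0)
print(sample_extremes)   # (0.0, 0.0)
```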



The statistics that whylogs profiles collect are easily configurable and customizable. This is important because different data types and use cases require different metrics, and whylogs users need to be able to easily define custom trackers for those metrics. It’s the customizability of whylogs that enables our text, image, and other complex data trackers.
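whylogs has its own API for registering custom metrics, which this sketch does not reproduce. The library-free toy below only shows the design idea: the collected statistics are a configurable mapping from metric names to functions, so supporting a new use case means adding an entry rather than changing the profiling code. `DEFAULT_TRACKERS`, `profile_column`, and `fraction_negative` are hypothetical names.

```python
from typing import Callable, Dict, List

# A "tracker" here is just a named function from a column's values to one number.
Tracker = Callable[[List[float]], float]

DEFAULT_TRACKERS: Dict[str, Tracker] = {
    "count": lambda xs: float(len(xs)),
    "mean": lambda xs: sum(xs) / len(xs),
}


def profile_column(values: List[float], trackers: Dict[str, Tracker]) -> Dict[str, float]:
    """Apply every configured tracker to the column and collect the results."""
    return {name: fn(values) for name, fn in trackers.items()}


def fraction_negative(xs: List[float]) -> float:
    """Hypothetical custom metric: share of values below zero."""
    return sum(1 for x in xs if x < 0) / len(xs)


# Users extend the defaults instead of replacing them.
custom = {
    **DEFAULT_TRACKERS,
    "fraction_negative": fraction_negative,
    "max_abs": lambda xs: max(abs(x) for x in xs),
}
result = profile_column([-2.0, 1.0, 3.0, -4.0], custom)
print(result)  # {'count': 4.0, 'mean': -0.5, 'fraction_negative': 0.5, 'max_abs': 4.0}
```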



One of the most powerful features of whylogs profiles is their mergeability. Mergeability means that whylogs profiles can be combined to form new profiles that represent the aggregate of their constituent profiles. This enables logging for distributed and streaming systems and allows users to view aggregated data at any time granularity.
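Mergeability works because each statistic can be combined from partial results. The toy `Summary` class below is a hypothetical stand-in, not whylogs' profile type: it keeps only statistics that merge exactly, while whylogs applies the same property to its richer metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Summary:
    """Toy mergeable summary: every field combines across partitions."""

    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(values: List[float]) -> "Summary":
        return Summary(len(values), float(sum(values)), min(values), max(values))

    def merge(self, other: "Summary") -> "Summary":
        # Counts and totals add; extremes take the min/max of both sides.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count


# Two workers (or two days of traffic) profile their own partitions...
monday = Summary.of([3.0, 7.0, 5.0])
tuesday = Summary.of([1.0, 9.0])

# ...and the merged result equals a summary computed over all the data at once.
merged = monday.merge(tuesday)
assert merged == Summary.of([3.0, 7.0, 5.0, 1.0, 9.0])
print(merged.mean)  # 5.0
```

The mean is derived from the merged count and total rather than stored directly, because averaging two partition means would weight partitions of different sizes incorrectly.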

whylogs can be run in Python or Apache Spark environments—both PySpark and Scala—on a variety of data types.

We integrate with many other tools, including pandas, AWS SageMaker, MLflow, Flask, Ray, RAPIDS, Apache Kafka, and more.


Data logging and profiling

whylogs is designed to be extremely flexible. The library can capture profiles from structured data as well as unstructured data such as images, text, audio, and bounding boxes. In addition, the library supports custom metrics, log rotation, and tagging. whylogs can be deployed as a container or invoked directly from various ML tools.
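Log rotation and tagging both come down to deciding which profile a record belongs to. The sketch below is an assumption about how such bucketing could look, not whylogs' configuration: it starts a new summary for each hourly window and keeps a separate one per tag. The `events` data and the `rotation_key` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Tuple

# Hypothetical stream of (timestamp, tag, value) records.
events = [
    (datetime(2024, 1, 1, 9, 5), "model-a", 0.2),
    (datetime(2024, 1, 1, 9, 55), "model-a", 0.4),
    (datetime(2024, 1, 1, 10, 1), "model-a", 0.9),
    (datetime(2024, 1, 1, 9, 30), "model-b", 0.7),
]


def rotation_key(ts: datetime) -> datetime:
    """Truncate a timestamp to the top of its hour: one summary per hourly window."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Each (tag, window) pair gets its own small running summary.
windows: Dict[Tuple[str, datetime], Dict[str, float]] = defaultdict(
    lambda: {"count": 0, "max": float("-inf")}
)
for ts, tag, value in events:
    stats = windows[(tag, rotation_key(ts))]
    stats["count"] += 1
    stats["max"] = max(stats["max"], value)

for (tag, start), stats in sorted(windows.items()):
    print(tag, start.isoformat(), stats["count"], stats["max"])
```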

Unlike other open source data quality solutions, whylogs separates the activity of capturing profiles from the activity of acting upon them. This gives users a powerful and extendable foundation for a wide range of MLOps tools and processes.
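The separation can be pictured as two stages that share only the profile. In this library-free sketch (hypothetical `capture` and `checks` names, not whylogs' own validation API), the raw values are summarized once, and validation rules later run against the summary alone, so new checks can be added without re-reading the data.

```python
from typing import Callable, Dict, List, Tuple


def capture(values: List[float]) -> Dict[str, float]:
    """Stage 1 (capture): reduce the raw data to a summary once."""
    return {"count": float(len(values)), "min": min(values), "max": max(values)}


# Stage 2 (act): independent rules that read only the summary, never the raw data.
Check = Tuple[str, Callable[[Dict[str, float]], bool]]

checks: List[Check] = [
    ("non-empty", lambda s: s["count"] > 0),
    ("no negative prices", lambda s: s["min"] >= 0),
    ("price below 10,000", lambda s: s["max"] < 10_000),
]

summary = capture([19.99, 5.0, 12_500.0])
report = {name: rule(summary) for name, rule in checks}
print(report)  # {'non-empty': True, 'no negative prices': True, 'price below 10,000': False}
```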


whylogs outputs statistical profiles, available in the following formats:

  • Protobuf - a lightweight and efficient binary format that maps one-to-one with the memory representation of a whylogs object
  • JSON - displays the protobuf data in JSON format
  • Flat - outputs multiple files with both CSV and JSON content to represent different views of the data, including histograms, upper and lower bounds, and frequent values

To take advantage of whylogs features, we recommend always enabling the Protobuf format.
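To illustrate the difference between a nested view and a flat view, here is a stdlib-only sketch that writes a toy profile as JSON and as a flat CSV table. It does not produce whylogs' actual JSON or flat files, it leaves out the Protobuf format entirely, and the field names are illustrative assumptions.

```python
import csv
import io
import json

# A toy profile: column name -> statistics.
profile = {
    "age": {"count": 3, "min": 29, "max": 34},
    "income": {"count": 3, "min": 47000.0, "max": 61000.0},
}

# JSON view: the nested structure as-is.
as_json = json.dumps(profile, indent=2)

# Flat view: one CSV row per column, one field per statistic.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["column", "count", "min", "max"])
writer.writeheader()
for column, stats in profile.items():
    writer.writerow({"column": column, **stats})
flat_csv = buffer.getvalue()
print(flat_csv)
```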


Supports batch and streaming data

  • Batch mode - whylogs processes a dataset in batches
  • Streaming mode - whylogs processes individual data points
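Both modes can update the same kind of summary, because a profile is built incrementally. The class below is a hypothetical, library-free illustration, not the whylogs API: batch mode feeds a whole collection, streaming mode feeds one value at a time, and both end in the same state.

```python
from typing import Iterable


class RunningProfile:
    """Toy profile that can be fed one value at a time or a whole batch."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def track(self, value: float) -> None:
        # Streaming mode: update the summary for a single data point.
        self.count += 1
        self.total += value

    def track_batch(self, values: Iterable[float]) -> None:
        # Batch mode: the same update applied to every value in the batch.
        for value in values:
            self.track(value)

    @property
    def mean(self) -> float:
        return self.total / self.count


batch = RunningProfile()
batch.track_batch([2.0, 4.0, 6.0])

stream = RunningProfile()
for reading in (2.0, 4.0, 6.0):  # e.g. values arriving one by one from a queue
    stream.track(reading)

print(batch.mean, stream.mean)  # 4.0 4.0
```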

How do I generate whylogs profiles?

First, install whylogs:

pip install 'whylogs[whylabs]'

Then, start logging statistical properties of features, model inputs, and model outputs to enable exploratory analysis, data unit testing, and monitoring.

Getting whylogs up and running is easy: simply follow one of the integration examples shown below.


whylogs Integration




### First, install whylogs with the whylabs extra
### pip install -q 'whylogs[whylabs]'

import os

import pandas as pd
import whylogs as why

os.environ["WHYLABS_API_KEY"] = "YOUR-API-KEY"
# The dataset ID is provided when you set up a model in WhyLabs
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"

# Point to your local CSV if you have your own data (fill in the empty path)
df = pd.read_csv("")
# Run whylogs on the current data to create a profile
results = why.log(df)
# Upload the profile to the WhyLabs Platform
results.writer("whylabs").write()

Brainstorm ideas and share feedback with other whylogs community members on Slack!


What can I do with my whylogs profiles?

whylogs profiles can be used in a variety of ways. They can be viewed directly with the built-in Python profile viewer or with a data visualization framework such as matplotlib or Plotly. They can also be sent to the WhyLabs Platform for monitoring and observability.

The more whylogs profiles you generate for a particular model or dataset, the more value they provide. Here’s a breakdown of what can be done with whylogs profiles, depending on how many you have:

| Use case | Single profile | Two profiles | Three or more |
|---|---|---|---|
| Data documentation | ✓ | ✓ | ✓ |
| Exploratory data analysis | ✓ | ✓ | ✓ |
| Data unit testing | | ✓ | ✓ |
| Ad-hoc comparison to baseline | | ✓ | ✓ |
| Continuous monitoring | | | ✓ |
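As a rough illustration of why more profiles unlock more uses, the library-free sketch below compares a sequence of daily toy summaries with a baseline: two summaries allow an ad-hoc comparison, and repeating the comparison for each new summary turns it into monitoring. The mean-shift rule, the threshold of 2, and the data are hypothetical choices, not whylogs' or WhyLabs' drift algorithm.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Keep only what the comparison needs: mean and standard deviation."""
    return {"mean": mean(values), "stdev": stdev(values)}


def mean_shift(baseline: Dict[str, float], current: Dict[str, float]) -> float:
    """How many baseline standard deviations the current mean has moved."""
    return abs(current["mean"] - baseline["mean"]) / baseline["stdev"]


baseline = summarize([10.0, 12.0, 11.0, 9.0, 13.0])
daily = [
    summarize([11.0, 10.0, 12.0, 11.0, 11.0]),
    summarize([14.0, 15.0, 16.0, 15.0, 15.0]),
]

THRESHOLD = 2.0  # hypothetical alert threshold, in baseline standard deviations
alerts = [day for day, profile in enumerate(daily) if mean_shift(baseline, profile) > THRESHOLD]
print(alerts)  # [1]
```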

Where do I find other whylogs users and get help?

Join the WhyLabs Community on Slack!

The WhyLabs Community is a forum for you to connect with other practitioners, share ideas, and learn about exciting new techniques.

