The open source standard for data logging

whylogs automatically creates statistical summaries of datasets, called profiles, which have properties similar to the logs produced by other software applications:

descriptive, lightweight, mergeable

Key DataOps and MLOps activities, such as data quality validation and data and model monitoring, rely on statistical properties of data under the hood. With whylogs, these activities can be implemented faster, more cheaply, and more reliably.

The WhyLabs team believes that data logging is a critical missing component in the production ML stack. We open-sourced whylogs so data and AI practitioners can build on top of this paradigm and expand it.

Today, whylogs integrates with a wide range of data and AI platforms. Our goal is to make logging a table-stakes feature of any ML lifecycle activity. Join the growing community to help us build the standard for data logging!

Data logging and profiling

whylogs is designed to be extremely flexible. The library can capture profiles from structured and unstructured data, including images, text, audio, and bounding boxes. In addition, the library supports custom metrics, log rotation, and tagging. whylogs can be deployed as a container or invoked directly from various ML tools.

Unlike most open source data quality solutions, whylogs separates the activity of capturing profiles from the activity of acting upon them. This gives users a powerful and extensible foundation for a wide range of MLOps tools and processes.

whylogs outputs statistical profiles, available in the following formats:

  • Protobuf - a lightweight and efficient binary format that maps one-to-one with the memory representation of a whylogs object
  • JSON - displays the protobuf data in JSON format
  • Flat - outputs multiple files with both CSV and JSON content to represent different views of the data, including histograms, upper bound, lower bound, and frequent values

To take advantage of whylogs features, we recommend always enabling the Protobuf format.

Supports batch and streaming data

  • Batch mode - whylogs processes a dataset in batches
  • Streaming mode - whylogs processes individual data points
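
The two modes can be pictured with a minimal sketch. This is not the whylogs API: `RunningStats`, `track`, and `track_batch` are hypothetical names, and the class keeps only a few exact statistics. It shows that a summary updated one data point at a time ends up identical to one built from a whole batch.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical stand-in for a column profile (not a whylogs class)."""

    count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    total: float = 0.0

    def track(self, value: float) -> None:
        # Streaming mode: fold a single data point into the summary.
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.total += value

    def track_batch(self, values) -> None:
        # Batch mode: fold a whole batch of data points into the summary.
        for value in values:
            self.track(value)


data = [3.0, 1.0, 4.0, 1.0, 5.0]

batch_stats = RunningStats()
batch_stats.track_batch(data)

stream_stats = RunningStats()
for point in data:
    stream_stats.track(point)

print(batch_stats == stream_stats)
```

In both cases only the summary is retained, so memory use does not grow with the number of data points seen.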

How do I generate whylogs profiles?

First, install whylogs:

pip install whylogs

Then, start logging statistical properties of features, model inputs, and model outputs to enable exploratory analysis, data unit testing, and monitoring.

Getting whylogs up and running is easy: follow one of the integration examples provided in the WhyLabs documentation, or start from the example shown below.

import os

import pandas as pd

# The module paths were missing from the source snippet; these assume the whylogs v0 package layout
from whylogs.app import Session
from whylogs.app.writers import WhyLabsWriter

os.environ["WHYLABS_API_KEY"] = "YOUR-API-KEY"

# Point to your local CSV if you have your own data
df = pd.read_csv("YOUR-DATASET.csv")

# Adding the WhyLabs Writer to utilize WhyLabs platform
writer = WhyLabsWriter()

session = Session(project="demo-project", pipeline="demo-pipeline", writers=[writer])

# Run whylogs on current data and upload to the WhyLabs Platform
# Note: 'datasetId' maps to 'model-id' in WhyLabs
with session.logger(tags={"datasetId": "model-1"}) as ylog:
    # Assumed completion: the source snippet is cut off after the `with` line
    ylog.log_dataframe(df)

Brainstorm ideas and share feedback with the whylogs community members on Slack!


What can I do with my whylogs profiles?

whylogs profiles can be used in a variety of ways. They can be viewed directly with the built-in Python profile viewer or with a data visualization framework such as matplotlib or Plotly. They can also be sent to the WhyLabs Platform for monitoring and observability.

The more whylogs profiles you generate for a particular model or dataset, the more value they provide. Here’s a breakdown of what can be done with whylogs profiles, depending on how many you have:

                                Single profile   Two profiles   Three or more
Data documentation              ✓                ✓              ✓
Exploratory data analysis       ✓                ✓              ✓
Data unit testing               -                ✓              ✓
Ad-hoc comparison to baseline   -                ✓              ✓
Continuous monitoring           -                -              ✓
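
To illustrate the rows that need a second profile, data unit testing and comparison to a baseline can run on profile summaries alone, without the raw data. The sketch below is not the whylogs API: the summary dictionaries, `check_constraints`, and `relative_shift` are hypothetical, and the numbers are invented for the example.

```python
def check_constraints(summary, constraints):
    """Return the names of the constraints that the summary violates."""
    return [name for name, predicate in constraints.items() if not predicate(summary)]


def relative_shift(baseline, current, key):
    """Relative change of one statistic between a baseline and a current summary."""
    return (current[key] - baseline[key]) / abs(baseline[key])


# Hypothetical per-column summaries, as might be extracted from two profiles.
baseline = {"count": 1000, "null_count": 5, "min": 0.0, "max": 98.5, "mean": 40.0}
current = {"count": 1200, "null_count": 30, "min": -3.0, "max": 102.0, "mean": 46.0}

# Data unit tests expressed as named predicates over a summary.
constraints = {
    "no negative values": lambda s: s["min"] >= 0,
    "null fraction under 1%": lambda s: s["null_count"] / s["count"] < 0.01,
}

baseline_failures = check_constraints(baseline, constraints)
current_failures = check_constraints(current, constraints)
mean_shift = relative_shift(baseline, current, "mean")

print(baseline_failures)
print(current_failures)
print(f"mean shifted by {mean_shift:.0%}")
```

Continuous monitoring extends the same idea by repeating such checks over a series of three or more profiles.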

whylogs profiles are...

descriptive: the profiles capture all essential statistics needed to represent a dataset. The library enables users to capture statistics from both structured and unstructured data by offering default statistics per data type as well as the flexibility to define custom statistics. By capturing the key statistical attributes of data, you can avoid having to store and process your entire datasets when you want to monitor them.

lightweight: the profiles can be stored and processed cheaply. The whylogs profiler consumes very little CPU and memory, scales with the number of features being profiled, and runs alongside your data processing application. The design is suitable for even the most resource-sensitive environments. The output is also lightweight, and its size scales with the number of features being profiled.

mergeable: the profiles produced can be combined with other profiles. In a distributed system, profiles can be captured on every instance and merged together for a full view of the data. In streaming systems, profiles can be captured over a mini-batch and merged into hourly, daily, or weekly snapshots of data without losing statistical accuracy. This is made possible with a technique called data sketching, pioneered by Apache DataSketches.
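
The merge property can be sketched with exact running moments. This is an illustration only, not how whylogs is implemented: whylogs relies on data sketches, while the hypothetical `Moments` class below uses the standard pairwise update for mean and variance. The point is that two partial summaries combine into the same result as a single pass over all the data.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Hypothetical mergeable summary: count, mean, and sum of squared deviations."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def track(self, x: float) -> None:
        # Welford's online update for a single value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Pairwise combination of two partial summaries.
        n = self.count + other.count
        if n == 0:
            return Moments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Moments(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance of everything tracked so far.
        return self.m2 / self.count if self.count else 0.0


def summarize(values) -> Moments:
    moments = Moments()
    for value in values:
        moments.track(value)
    return moments


# Two "instances" each see part of the data; a third summary sees all of it.
part_a = summarize([1.0, 2.0, 3.0, 4.0])
part_b = summarize([5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
merged = part_a.merge(part_b)
full = summarize([float(i) for i in range(1, 11)])

print(merged.count, merged.mean, merged.variance)
```

Only the three numbers per summary cross the network when merging, never the underlying records.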

Where do I find other whylogs users and get help?

Join the WhyLabs Community on Slack!

The WhyLabs Community is a forum for you to connect with other practitioners, share ideas, and learn about exciting new techniques.
