WhyLabs AI Control Center (also known as the WhyLabs Platform) is now an open source project!

Show your love for whylogs with a star!

The open standard for data logging

Get started in seconds:

pip install whylogs

Then, run this code:

import whylogs as why
results = why.log(pandas_df)

Available in Python and Java

whylogs lets you:

Track data for ML experiments

Enable data auditing and governance

Detect data drift and resultant ML model performance degradation

Validate data quality

Perform exploratory data analysis

whylogs is an open source library for any kind of data logging.

With whylogs, you are able to generate summaries of your datasets, called whylogs profiles.

“ML engineers need better tools to ensure high-quality data through all stages of an ML project’s lifecycle… [whylogs] makes it easy for developers to maintain real time logs and monitor ML deployments.”

Andrew Ng

Founder and CEO of Landing AI; Founder of DeepLearning.AI

profiles are...

Efficient

whylogs profiles efficiently describe the dataset they represent. This high-fidelity representation is what makes profiles effective snapshots of the data. As discussed in our Data Logging: Sampling versus Profiling blog post, they capture the characteristics of a dataset better than a sample would, and they are very compact.

Customizable

The statistics that whylogs profiles collect are easily configurable and customizable. This is important because different data types and use cases require different metrics, and whylogs users need to be able to easily define custom trackers for those metrics. It’s the customizability of whylogs that enables our text, image, and other complex data trackers.

Mergeable

One of the most powerful features of whylogs profiles is their mergeability. Mergeability means that whylogs profiles can be combined together to form new profiles which represent the aggregate of their constituent profiles. This enables logging for distributed and streaming systems, and allows users to view aggregated data across any time granularity.
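
To make the idea of mergeability concrete, here is a minimal sketch in plain Python. It is not how whylogs implements profiles; `ColumnSummary` and `summarize` are hypothetical names invented for this illustration, and the summary tracks only a count, a sum, a minimum, and a maximum. The point it shows is that two summaries built separately can be combined into the same summary you would get from the combined data.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Toy mergeable summary of one numeric column (hypothetical, not a whylogs class)."""

    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: float) -> None:
        # Update the summary with a single value; the raw value is not stored.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Combine two summaries without revisiting the underlying data.
        return ColumnSummary(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


def summarize(values) -> ColumnSummary:
    summary = ColumnSummary()
    for value in values:
        summary.track(value)
    return summary


# Two summaries built independently, for example on different days or machines.
monday = summarize([3.0, 5.0, 4.0])
tuesday = summarize([10.0, 2.0])

# Merging them matches summarizing all of the data in one pass.
week = monday.merge(tuesday)
whole = summarize([3.0, 5.0, 4.0, 10.0, 2.0])
print(week == whole)
```

Because the merge only needs the summaries, each worker in a distributed job could build its own summary and ship just that small object for aggregation.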

whylogs can be run in Python or Apache Spark environments—both PySpark and Scala—on a variety of data types.

We integrate with many other tools, including pandas, Amazon SageMaker, MLflow, Flask, Ray, RAPIDS, Apache Kafka, and more.

Data logging and profiling

whylogs is designed to be extremely flexible. The library can capture profiles from structured and unstructured data such as images, text, audio, bounding boxes, etc. In addition, the library supports custom metrics, log rotation, and tagging. whylogs can be deployed as a container or be invoked directly from various ML tools.

Unlike other open source data quality solutions, whylogs separates the activity of capturing profiles from the activity of acting upon them. This gives users a powerful and extensible foundation for a wide range of MLOps tools and processes.

whylogs outputs statistical profiles, available in the following formats:

Protobuf - a lightweight and efficient binary format that maps one-to-one with the memory representation of a whylogs object
JSON - displays the protobuf data in JSON format
Flat - outputs multiple files with both CSV and JSON content to represent different views of the data, including histograms, upper bound, lower bound, and frequent values

To take advantage of whylogs features, we recommend always enabling the Protobuf format.

Supports batch and streaming data

Batch mode - whylogs processes a dataset in batches
Streaming mode - whylogs processes individual data points
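
The difference between the two modes can be sketched in plain Python. This is an illustration only, not whylogs code: `RunningStats`, `track`, and `track_batch` are hypothetical names, and the tracker uses Welford's online algorithm as a stand-in for whatever statistics a real profile collects. It shows that feeding data one point at a time or in batches can yield the same summary, without keeping the raw data in memory.

```python
import math
import statistics


class RunningStats:
    """Toy tracker based on Welford's online algorithm (hypothetical, not whylogs code)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def track(self, value: float) -> None:
        # Streaming-style update: one data point at a time.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def track_batch(self, values) -> None:
        # Batch-style update: a chunk of the dataset at a time.
        for value in values:
            self.track(value)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; defined as 0.0 for fewer than two points.
        return math.sqrt(self._m2 / (self.count - 1)) if self.count > 1 else 0.0


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Streaming: individual data points arrive one by one.
streaming = RunningStats()
for point in data:
    streaming.track(point)

# Batch: the same data arrives in two chunks.
batch = RunningStats()
batch.track_batch(data[:4])
batch.track_batch(data[4:])

print(streaming.mean, streaming.stddev)
print(batch.mean, batch.stddev)
```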

How do I generate whylogs profiles?

First, install whylogs:

pip install 'whylogs[whylabs]'

Then, start logging statistical properties of features, model inputs, and model outputs to enable exploratory analysis, data unit testing, and monitoring.

Getting whylogs up and running is easy: simply follow one of the integration examples shown below or provided in the WhyLabs documentation.

whylogs Integration

PYTHON

# First, install whylogs with the whylabs extra:
# pip install -q 'whylogs[whylabs]'

import pandas as pd
import os
import whylogs as why

os.environ["WHYLABS_API_KEY"] = "YOUR-API-KEY"
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "YOUR-ORG-ID"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"  # Note: the model ID is provided when setting up a model in WhyLabs

# Point to your local CSV if you have your own data
df = pd.read_csv("https://whylabs-public.s3.us-west-2.amazonaws.com/datasets/tour/current.csv")
                
# Run whylogs on current data and upload to the WhyLabs Platform
results = why.log(df)
results.writer("whylabs").write()

Brainstorm ideas and share feedback with the whylogs community members on Slack!

What can I do with my whylogs profiles?

whylogs profiles can be used in a variety of ways. They can be viewed directly with the built-in Python profile viewer or a data visualization framework such as matplotlib or Plotly. They can also be sent to the WhyLabs Platform for monitoring and observability.

The more whylogs profiles you generate for a particular model or dataset, the more value they provide. Here’s a breakdown of what can be done with whylogs profiles, depending on how many you have: