
Data and ML Monitoring is Easier with whylogs v1.1

whylogs v1.1 is out with new features that make data and ML monitoring easier than ever

The release brings many features to the whylogs data logging API, making it even easier to monitor your data and ML models!

whylogs is the open-source standard for data logging, allowing you to create statistical profiles of datasets to monitor for data quality, data drift, model drift, and more in Python or Java environments. Learn more about whylogs on GitHub.

Profiles generated with whylogs can also be used with WhyLabs Observatory to easily configure a customizable monitoring experience. Learn more about the WhyLabs Observatory here.

What's new with whylogs v1.1?

If you’re a longtime whylogs user, you may notice some of these features were already available in whylogs v0, and now they’re all available in the simplified v1 API.

New features in whylogs v1.1:

  • Segments: Gain visibility within a sub-group of data
  • Log image data: Monitor data for computer vision models
  • Log rotation: Monitor continuous data streams
  • Condition count metrics: Detect specific values in datasets
  • String tracking: Monitor string data for NLP
  • Model performance: Track and monitor model performance in WhyLabs

Keep reading to learn more.

Monitor subgroups of data with segments

Specific subgroups of data can behave differently from the overall dataset. When monitoring the health of a dataset, it can be helpful to have visibility at the subgroup level to better understand how these subgroups contribute to trends in the overall dataset. This can be crucial for detecting dataset bias and assessing fairness. whylogs v1.1 supports data segmentation for this purpose.

Segmentation in whylogs can be done by a single feature or by multiple features simultaneously.

import whylogs as why
from whylogs.core.schema import DatasetSchema
from whylogs.core.segmentation_partition import segment_on_column

# Segment profiles on the "category" column
column_segments = segment_on_column("category")
results = why.log(df, schema=DatasetSchema(segments=column_segments))
See a full code example on GitHub
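Conceptually, segmentation just partitions records by a key and summarizes each partition on its own. Here is a stdlib-only sketch of that idea (illustrative only, not the whylogs implementation; the rows and column names are made up):

```python
from collections import defaultdict
from statistics import mean

# Toy rows standing in for a dataset with a "category" column
rows = [
    {"category": "toys", "price": 9.99},
    {"category": "toys", "price": 14.50},
    {"category": "books", "price": 7.25},
]

# Partition rows by the segment key, then summarize each segment separately
segments = defaultdict(list)
for row in rows:
    segments[row["category"]].append(row["price"])

summaries = {
    key: {"count": len(values), "mean_price": mean(values)}
    for key, values in segments.items()
}
print(summaries["toys"]["count"])  # 2
```

Each segment gets its own statistical summary, which is what lets drift in one subgroup surface even when the overall dataset looks stable.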

Segmented profiles can also be uploaded to WhyLabs, where each segment will appear in the “Segments” section of the model dashboard within a particular project.

Example of segments in WhyLabs

Learn more about monitoring subgroups of data with segments in whylogs here.

Monitor Computer Vision data with image logging

In addition to tabular and textual data, whylogs can generate profiles of image data. whylogs computes a number of metrics for image data that can be used to detect data drift and quality issues, such as low lighting levels.

from whylogs.extras.image_metric import log_image

results = log_image([img1, img2])

Image metrics tracked in whylogs:

  • Brightness (mean, standard deviation)
  • Hue (mean, standard deviation)
  • Saturation (mean, standard deviation)
  • Image Pixel Height & Width
  • Colorspace (e.g. RGB, HSV)

Example of data quality issue with low lighting
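To make the brightness metric concrete, here is a stdlib-only sketch of how a mean-brightness check could flag a low-lighting issue (illustrative only, not how whylogs computes its image metrics; the pixel values and threshold are made up):

```python
from statistics import mean, stdev

# Toy grayscale "image": per-pixel brightness in [0, 255]
pixels = [12, 18, 25, 14, 9, 22]

brightness_mean = mean(pixels)
brightness_std = stdev(pixels)

# A very low mean brightness can flag the low-lighting issue described above
is_too_dark = brightness_mean < 40
print(is_too_dark)  # True
```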

To learn more about logging image data with whylogs, check out our documentation and stay tuned for an upcoming blog post about it!

Log rotation (rolling logs) for continuous data streams

Logging continuous streams of data can be challenging. With log rotation in whylogs, you can ingest data at the rate it is generated, without delays or memory constraints.

Instead of planning out how to log intervals with batching, whylogs handles that for you. The logger creates a session and logs data at a requested interval of seconds, minutes, hours, or days. At each interval, it writes the current profile to a .bin file and flushes the log, ready to receive more data.

import whylogs as why

class MyApp:
    def __init__(self):
        # Example of the rolling logger at a 15 minute interval
        self.logger = why.logger(mode="rolling", interval=15, when="M")
        # Write to a local path; other writers (e.g. WhyLabs) are also available
        self.logger.append_writer("local", base_dir="example_output")
        self.dataset_logged = 0  # simple counter for our own logging

    def close(self):
        # On exit, the rest of the logged data is flushed to a profile file
        self.logger.close()

    def consume(self, data_df):
        self.logger.log(data_df)  # log the batch into the current profile
        self.dataset_logged += 1
        print("Inputs Processed: " + str(self.dataset_logged))
See a full code example on GitHub
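The rotation mechanism itself can be sketched in plain Python: accumulate records, then flush ("rotate") whenever the configured interval elapses. This toy RollingBuffer is illustrative only (whylogs writes binary profile files, not in-memory lists; the fake clock is just to make the example reproducible):

```python
import time

class RollingBuffer:
    """Toy sketch of interval-based log rotation: accumulate records and
    flush ("rotate") whenever the configured interval has elapsed."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.start = clock()
        self.records = []
        self.flushed = []  # stands in for profile files written to disk

    def log(self, record):
        if self.clock() - self.start >= self.interval:
            self.rotate()
        self.records.append(record)

    def rotate(self):
        if self.records:
            self.flushed.append(list(self.records))
            self.records.clear()
        self.start = self.clock()

# Deterministic fake clock (seconds) so the example is reproducible
ticks = iter([0, 0, 1, 2, 16, 17])
buf = RollingBuffer(15, clock=lambda: next(ticks))
for record in ["a", "b", "c", "d"]:
    buf.log(record)

print(buf.flushed)  # [['a', 'b', 'c']]: first three records rotated out at t=16
```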

Learn more about log rotation to monitor data streams here.

Condition count metrics

By default, whylogs tracks several metrics, such as type counts, distribution metrics, cardinality, and frequent items. While these metrics are helpful for many use cases, such as monitoring data drift, sometimes custom metrics are needed to monitor an application properly.

Condition count metrics let users define custom conditions and count the number of times each condition holds for a given column. This is useful for detecting personally identifiable information (PII) or checking whether specific numerical values appear in a dataset.

Users can create condition count metrics with regex for string matching, conditionals for numerical values, or a custom function for any given condition.

from typing import Dict

import whylogs as why
from whylogs.core.datatypes import DataType
from whylogs.core.metrics import Metric
from whylogs.core.metrics.condition_count_metric import (
    Condition,
    ConditionCountConfig,
    ConditionCountMetric,
    Relation as Rel,
    relation as rel,
)
from whylogs.core.resolvers import Resolver
from whylogs.core.schema import ColumnSchema, DatasetSchema

class CustomResolver(Resolver):
    def resolve(self, name: str, why_type: DataType, column_schema: ColumnSchema) -> Dict[str, Metric]:
        # Attach a condition count metric to every column
        return {"condition_count": ConditionCountMetric.zero(column_schema.cfg)}

conditions = {
    "containsEmail": Condition(rel(Rel.fullmatch, r"[\w.]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}")),
    "containsCreditCard": Condition(rel(Rel.match, r".*4[0-9]{12}(?:[0-9]{3})?")),
}

config = ConditionCountConfig(conditions=conditions)
resolver = CustomResolver()
schema = DatasetSchema(default_configs=config, resolvers=resolver)
prof_view = why.log(df, schema=schema).profile().view()
See a full code example on GitHub
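The idea behind a condition count can be sketched with the standard library alone: apply each condition to every value in a column and tally the matches. This is illustrative only, not the whylogs implementation; the condition names and sample column are made up:

```python
import re

# Toy condition counting: how many values in a column satisfy each condition
email_re = re.compile(r"[\w.]+@\w+\.\w{2,3}")

conditions = {
    "containsEmail": lambda v: bool(email_re.fullmatch(str(v))),
    "isPositive": lambda v: isinstance(v, (int, float)) and v > 0,
}

column = ["alice@example.com", "not-an-email", 42, -1]
counts = {name: sum(1 for v in column if cond(v)) for name, cond in conditions.items()}
print(counts)  # {'containsEmail': 1, 'isPositive': 1}
```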

Condition Validators can be used with these metrics to trigger actions.

Learn more about using condition count metrics in whylogs here.

Basic string tracking

String tracking allows users to use whylogs to perform essential text monitoring functions on datasets. By default, columns of type str will have the following metrics when logged with whylogs:

  • Counts
  • Types
  • Frequent Items/Frequent Strings
  • Cardinality

Further string metrics can be tracked by counting, for each string record, the number of characters that fall in a given unicode range, and then generating distribution metrics such as mean, stddev, and quantile values from these counts. In addition to specific unicode ranges, whylogs can apply the same approach to the overall string length.

Examples include detecting whether a communication style is changing, whether different languages appear, or how many emojis are used.
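As a stdlib-only illustration of this counting approach (not the whylogs implementation; the sample records are made up):

```python
from statistics import mean

def count_in_range(text, lo, hi):
    # Count characters whose code point falls within [lo, hi]
    return sum(1 for ch in text if lo <= ord(ch) <= hi)

records = ["call me at 555-1234", "hello world", "abc123"]
digit_counts = [count_in_range(s, 48, 57) for s in records]   # ASCII digits
alpha_counts = [count_in_range(s, 97, 122) for s in records]  # lowercase latin

print(digit_counts)  # [7, 0, 3]
print(round(mean(digit_counts), 2))  # 3.33
```

Distribution metrics over these per-record counts are what make shifts in text composition detectable over time.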

The example below tracks two specific ranges of characters:

  • ASCII Digits (unicode range 48-57)
  • Latin alphabet (unicode range 97-122)
from typing import Dict

import whylogs as why
from whylogs.core.datatypes import DataType
from whylogs.core.metrics import Metric, MetricConfig
from whylogs.core.metrics.unicode_range import UnicodeRangeMetric
from whylogs.core.resolvers import Resolver
from whylogs.core.schema import ColumnSchema, DatasetSchema

class UnicodeResolver(Resolver):
    def resolve(self, name: str, why_type: DataType, column_schema: ColumnSchema) -> Dict[str, Metric]:
        # Track a unicode range metric for every column
        return {UnicodeRangeMetric.get_namespace(column_schema.cfg): UnicodeRangeMetric.zero(column_schema.cfg)}

# Count characters in the ASCII digit and lowercase latin ranges
config = MetricConfig(unicode_ranges={"digits": (48, 57), "alpha": (97, 122)})
schema = DatasetSchema(resolvers=UnicodeResolver(), default_configs=config)

prof_results = why.log(df, schema=schema)
prof = prof_results.profile()
profile_view_df = prof.view().to_pandas()
See a full code example on GitHub

Learn more about string tracking with whylogs here.

NOTE: More text and NLP logging features are coming to whylogs soon!

Model performance monitoring

Monitoring model performance is critical to understanding how well ML models continue to function once deployed. Performance is tracked by logging model predictions and ground truth data with whylogs, which can then be used to calculate scoring metrics in a home-grown ML monitoring solution or in the WhyLabs Observatory.

Users can set custom monitors in WhyLabs to detect anomalies in model performance, such as if the model accuracy score drops.

WhyLabs will calculate scoring metrics for both classification and regression models.

Classification metrics: Total output and input count, accuracy, ROC, precision-recall chart, confusion matrix, recall, FPR, precision, and F1 score.

results = why.log_classification_metrics(
    df,
    target_column="output_discount",
    prediction_column="output_prediction",
)
See a full code example on GitHub
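For reference, a few of the classification metrics named above can be computed by hand from target and prediction labels. This plain-Python illustration uses made-up labels and is not WhyLabs' implementation:

```python
# Toy 1/0 labels: ground truth vs. model predictions
targets     = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 0, 1, 1, 1]

pairs = list(zip(targets, predictions))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall)  # 0.75 0.75
```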

Regression metrics: Total output and input count, mean squared error, mean absolute error, root mean squared error.

results = why.log_regression_metrics(
    df,
    target_column="temperature",
    prediction_column="prediction_temperature",
)
See a full code example on GitHub
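Similarly, the regression metrics above follow directly from the prediction errors. This plain-Python illustration uses made-up values:

```python
from math import sqrt

# Toy ground-truth temperatures vs. model predictions
targets = [20.0, 22.0, 19.0, 24.0]
predictions = [21.0, 21.5, 18.0, 26.0]

errors = [p - t for p, t in zip(predictions, targets)]
mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
mse = sum(e * e for e in errors) / len(errors)   # mean squared error
rmse = sqrt(mse)                                 # root mean squared error

print(mae, mse, rmse)  # 1.125 1.5625 1.25
```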

Get started with monitoring model performance in our documentation.


We’re excited about the functionality whylogs v1.1 brings, allowing users to monitor model performance, subgroups, images, strings, and continuous data streams in our easy-to-use data logging API.

If you’re interested in trying whylogs or getting involved with our community of AI builders, check out whylogs on GitHub.
