Sage Elliott

Sep 28, 2022

Data and ML Monitoring is Easier with whylogs v1.1

Whylogs
ML Monitoring
Open Source
Product Updates

whylogs v1.1 is out with new features that make data and ML monitoring easier than ever

The release brings many features to the whylogs data logging API, making it even easier to monitor your data and ML models!

whylogs is the open-source standard for data logging, allowing you to create statistical profiles of datasets to monitor for data quality, data drift, model drift, and more in Python or Java environments. Learn more about whylogs on GitHub.

Profiles generated with whylogs can also be used with WhyLabs Observatory to easily configure a customizable monitoring experience. Learn more about the WhyLabs Observatory here.

What's new with whylogs v1.1?

If you’re a longtime whylogs user, you may notice some of these features were already available in whylogs v0, and now they’re all available in the simplified v1 API.

New features in whylogs v1.1:

Segments: Gain visibility within a sub-group of data
Log image data: Monitor data for computer vision models
Log rotation: Monitor continuous data streams
Conditional count metrics: Detect specific values in datasets
String tracking: Monitor string data for NLP
Model performance: Track and monitor model performance in WhyLabs

Keep reading to learn more.

Monitor subgroups of data with segments

Specific subgroups of data can behave differently from the overall dataset. When monitoring the health of a dataset, it helps to have visibility at the subgroup level to understand how each subgroup contributes to trends in the overall dataset. This can be crucial for detecting dataset bias and assessing fairness. whylogs v1.1 supports data segmentation for this purpose.

Segmentation in whylogs can be done by a single feature or by multiple features simultaneously.

from whylogs.core.segmentation_partition import segment_on_column
column_segments = segment_on_column("category")

See a full code example on GitHub
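To make the idea of single-feature and multi-feature segments concrete, here is a minimal standard-library sketch that groups rows by one or more columns and summarizes each group separately. It illustrates the concept only and is not how whylogs implements segmentation; the `segment_rows` helper and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def segment_rows(rows, columns):
    """Group rows (dicts) by the values of one or more columns."""
    segments = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in columns)
        segments[key].append(row)
    return dict(segments)


# Hypothetical sample data, for illustration only
rows = [
    {"category": "toys", "region": "EU", "price": 10.0},
    {"category": "toys", "region": "US", "price": 14.0},
    {"category": "books", "region": "EU", "price": 8.0},
    {"category": "toys", "region": "EU", "price": 12.0},
]

# Segment on a single feature
by_category = segment_rows(rows, ["category"])
# Segment on multiple features simultaneously
by_category_region = segment_rows(rows, ["category", "region"])

# Summarize each segment on its own
for key, members in by_category_region.items():
    print(key, len(members), mean(r["price"] for r in members))
```

Each key identifies one segment, so a statistic that looks healthy for the whole dataset can still be checked per subgroup.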

Segmented profiles can also be uploaded to WhyLabs, where each segment will appear in the “Segments” section of the model dashboard within a particular project.

Learn more about monitoring subgroups of data with segments in whylogs here.

Monitor Computer Vision data with image logging

In addition to tabular and textual data, whylogs can generate profiles of image data. It computes a number of image-related metrics that can be used to detect data drift and quality issues, such as low lighting levels.

from whylogs.extras.image_metric import log_image
# img1 and img2 are PIL images loaded elsewhere
results = log_image([img1, img2])
print(results.view().get_column("image_1").to_summary_dict())

Image metrics tracked in whylogs:

Brightness (mean, standard deviation)
Hue (mean, standard deviation)
Saturation (mean, standard deviation)
Image Pixel Height & Width
Colorspace (e.g. RGB, HSV)
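As a rough illustration of what the brightness, hue, and saturation statistics in this list capture, the sketch below computes their mean and standard deviation from raw RGB pixels using only the standard library. It assumes brightness corresponds to the HSV value channel and is not the whylogs implementation; the `image_stats` helper and the tiny four-pixel "images" are hypothetical.

```python
import colorsys
from statistics import mean, pstdev


def image_stats(pixels):
    """Summarize hue, saturation, and brightness for (R, G, B) pixels in 0-255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    stats = {}
    for i, name in enumerate(("hue", "saturation", "brightness")):
        values = [p[i] for p in hsv]
        stats[name] = {"mean": mean(values), "stddev": pstdev(values)}
    return stats


# Hypothetical four-pixel images: one well lit, one dark
bright_image = [(250, 250, 250), (200, 200, 200), (255, 255, 255), (230, 230, 230)]
dark_image = [(10, 10, 10), (20, 20, 20), (5, 5, 5), (15, 15, 15)]

bright = image_stats(bright_image)
dark = image_stats(dark_image)
print(bright["brightness"]["mean"], dark["brightness"]["mean"])
```

A sharp drop in mean brightness between a reference batch and a new batch is the kind of signal that would flag a low-lighting issue.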

Example of data quality issue with low lighting

To learn more about logging image data with whylogs, check out our documentation and stay tuned for an upcoming blog post about it!

Log rotation (rolling logs) for continuous data streams

Logging continuous streams of data can be challenging. With log rotation in whylogs, you can ingest data at the rate it is generated without introducing delays or running into memory constraints.

Instead of planning how to batch your logging intervals yourself, you can let whylogs handle it. The logger creates a session and profiles incoming data at the requested interval (seconds, minutes, hours, or days). At the end of each interval, it writes the profile to a .bin file and flushes the log, ready to receive more data.

import whylogs as why

class MyApp:
    def __init__(self):
        # example of the rolling logger at a 15 min interval
        self.logger = why.logger(mode="rolling", interval=15, when="M",
                                 base_name="message_profile_")
        # write to our local path, there are other writers though
        self.logger.append_writer("local", base_dir="example_output")
        self.dataset_logged = 0  # simple counter of logged batches
    def close(self):
        # On exit the rest of the logging will be saved
        self.logger.close()
    def consume(self, data_df):
        self.logger.log(data_df)  # log it into our dataset profile
        self.dataset_logged += 1
        # count_files and tmp_path are helpers defined in the full example
        print("Inputs Processed: " + str(self.dataset_logged) +
              "    Dataset Files Written to Local: " + str(count_files(tmp_path)))

See a full code example on GitHub
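The rotation idea itself can be sketched without whylogs: accumulate records for one interval, write out a summary, then start fresh. The toy version below rotates when a new record arrives after the interval has elapsed and uses an injected clock so it runs instantly. The real rolling logger may schedule its writes differently, and the `RollingBuffer` class is hypothetical.

```python
class RollingBuffer:
    """Accumulate records and flush them once per interval."""

    def __init__(self, interval_seconds, clock):
        self.interval = interval_seconds
        self.clock = clock          # callable returning the current time in seconds
        self.window_start = clock()
        self.pending = []           # records seen in the current window
        self.flushed = []           # one summary per written "profile"

    def log(self, record):
        self._rotate_if_due()
        self.pending.append(record)

    def _rotate_if_due(self):
        # Write out the current window once the interval has elapsed
        if self.clock() - self.window_start >= self.interval:
            self.flush()
            self.window_start = self.clock()

    def flush(self):
        if self.pending:
            self.flushed.append({"count": len(self.pending)})
            self.pending = []

    def close(self):
        # Save whatever is left on exit
        self.flush()


# Simulated clock: five records arriving 5 minutes apart, 15 minute interval
now = [0]
buf = RollingBuffer(interval_seconds=900, clock=lambda: now[0])
for t in (0, 300, 600, 900, 1200):
    now[0] = t
    buf.log({"t": t})
buf.close()
print(buf.flushed)
```

Because each window is flushed and cleared, memory use depends on the interval rather than on the total length of the stream.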

Learn more about log rotation to monitor data streams here.

Conditional count metrics

By default, whylogs tracks several metrics, such as type counts, distribution metrics, cardinality, and frequent items. While these metrics are helpful for many use cases, such as monitoring data drift, sometimes custom metrics are needed to monitor an application properly.

Condition count metrics allow users to define custom conditions and return the number of times each condition was met for a given column. This feature is useful for detecting personally identifiable information (PII) or checking whether specific numerical values appear in datasets.

Users can create condition count metrics with regex for string matching, conditionals for numerical values, or a custom function for any given condition.

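# import paths assumed from whylogs v1.1; check your installed version
from typing import Dict
import whylogs as why
from whylogs.core.datatypes import DataType
from whylogs.core.metrics import Metric
from whylogs.core.metrics.condition_count_metric import Condition, ConditionCountConfig, ConditionCountMetric
from whylogs.core.metrics.condition_count_metric import Relation as Rel, relation as rel
from whylogs.core.resolvers import Resolver
from whylogs.core.schema import ColumnSchema, DatasetSchema

class CustomResolver(Resolver):
    def resolve(self, name: str, why_type: DataType, column_schema: ColumnSchema) -> Dict[str, Metric]:
        return {"condition_count": ConditionCountMetric.zero(column_schema.cfg)}
conditions = {
    "containsEmail": Condition(rel(Rel.fullmatch, r"[\w.]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}")),
    "containsCreditCard": Condition(rel(Rel.match, r".*4[0-9]{12}(?:[0-9]{3})?")),
}
config = ConditionCountConfig(conditions=conditions)
resolver = CustomResolver()
schema = DatasetSchema(default_configs=config, resolvers=resolver)
# df is a pandas DataFrame loaded elsewhere
prof_view = why.log(df, schema=schema).profile().view()
prof_view.to_pandas()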

See a full code example on GitHub
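Conceptually, a condition count is a tally of how many values in a column satisfy each named predicate. The standard-library sketch below shows the three kinds of condition mentioned above (a regex, a numeric conditional, and a custom function). It is an illustration only, not the whylogs implementation; the `count_conditions` helper, the simplified email pattern, and the sample values are hypothetical.

```python
import re


def count_conditions(values, conditions):
    """Return, per condition name, how many values satisfy the condition."""
    counts = {name: 0 for name in conditions}
    counts["total"] = 0
    for value in values:
        counts["total"] += 1
        for name, predicate in conditions.items():
            if predicate(value):
                counts[name] += 1
    return counts


# Simplified pattern for illustration; real email validation is more involved
email = re.compile(r"[\w.]+@\w+\.\w{2,3}")

conditions = {
    # regex for string matching
    "containsEmail": lambda v: isinstance(v, str) and email.fullmatch(v) is not None,
    # conditional for numerical values
    "over100": lambda v: isinstance(v, (int, float)) and v > 100,
    # custom function for any other check
    "isEmpty": lambda v: v == "",
}

values = ["jane.doe@example.com", "hello", 250, "", 42]
print(count_conditions(values, conditions))
```

Comparing each count against the total gives a rate that can be monitored over time, for example the share of records that look like email addresses.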

Condition Validators can be used with these metrics to trigger actions.

Learn more about using condition count metrics in whylogs:

Basic string tracking

String tracking allows users to use whylogs to perform essential text monitoring functions on datasets. By default, columns of type str will have the following metrics when logged with whylogs:

Counts
Types
Frequent Items/Frequent Strings
Cardinality

Further string metrics can be tracked by counting, for each string record, the number of characters that fall in a given unicode range, and then generating distribution metrics such as mean, stddev, and quantile values from these counts. whylogs can apply the same approach to the overall string length.

Example uses include detecting a change in communication style, spotting different languages, and counting how many emojis are used.

The example below tracks two specific ranges of characters:

ASCII digits (unicode range 48-57)
Lowercase Latin alphabet (unicode range 97-122)

# import paths assumed from whylogs v1.1; check your installed version
from typing import Dict
import whylogs as why
from whylogs.core.datatypes import DataType
from whylogs.core.metrics import Metric, MetricConfig
from whylogs.core.metrics.unicode_range import UnicodeRangeMetric
from whylogs.core.resolvers import Resolver
from whylogs.core.schema import ColumnSchema, DatasetSchema

class UnicodeResolver(Resolver):
    def resolve(self, name: str, why_type: DataType, column_schema: ColumnSchema) -> Dict[str, Metric]:
        return {UnicodeRangeMetric.get_namespace(column_schema.cfg): UnicodeRangeMetric.zero(column_schema.cfg)}
config = MetricConfig(unicode_ranges={"digits": (48, 57), "alpha": (97, 122)})
schema = DatasetSchema(resolvers=UnicodeResolver(), default_configs=config)
# df is a pandas DataFrame loaded elsewhere
prof_results = why.log(df, schema=schema)
prof = prof_results.profile()
profile_view_df = prof.view().to_pandas()
profile_view_df

See a full code example on GitHub
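The counting step behind unicode range tracking is simple enough to sketch with the standard library: count the characters of each string that fall in each range, then summarize the counts as a distribution. This is an illustration of the approach, not the whylogs implementation; the `unicode_range_counts` helper and the sample strings are hypothetical.

```python
from statistics import mean, pstdev


def unicode_range_counts(strings, ranges):
    """For each named range, count the matching characters in every string."""
    per_range = {name: [] for name in ranges}
    for s in strings:
        for name, (low, high) in ranges.items():
            per_range[name].append(sum(low <= ord(ch) <= high for ch in s))
    return per_range


ranges = {"digits": (48, 57), "alpha": (97, 122)}
strings = ["abc123", "hello", "42"]  # hypothetical records

counts = unicode_range_counts(strings, ranges)
# Distribution metrics over the per-record counts
summary = {name: {"mean": mean(c), "stddev": pstdev(c)} for name, c in counts.items()}
# The same approach applied to the overall string length
lengths = [len(s) for s in strings]
print(counts, summary, mean(lengths))
```

A shift in these distributions, such as a rising share of digits in a free-text field, can point to a change in the data even when the column's type and cardinality look unchanged.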

Learn more about string tracking with whylogs here.

NOTE: More text and NLP logging features are coming to whylogs soon!

Model performance monitoring

Monitoring model performance is critical to understanding how well ML models continue to function once deployed. To track performance, log model predictions and ground truth data with whylogs; scoring metrics can then be calculated in your home-grown ML monitoring solution or in the WhyLabs Observatory.

Users can set custom monitors in WhyLabs to detect anomalies in model performance, such as if the model accuracy score drops.

WhyLabs will calculate scoring metrics for both classification and regression models.

Classification metrics: Total output and input count, accuracy, ROC, precision-recall chart, confusion matrix, recall, FPR, precision, and F1 score.

import whylogs as why
# df is a pandas DataFrame holding targets, predictions, and scores
results = why.log_classification_metrics(
    df,
    target_column="output_discount",
    prediction_column="output_prediction",
    score_column="output_score",
)

See a full code example on GitHub
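For reference, several of the classification scores listed above can be derived from the four confusion-matrix counts. The sketch below does this for a binary problem using plain Python. It is a minimal illustration under that binary assumption, not the computation WhyLabs performs; the `binary_classification_metrics` helper and the sample labels are hypothetical.

```python
def binary_classification_metrics(targets, predictions):
    """Compute common scores from binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Hypothetical ground truth and predictions
targets = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_classification_metrics(targets, predictions)
print(metrics)
```

Tracking these values per batch is what makes it possible to alert on a drop in accuracy or a rise in false positives.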

Regression metrics: Total output and input count, mean squared error, mean absolute error, root mean squared error.

import whylogs as why
# df is a pandas DataFrame holding targets and predictions
results = why.log_regression_metrics(
    df,
    target_column="temperature",
    prediction_column="prediction_temperature",
)

See a full code example on GitHub
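The three regression error metrics named above follow directly from the per-record prediction errors. Here is a minimal standard-library sketch of those formulas. It is not the WhyLabs implementation; the `regression_metrics` helper and the sample temperatures are hypothetical.

```python
from math import sqrt


def regression_metrics(targets, predictions):
    """Compute MSE, MAE, and RMSE from paired targets and predictions."""
    errors = [p - t for t, p in zip(targets, predictions)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "mae": mae, "rmse": sqrt(mse)}


# Hypothetical ground truth and predictions
temperature = [20.0, 22.0, 25.0, 19.0]
prediction_temperature = [21.0, 22.0, 23.0, 19.0]

scores = regression_metrics(temperature, prediction_temperature)
print(scores)
```

Because squared error weights large misses more heavily than absolute error, watching both MSE and MAE helps distinguish a few large errors from a general loss of accuracy.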

Get started with monitoring model performance:

Conclusion

We’re excited about the functionality whylogs v1.1 brings, allowing users to monitor model performance, subgroups, images, strings, and continuous data streams in our easy-to-use data logging API.

If you’re interested in trying whylogs or getting involved with our community of AI builders, here are some steps you can take:

Check out the whylogs GitHub repository (don’t forget to give us a ⭐)
Try out the Example Notebooks
Join the Robust & Responsible AI Community Slack workspace
