
Data Labeling Meets Data Monitoring with Superb AI and WhyLabs


Data quality is the key to a performant machine learning model. Without high-quality training data, a model cannot accurately represent the real-world processes that the data captures. And without high-quality data to feed into the model once it’s trained, the model’s predictions will be unreliable. That’s why WhyLabs and Superb AI are on a mission to give data scientists and machine learning engineers tools designed specifically for their needs and workflows: tools that help them generate high-quality data and monitor its quality over time, so they can build robust and reliable ML models.

In this blog post, we explain how the complementary technologies from WhyLabs and Superb AI fit together to help AI practitioners keep their data, and therefore their models, reliable. After a brief overview of each platform, we walk through an example workflow that shows how the two tools can be used together.

Superb AI Suite Platform

Superb AI has introduced a revolutionary way for ML teams to drastically reduce the time it takes to deliver high-quality training datasets. Instead of relying on human labelers for most of the data preparation workflow, teams can now implement a much more time- and cost-efficient pipeline with the Superb AI platform.

Everything centers around Superb AI’s customizable auto-label (CAL) technology, which uses a unique mixture of transfer learning, few-shot learning, and AutoML, allowing a model to be trained quickly and efficiently on small, customer-proprietary datasets. The concept is simple: instead of creating massive ground-truth datasets by hand, teams build much smaller ground-truth, or “golden,” sets, spin up and train an auto-labeling model with just a few clicks, and start labeling large datasets in a matter of minutes. By combining this workflow with Superb AI’s proprietary Uncertainty Estimation AI and enterprise-level auditing tools, teams can label large datasets, immediately identify hard labels, build active learning workflows for auditing, and deliver datasets in a matter of days.
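Superb AI’s uncertainty estimation is proprietary, so its internals aren’t described here. To give a feel for the general idea of flagging “hard labels,” here is a minimal sketch that uses predictive entropy, one common uncertainty measure, to split auto-labeled items into those that can be accepted and those that should go to a human auditor. The function names, the sample predictions, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route_for_review(
    predictions: Dict[str, List[float]], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split auto-labeled items into (accepted, needs_review) lists.

    Items whose prediction entropy exceeds `threshold` are treated as
    "hard labels" and routed to a human auditor.
    """
    accepted: List[str] = []
    needs_review: List[str] = []
    for item_id, probs in predictions.items():
        if entropy(probs) > threshold:
            needs_review.append(item_id)
        else:
            accepted.append(item_id)
    return accepted, needs_review


if __name__ == "__main__":
    # Hypothetical class probabilities from an auto-labeling model.
    preds = {
        "img_001": [0.97, 0.02, 0.01],  # confident
        "img_002": [0.40, 0.35, 0.25],  # ambiguous
        "img_003": [0.90, 0.05, 0.05],  # fairly confident
    }
    accepted, needs_review = route_for_review(preds, threshold=0.5)
    print("auto-accepted:", accepted)
    print("send to auditor:", needs_review)
```

With these numbers, img_002 (entropy of roughly 1.08) is routed to review while the other two are accepted. In practice the threshold would be tuned against how much auditing capacity a team has.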

WhyLabs AI Observability Platform

WhyLabs provides AI observability, a critical component often missing from production ML systems, by monitoring ML deployments. With the WhyLabs AI Observability Platform, every AI practitioner can switch on monitoring for model and data health automatically. Data science teams use the platform to monitor data pipelines and AI applications, surfacing data quality issues, data bias, data drift, and concept drift. Out-of-the-box anomaly detection and purpose-built visualizations let WhyLabs users prevent costly model failures and eliminate the need for manual troubleshooting.

WhyLabs takes a unique approach to monitoring data and ML models. It relies on whylogs, the open-source data logging standard, to generate data profiles: statistical summaries of datasets. These profiles are sent to the WhyLabs platform, where they can be analyzed and used to trigger alerts. The approach works on any data, structured or unstructured, at any scale, on any platform.
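To make “data profile” concrete, here is a toy version in plain Python. It is not the whylogs API, and real whylogs profiles track far richer statistics. It summarizes one numeric column (count, missing values, min, max, mean) and illustrates a property that helps profiling scale: two profiles can be merged without going back to the raw data. The `ColumnProfile` class and the sample values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ColumnProfile:
    """Toy summary of one numeric column (hypothetical, not the whylogs API)."""

    count: int = 0          # non-missing values seen
    nulls: int = 0          # missing values seen
    total: float = 0.0      # running sum, used to derive the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, values: Iterable[Optional[float]]) -> "ColumnProfile":
        """Update the summary with a batch of values; raw values are not stored."""
        for v in values:
            if v is None:
                self.nulls += 1
                continue
            self.count += 1
            self.total += v
            self.minimum = min(self.minimum, v)
            self.maximum = max(self.maximum, v)
        return self

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan

    def merge(self, other: "ColumnProfile") -> "ColumnProfile":
        """Combine two profiles, e.g. from two batches or two workers."""
        return ColumnProfile(
            count=self.count + other.count,
            nulls=self.nulls + other.nulls,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


if __name__ == "__main__":
    # Hypothetical per-image feature values from two batches (None = unreadable image).
    monday = ColumnProfile().track([2.0, 4.0, None])
    tuesday = ColumnProfile().track([6.0, 8.0])
    week = monday.merge(tuesday)
    print(week.count, week.nulls, week.minimum, week.maximum, week.mean)
```

Because only the summary travels, each batch or worker can profile its own data locally and the results can be combined later.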

Automated Labeling + Monitoring = Reliable Data Operations

A common use case among WhyLabs customers is monitoring computer vision models. To monitor such a model, a “baseline” profile is first generated from the images on which the model was trained. Once the model is in production, additional profiles are generated from the images used for inference. These production profiles are compared against the baseline profile and against each other, allowing a data scientist to notice when the data starts to drift and performance starts to degrade.
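The baseline-versus-production comparison can be sketched as follows, assuming each image has been reduced to a single numeric feature such as mean pixel brightness. The sketch bins both samples with the same edges and computes the Hellinger distance between the two histograms, one common drift measure (0 means identical distributions, 1 means no overlap). The feature, bin edges, sample values, and 0.2 alert threshold are illustrative assumptions, not WhyLabs’ implementation.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized histogram over bins defined by sorted `edges`.

    Values below edges[0] or above edges[-1] are clipped into the end bins.
    """
    if not values:
        raise ValueError("cannot build a histogram from an empty sample")
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = 0
        while i < len(counts) - 1 and v >= edges[i + 1]:
            i += 1
        counts[i] += 1
    return [c / len(values) for c in counts]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions (0 = same, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    edges = [0.0, 0.25, 0.5, 0.75, 1.0]  # assumed bins for brightness in [0, 1]
    baseline = histogram([0.1, 0.3, 0.4, 0.6, 0.7, 0.9], edges)       # training images
    production = histogram([0.05, 0.1, 0.15, 0.2, 0.3, 0.35], edges)  # darker inference images
    drift = hellinger(baseline, production)
    print(f"drift score: {drift:.3f}")
    if drift > 0.2:  # hypothetical alert threshold
        print("possible drift: investigate before performance degrades")
```

Here the production images skew darker, so the score comes out at about 0.577 and the alert fires. Comparing production profiles against each other works the same way, with an earlier production window standing in for the baseline.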

When a user experiences training-serving skew or data drift, model performance degradation is likely to follow. And a model that is not performing well costs the business revenue it could otherwise capture. To remedy this degradation, a user can turn to Superb AI to automatically label a fresh dataset and retrain the model on this new data.
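One practical design question is when drift should actually trigger relabeling and retraining. A minimal sketch, assuming a per-window drift score is already available from profile comparisons, is to fire only when the score stays above a threshold for several consecutive windows, so a single noisy batch does not start a full retraining cycle. `RetrainTrigger`, `windows_that_fire`, the threshold, and the patience value are hypothetical; the actual labeling and retraining step is left as a comment.

```python
from collections import deque
from typing import Deque, Iterable, List


class RetrainTrigger:
    """Fire once drift exceeds `threshold` in `patience` consecutive windows (hypothetical helper)."""

    def __init__(self, threshold: float, patience: int) -> None:
        if patience < 1:
            raise ValueError("patience must be at least 1")
        self.threshold = threshold
        self.patience = patience
        self.recent: Deque[bool] = deque(maxlen=patience)

    def update(self, drift_score: float) -> bool:
        """Record one window's drift score; return True when retraining should start."""
        self.recent.append(drift_score > self.threshold)
        return len(self.recent) == self.patience and all(self.recent)

    def reset(self) -> None:
        """Clear history, e.g. after a relabel-and-retrain cycle has been scheduled."""
        self.recent.clear()


def windows_that_fire(scores: Iterable[float], threshold: float, patience: int) -> List[int]:
    """Return the indices of the windows at which the trigger fires."""
    trigger = RetrainTrigger(threshold, patience)
    fired: List[int] = []
    for i, score in enumerate(scores):
        if trigger.update(score):
            fired.append(i)
            # Here a team would export recent production images for labeling
            # (for example in Superb AI) and retrain; that step is not shown.
            trigger.reset()
    return fired


if __name__ == "__main__":
    # Hypothetical daily drift scores: one early spike, then sustained drift.
    daily_scores = [0.05, 0.31, 0.08, 0.27, 0.33, 0.41]
    print(windows_that_fire(daily_scores, threshold=0.2, patience=3))
```

With these scores the lone spike on day 2 is ignored, and the trigger fires on day 6 (index 5), the third consecutive day above the threshold. A larger patience trades slower reaction for fewer unnecessary retraining runs.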


As you can see, WhyLabs and Superb AI fit together perfectly: Superb AI helps users produce high-quality labeled data, and WhyLabs helps them confirm that their data stays high-quality in production, enabling reliable data operations.

If you’re interested in trying out WhyLabs, check out the always-free Starter edition.

If you’re interested in trying out the Superb AI platform, request a free trial here.
