
WhyLabs Private Beta: Real-time Data Monitoring on Prem

TLDR:

With whylogs' Profile Store, users can write, list, and get profiles by dataset_id or by time window. Merged time-based profiles can serve as references for running validations and constraints against incoming data. We have also built a Profile Store Service that seamlessly integrates existing whylogs profiling pipelines with the Profile Store.


Profiling data with whylogs and WhyLabs has proven to be a powerful combination for data and ML monitoring. But sometimes systems need to respond to data or concept drift quickly, even while profiling. Sending profiles to WhyLabs and waiting for alert signals might not be suitable for everyone, so we introduced the Profile Store, available in whylogs >= 1.1.7. With the Profile Store, users can get a reference profile on the fly and validate their data in real time.

Now, we’ve built an extension service for the Profile Store, enabling production use cases of whylogs on customers' premises with centralized, seamless integration. It is a Docker-based REST application that can be deployed to your cloud infrastructure to better manage Profile Stores. You can write, list, and get profiles to and from the Store and keep it in sync with S3. In the future, we plan to extend it to keep profiles in sync with WhyLabs and other popular cloud storage services.

How it works

To deploy the Profile Store Service in your environment, you only need a Docker container management system - such as ECS or Kubernetes - that can keep the service up and running. The main benefits of using it over the standalone whylogs Profile Store are:

  • Many different devices and apps can send profiled data to a central place
  • Highly concurrent request handling
  • Profiles are periodically persisted to the cloud
  • No need to learn the Profile Store's Python API

Users will have access to a REST endpoint that lets them write, list, and get profiles from the Profile Store by dataset_id or by a date range. Periodically and asynchronously, the Profile Store is persisted to S3 to keep all historical profiles safe in the cloud. Let’s see some examples of how to interact with the Store Service client.

Example: Write a profile

import requests
import whylogs as why
from whylogs.core import DatasetProfileView

# STORE_SERVICE_ENDPOINT and YOUR_DATASET_ID are placeholders for your
# deployment's base URL and your dataset's identifier


def write_profile(profile_view: DatasetProfileView) -> requests.Response:
    # Serialize the profile view and POST it to the Store Service
    resp = requests.post(
        url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/write",
        params={"dataset_id": YOUR_DATASET_ID},
        files={"profile": profile_view.serialize()},
    )
    return resp
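
As a quick usage sketch - assuming STORE_SERVICE_ENDPOINT and YOUR_DATASET_ID are set for your deployment, and with made-up sample values - you can profile any pandas DataFrame with whylogs and ship the resulting view:

import pandas as pd

# Hypothetical batch; any DataFrame you already profile works here
df = pd.DataFrame({"a": [1.0, 2.0], "b": [0.5, 0.7], "c": [10.0, 12.0]})

resp = write_profile(why.log(df).view())
resp.raise_for_status()  # surface any error the Store Service returned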

Example: Get profile by a moving date window

from datetime import datetime, timedelta


def get_ref_profile() -> DatasetProfileView:
    # Ask the Store for everything written in the last seven days
    start_date_ts = (datetime.utcnow() - timedelta(days=7)).timestamp()
    end_date_ts = datetime.utcnow().timestamp()

    resp = requests.get(
        url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/get",
        params={
            "dataset_id": "my_profile",  # the id used when writing profiles
            "start_date": int(start_date_ts),
            "end_date": int(end_date_ts),
        },
    )
    # The service merges every profile in the window into a single view
    merged_profile = DatasetProfileView.deserialize(resp.content)
    return merged_profile
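
The listing operation mentioned above works the same way. Here is a minimal sketch, assuming a /v0/profile/list route that returns the stored dataset IDs as JSON - the route name and response shape are illustrative, so check them against your deployment:

from typing import List


def list_profiles() -> List[str]:
    # NOTE: route and response shape are assumptions, not a confirmed API
    resp = requests.get(url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/list")
    resp.raise_for_status()
    return resp.json()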

Using the Profile Store Service to validate incoming data

Now that we’ve seen how to manage the Profile Store Service, let’s put it to use and see how it can assist with on-premises data validation and generate useful insights. We will build a whylogs constraints suite and run validations on some example columns.

from typing import Dict, Tuple

from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import smaller_than_number


def run_constraints(
    target_view: DatasetProfileView,
    ref_view: DatasetProfileView,
) -> Tuple[bool, Dict]:
    builder = ConstraintsBuilder(dataset_profile_view=target_view)

    # Use the reference profile's 95th percentiles as upper bounds
    q_95_a = ref_view.get_column("a").get_metric("distribution").q_95
    q_95_b = ref_view.get_column("b").get_metric("distribution").q_95
    q_95_c = ref_view.get_column("c").get_metric("distribution").q_95

    builder.add_constraint(smaller_than_number(column_name="a", number=q_95_a))
    builder.add_constraint(smaller_than_number(column_name="b", number=q_95_b))
    builder.add_constraint(smaller_than_number(column_name="c", number=q_95_c))

    constraints = builder.build()
    return (constraints.validate(), constraints.report())

With the write_profile, get_ref_profile, and run_constraints functions defined above, it’s possible to build a pipeline that will:

  1. Get the merged reference profile for the time window
  2. Persist the incoming data's profile to the Profile Store
  3. Run constraint checks using the merged distribution against incoming data
  4. Take different actions if the constraint validations pass or fail

import logging

import pandas as pd

logger = logging.getLogger(__name__)


def profile_data(df: pd.DataFrame) -> DatasetProfileView:
    # Profile the incoming batch with whylogs
    return why.log(df).view()


def run_pipeline(df: pd.DataFrame) -> None:
    profile_view = profile_data(df)
    # Fetch the reference first so it does not include the current batch
    reference_profile = get_ref_profile()
    write_profile(profile_view)
    valid, report = run_constraints(
        target_view=profile_view,
        ref_view=reference_profile,
    )
    if valid:
        logger.info(f"PASSED. REPORT: {report}")
    else:
        logger.error(f"FAILED. REPORT: {report}")

This is useful because, even if you have only just started using the Store, it will fetch all available profiles from the last seven days - or any time range you define. With the same codebase, you can always compare your profiles on the fly and take immediate action to minimize the impact of wrong predictions or bad incoming data.

Early access and feedback

Managing whylogs’ profiles on premises has become much easier, and users will be able to run validations or constrain their workflows based on data and concept drift faster than ever before. If you're interested in testing the Profile Store Service or giving us feedback, reach out on our community Slack. We’ll be happy to help you get started.
