
WhyLabs Private Beta: Real-time Data Monitoring on Prem

TLDR:

With whylogs' Profile Store, users can write, list, and get profiles based on a time window or their dataset_id. Merged, time-based profiles can then be used as a reference to run validations and constraints against incoming data. We have built a Profile Store Service to seamlessly integrate existing whylogs profiling pipelines with the Profile Store.


Profiling data with whylogs and monitoring it with WhyLabs has proven to be one of the best combinations on the market for data and ML monitoring. But sometimes systems need to respond quickly to data or concept drift, even while profiling. Sending profiles to WhyLabs and waiting for alert signals might not be suitable for everyone, so we introduced the Profile Store, available in whylogs >= 1.1.7. With the Profile Store, users can get a reference profile on the fly and validate their data in real time.

Now, we’ve built an extension service for the Profile Store that enables production use cases of whylogs on customers' premises with centralized, seamless integration. It is a Docker-based REST application that can be deployed to your cloud infrastructure to better manage Profile Stores. You can get, list, and write profiles to and from the Store and keep it in sync with S3. In the future, we plan to extend it to keep profiles in sync with WhyLabs and other popular cloud storage services.

How it works

To deploy the Profile Store Service in your environment, you’ll only need a Docker container management system - such as ECS or Kubernetes - that can keep the service up and running. The main benefits of using it compared to the standalone whylogs Profile Store are:

  • Many different devices and apps can send profiled data over to a central place
  • Highly concurrent
  • It will periodically persist the profiles to the cloud
  • No learning curve for the Profile Store's Python API

Users will have access to a REST endpoint that lets them write, list, and get profiles by dataset_id or by a date range on the Profile Store. Periodically and asynchronously, the Profile Store is persisted to S3 to keep all the historical profiles safe in the cloud. Let’s see some examples of how to interact with the Store Service client.

Example: Write a profile

import requests
import whylogs as why
from whylogs.core import DatasetProfileView

# Placeholders: fill in with your deployed Store Service URL and dataset ID
STORE_SERVICE_ENDPOINT = "<your-store-service-url>"
YOUR_DATASET_ID = "my_profile"


def write_profile(profile_view: DatasetProfileView) -> requests.Response:
    # Serialize the profile and POST it to the Store Service's write endpoint
    resp = requests.post(
        url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/write",
        params={"dataset_id": YOUR_DATASET_ID},
        files={"profile": profile_view.serialize()},
    )
    return resp

Example: Get profile by a moving date window

from datetime import datetime, timedelta


def get_ref_profile() -> DatasetProfileView:
    # Reference window: the last seven days, as UTC epoch timestamps
    start_date_ts = (datetime.utcnow() - timedelta(days=7)).timestamp()
    end_date_ts = datetime.utcnow().timestamp()

    # Fetch every profile in the window, merged into a single reference profile
    resp = requests.get(
        url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/get",
        params={
            "dataset_id": YOUR_DATASET_ID,
            "start_date": int(start_date_ts),
            "end_date": int(end_date_ts),
        },
    )
    merged_profile = DatasetProfileView.deserialize(resp.content)
    return merged_profile
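
The Store Service can also list the profiles it holds for a dataset. Below is a minimal sketch of what that could look like; the /v0/profile/list path and the JSON response shape are assumptions modeled on the write and get endpoints above, not a confirmed API.

from typing import List


def list_profiles() -> List[str]:
    # NOTE: "/v0/profile/list" and the JSON response body are assumed here,
    # following the pattern of the write and get endpoints above
    resp = requests.get(
        url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/list",
        params={"dataset_id": YOUR_DATASET_ID},
    )
    return resp.json()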

Using the Profile Store Service to validate incoming data

Now that we’ve seen how to manage the Profile Store Service, let’s put it to use and see how it can assist with on-premises data validation and generate useful insights. We will build a whylogs Constraints Suite and run validations on example columns.

from typing import Dict, Tuple

from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import smaller_than_number


def run_constraints(
    target_view: DatasetProfileView,
    ref_view: DatasetProfileView,
) -> Tuple[bool, Dict]:
    builder = ConstraintsBuilder(dataset_profile_view=target_view)

    # Use the reference profile's 95th percentile as the upper bound for each column
    q_95_a = ref_view.get_column("a").get_metric("distribution").q_95
    q_95_b = ref_view.get_column("b").get_metric("distribution").q_95
    q_95_c = ref_view.get_column("c").get_metric("distribution").q_95

    builder.add_constraint(smaller_than_number(column_name="a", number=q_95_a))
    builder.add_constraint(smaller_than_number(column_name="b", number=q_95_b))
    builder.add_constraint(smaller_than_number(column_name="c", number=q_95_c))

    constraints = builder.build()
    return (constraints.validate(), constraints.report())

With these three functions, it’s possible to build a pipeline that will:

  1. Persist incoming data to the Profile Store
  2. Get the merged reference profile
  3. Run constraint checks using the merged distribution against incoming data
  4. Take different actions if the constraint validations pass or fail

import logging

import pandas as pd

logger = logging.getLogger(__name__)


def profile_data(df: pd.DataFrame) -> DatasetProfileView:
    # Profile the incoming batch with whylogs
    profile_view = why.log(df).view()
    return profile_view


def run_pipeline(df: pd.DataFrame) -> None:
    # Profile the batch, fetch the merged reference, persist the new profile,
    # then validate the incoming data against the reference
    profile_view = profile_data(df)
    reference_profile = get_ref_profile()
    write_profile(profile_view)
    valid, report = run_constraints(
        target_view=profile_view,
        ref_view=reference_profile,
    )
    if valid:
        logger.info(f"PASSED. REPORT: {report}")
    else:
        logger.error(f"FAILED. REPORT: {report}")
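
As a quick illustration, the pipeline could be invoked with a toy DataFrame that has the same a, b, and c columns used in the constraints above (the values here are purely illustrative):

# Illustrative data only - any DataFrame with columns a, b, and c works
df = pd.DataFrame({
    "a": [1.2, 3.4, 2.2],
    "b": [10, 12, 9],
    "c": [0.1, 0.4, 0.3],
})

run_pipeline(df)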

This is useful because even if you have just started to use the Store, it will try to fetch all of the available profiles from the last seven days - or any time range you define. With the same codebase, you can always compare your profiles on the fly and take immediate action to minimize the impact of bad incoming data or wrong predictions.
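
For instance, a hypothetical variation of get_ref_profile that takes the look-back window as a parameter could look like this, reusing the same get endpoint shown earlier:

def get_ref_profile_for_window(days: int = 7) -> DatasetProfileView:
    # Same request as get_ref_profile, but with a configurable look-back window
    start_date_ts = (datetime.utcnow() - timedelta(days=days)).timestamp()
    end_date_ts = datetime.utcnow().timestamp()

    resp = requests.get(
        url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/get",
        params={
            "dataset_id": YOUR_DATASET_ID,
            "start_date": int(start_date_ts),
            "end_date": int(end_date_ts),
        },
    )
    return DatasetProfileView.deserialize(resp.content)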

Early access and feedback

Managing whylogs’ profiles on premises has become much easier, and hopefully users will be able to run validations or constrain their workflows based on data and concept drift faster than ever before. If you're interested in testing the Profile Store Service's capabilities or giving us feedback, reach out on our community Slack. We’ll be happy to help you get started.
