WhyLabs Private Beta: Real-time Data Monitoring on Prem
Dec 21, 2022
TLDR:
With whylogs' Profile Store, users can write, list, and get profiles by time window or by dataset_id. Users can use merged time-based profiles as a reference to run validations and constraints against incoming data. We have built a Profile Store Service to seamlessly integrate existing whylogs profiling pipelines with the Profile Store.
Profiling data with whylogs and monitoring it with WhyLabs is a strong combination for data and ML monitoring. But some systems need to respond quickly to data or concept drift, even while profiling. Sending profiles to WhyLabs and waiting for alert signals might not suit every use case, so we introduced the Profile Store, available in whylogs >= 1.1.7. With the Profile Store, users can get a reference profile on the fly and validate their data in real time.
Now, we’ve built an extension service for the Profile Store, enabling production use cases of whylogs on customers' premises with centralized, seamless integration. It is a Docker-based REST application that you can deploy to your own cloud infrastructure to better manage Profile Stores. You can get, list, and write profiles to and from the Store and keep it in sync with S3. In the future, we plan to extend it to keep profiles in sync with WhyLabs and other popular cloud storage services.
How it works
To deploy the Profile Store Service in your environment, you only need a Docker container management system, such as ECS or Kubernetes, that can keep the service up and running. The main benefits of using it compared to the standalone whylogs Profile Store are:
- Many different devices and apps can send profiled data to a central place
- It is highly concurrent
- It periodically persists the profiles to the cloud
- There is no need to learn the Profile Store's Python API
Users will have access to a REST endpoint that lets them write, list, and get profiles by dataset_id or by date range on the Profile Store. Periodically and asynchronously, this Profile Store is persisted to S3 to keep all the historical profiles safe in the cloud. Let’s see some examples of how to interact with the Store Service as a client.
Example: Write a profile
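import requests
import whylogs as why
from whylogs.core import DatasetProfileView

# STORE_SERVICE_ENDPOINT and YOUR_DATASET_ID are placeholders: set them to
# your deployed service's base URL and your dataset's ID before running.

def write_profile(profile_view: DatasetProfileView) -> requests.Response:
    # POST the serialized profile to the Store Service as a file upload
    resp = requests.post(
        url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/write",
        params={"dataset_id": YOUR_DATASET_ID},
        files={"profile": profile_view.serialize()},
    )
    return resp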
Example: Get profile by a moving date window
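from datetime import datetime, timedelta, timezone

def get_ref_profile() -> DatasetProfileView:
    # Use a timezone-aware "now": calling .timestamp() on the naive value
    # returned by datetime.utcnow() would interpret it as local time.
    now = datetime.now(timezone.utc)
    start_date_ts = (now - timedelta(days=7)).timestamp()
    end_date_ts = now.timestamp()
    resp = requests.get(
        url=f"{STORE_SERVICE_ENDPOINT}/v0/profile/get",
        params={
            # Same dataset ID placeholder that write_profile uses
            "dataset_id": YOUR_DATASET_ID,
            "start_date": int(start_date_ts),
            "end_date": int(end_date_ts),
        },
    )
    # The response body is deserialized into the merged reference profile
    merged_profile = DatasetProfileView.deserialize(resp.content)
    return merged_profile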
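The moving window used in get_ref_profile can also be factored into a small helper, which makes the date arithmetic easy to test on its own. The following is a minimal sketch: `moving_window` and its parameters are hypothetical names, not part of whylogs or the Profile Store Service, and it assumes the service expects integer epoch seconds, as the example above sends.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def moving_window(days: int = 7, now: Optional[datetime] = None) -> Tuple[int, int]:
    """Return (start, end) as integer epoch seconds covering the last `days` days.

    Hypothetical helper for illustration only.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    elif now.tzinfo is None:
        # Treat naive datetimes as UTC rather than local time
        now = now.replace(tzinfo=timezone.utc)
    start = now - timedelta(days=days)
    return int(start.timestamp()), int(now.timestamp())


# Fixed "now" so the result is reproducible
start_ts, end_ts = moving_window(days=7, now=datetime(2022, 12, 21, tzinfo=timezone.utc))
print(start_ts, end_ts)
```

Passing the two values as start_date and end_date keeps the request code free of date logic, and a fixed `now` makes the window reproducible in tests.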
Using the Profile Store Service to validate incoming data
Now that we’ve seen how to manage the Profile Store Service, let’s put it to use and see how it can support on-premises data validation and generate useful insights. We will build a whylogs constraints suite and run validations on example columns.
from typing import List, Tuple

from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import smaller_than_number

def run_constraints(
    target_view: DatasetProfileView,
    ref_view: DatasetProfileView
) -> Tuple[bool, List]:
    builder = ConstraintsBuilder(dataset_profile_view=target_view)
    # Use the reference profile's 95th percentile as the upper bound for each column
    for column_name in ("a", "b", "c"):
        q_95 = ref_view.get_column(column_name).get_metric("distribution").q_95
        builder.add_constraint(smaller_than_number(column_name=column_name, number=q_95))
    constraints = builder.build()
    # validate() gives the overall pass/fail flag; report() gives per-constraint results
    return (constraints.validate(), constraints.report())
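To make the idea behind this check concrete without whylogs, here is a plain-Python illustration on raw values. It is only a sketch under stated assumptions: `within_reference_q95` is a hypothetical helper, whylogs estimates quantiles from profile sketches rather than raw data, and treating the check as "incoming maximum strictly below the reference 95th percentile" is an assumption about what smaller_than_number compares.

```python
import statistics
from typing import Dict, Sequence


def within_reference_q95(
    reference: Dict[str, Sequence[float]],
    incoming: Dict[str, Sequence[float]],
) -> Dict[str, bool]:
    """For each column, check that the incoming maximum is below the reference q95.

    Hypothetical helper: an illustration of the idea, not the whylogs implementation.
    """
    results = {}
    for column, ref_values in reference.items():
        # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
        q_95 = statistics.quantiles(ref_values, n=100, method="inclusive")[94]
        results[column] = max(incoming[column]) < q_95
    return results


# Hand-written sample data, for illustration only
reference = {"a": list(range(1, 101))}
print(within_reference_q95(reference, {"a": [10, 50, 90]}))
print(within_reference_q95(reference, {"a": [10, 50, 99]}))
```

The first call stays under the reference bound and the second exceeds it, which is the kind of signal the constraints suite turns into a pass or fail.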
With these three functions, it’s possible to build a pipeline that will:
- Persist incoming data to the Profile Store
- Get the merged reference profile
- Run constraint checks using the merged distribution against incoming data
- Take different actions if the constraint validations pass or fail
import logging

import pandas as pd

logger = logging.getLogger(__name__)

def profile_data(df: pd.DataFrame) -> DatasetProfileView:
    # Profile the incoming batch with whylogs
    profile_view = why.log(df).view()
    return profile_view

def run_pipeline(df: pd.DataFrame) -> None:
    profile_view = profile_data(df)
    reference_profile = get_ref_profile()
    write_profile(profile_view)
    valid, report = run_constraints(
        target_view=profile_view,
        ref_view=reference_profile,
    )
    # Log the outcome; replace with alerting or other actions as needed
    if valid:
        logger.info(f"PASSED. REPORT: {report}")
    else:
        logger.error(f"FAILED. REPORT: {report}")
This is useful because, even if you have just started using the store, it will fetch whatever profiles are available from the last seven days, or from any time range you define. With the same codebase, you can always compare your profiles on the fly and take immediate action to minimize the impact of wrong predictions or bad incoming data.
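When a validation fails, it often helps to know which constraints failed before deciding what to do. The sketch below assumes each report entry is a tuple that starts with (name, passed_count, failed_count); that shape is an assumption about the whylogs version in use, and `failed_constraints` is a hypothetical helper, so check the report format of your installed version first.

```python
from typing import List, Sequence, Tuple


def failed_constraints(report: Sequence[Tuple]) -> List[str]:
    """Return the names of constraints with at least one failure.

    Assumes each entry starts with (name, passed_count, failed_count).
    """
    return [entry[0] for entry in report if entry[2] > 0]


# Hand-written sample in the assumed shape, not real whylogs output
sample_report = [
    ("constraint_on_a", 1, 0),
    ("constraint_on_b", 0, 1),
]
print(failed_constraints(sample_report))
```

The returned names could then drive the failure branch of the pipeline, for example by routing only the affected columns to an alert.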
Early access and feedback
Managing whylogs profiles on premises has become much easier, and we hope users will be able to run validations or constrain their workflows based on data and concept drift faster than before. If you're interested in testing the Profile Store Service or giving us feedback, reach out on our community Slack. We’ll be happy to help you get started.