
WhyLabs Team

Oct 3, 2022


WhyLabs Private Beta: Real-time, No-code, Cloud Storage Data Profiling

WhyLabs
Whylogs
Product Updates


Developers are busy enough as it is. For some, devoting development time to integrating DevOps tools and monitoring solutions can be enough of a deterrent that it ends up being perpetually kicked to the next sprint. We've talked to enough users at this point to understand that having no-code integration options is mandatory for certain use cases.

We've started working on a no-code integration option for WhyLabs that lets certain users skip integrating whylogs into their data pipeline. We already have a container-based solution that lets you treat the integration as a black box, but it requires hosting your own container. Some users find that more burdensome than just writing some code.

The first iteration of this solution is aimed at users who already have most of their data in cloud storage (starting with S3/AWS), don't want to invest any development time into profiling their data with whylogs, and don't mind permitting WhyLabs to ingest from their S3 bucket (without permanently storing any data). In the future, we'll support every cloud storage solution and give parts of this solution on-prem capabilities, so it can be deployed using tools like Terraform.

How it works

We're starting simple, essentially hosting our existing whylogs container ourselves and hooking it up to user S3 accounts via S3 events. The primary use case is real-time data monitoring; we won't be addressing historical data imports just yet.

The onboarding experience will consist of a few updates to your bucket:

Adding a policy to an S3 bucket that lets our AWS account download data.
Adding a special tag to your S3 bucket that we look out for so we know you own it.
Enabling S3 events directed at our SNS topic.
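
To make the three bucket updates above more concrete, here is a minimal sketch of what each one might look like as a plain Python dictionary. Everything specific in it is an assumption for illustration: the bucket name, the AWS account ID, the tag key and value, and the SNS topic ARN are hypothetical placeholders, not the values WhyLabs uses, and the real policy may need different actions.

```python
import json

# Hypothetical placeholders: none of these values come from WhyLabs.
BUCKET = "example-customer-bucket"
READER_ACCOUNT_ID = "111122223333"
OWNERSHIP_TAG = {"Key": "example-ownership-tag", "Value": "example-org-id"}
SNS_TOPIC_ARN = "arn:aws:sns:us-west-2:111122223333:example-topic"


def build_bucket_policy(bucket: str, reader_account_id: str) -> dict:
    """Bucket policy that lets another AWS account download objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowExternalRead",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{reader_account_id}:root"},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def build_tagging(tag: dict) -> dict:
    """Tag set carrying the marker tag that signals bucket ownership."""
    return {"TagSet": [tag]}


def build_notification(topic_arn: str) -> dict:
    """Notification configuration that publishes new-object events to SNS."""
    return {
        "TopicConfigurations": [
            {"TopicArn": topic_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_bucket_policy(BUCKET, READER_ACCOUNT_ID), indent=2))
    print(json.dumps(build_tagging(OWNERSHIP_TAG), indent=2))
    print(json.dumps(build_notification(SNS_TOPIC_ARN), indent=2))
```

These dictionaries follow the shapes that the S3 API (for example boto3's `put_bucket_policy`, `put_bucket_tagging`, and `put_bucket_notification_configuration`) accepts, so they could be applied with whatever tooling you already use to manage buckets.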

And then a few steps in our UI:

Specifying which column in your dataset represents time, or deciding to use the upload time instead.
Specifying the types of your columns if the inferred whylogs types aren't what you want.
Specifying the WhyLabs org and model ID that you want to import the data into.
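
The UI choices above boil down to a small piece of per-bucket configuration. The sketch below models it as a dataclass to show how the pieces relate. The class name, field names, and type labels are all hypothetical. The actual settings live in the WhyLabs UI and may be structured differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class ImportConfig:
    """Hypothetical model of the settings chosen in the UI."""

    org_id: str
    model_id: str
    # None means "use the upload time instead of a dataset column".
    timestamp_column: Optional[str] = None
    # Overrides applied only where the inferred type isn't what you want.
    column_types: Dict[str, str] = field(default_factory=dict)

    def timestamp_for(self, row: Dict[str, str], upload_time: datetime) -> datetime:
        """Pick the timestamp for a row: the chosen column, else upload time."""
        if self.timestamp_column is None:
            return upload_time
        # Assumes ISO 8601 timestamps; real data may need other parsing.
        return datetime.fromisoformat(row[self.timestamp_column])

    def type_for(self, column: str, inferred: str) -> str:
        """Return the user's override if present, otherwise the inferred type."""
        return self.column_types.get(column, inferred)


if __name__ == "__main__":
    config = ImportConfig("org-0", "model-1", timestamp_column="ts",
                          column_types={"zip": "string"})
    uploaded = datetime(2022, 10, 3, tzinfo=timezone.utc)
    print(config.timestamp_for({"ts": "2022-10-01T00:00:00+00:00"}, uploaded))
    print(config.type_for("zip", "integer"))
```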

At that point, you'll be able to upload CSV and Parquet files to your bucket, and we'll automatically download and profile the data. The rough sequence of events is:

Your application will periodically upload data to S3.
AWS will trigger an event with a reference to that file in S3 to our SNS topic.
SNS invokes our Lambda function.
We validate the bucket tags and read any associated metadata in our back end (which org/model this belongs to, etc.).
We pipe the event over to Kinesis.
From Kinesis, we pipe it into our whylogs container.
We download the data from S3, profile it with whylogs, and delete it from the container.
We upload the profile to WhyLabs for monitoring.
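
As a rough illustration of the Lambda step in the sequence above, the sketch below unpacks an SNS-delivered S3 event into bucket and key pairs. The function names are hypothetical and this is not our production code. It relies only on the documented shape of S3 event notifications wrapped in an SNS message, and it assumes (for illustration) that supported files are recognized by their extension.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(sns_event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an SNS event wrapping S3 notifications."""
    objects = []
    for sns_record in sns_event.get("Records", []):
        # The SNS message body is the S3 event, serialized as a JSON string.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        # S3 test events carry no "Records" list, so default to empty.
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def is_supported(key: str) -> bool:
    """Assumption: CSV and Parquet files are identified by file extension."""
    return key.lower().endswith((".csv", ".parquet"))


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, str]]:
    """Hypothetical Lambda entry point: list the files worth forwarding."""
    return [
        {"bucket": bucket, "key": key}
        for bucket, key in extract_s3_objects(event)
        if is_supported(key)
    ]
```

In the flow described above, the output of a handler like this is what would be validated against the bucket tags and then forwarded on for profiling.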

One of WhyLabs' biggest strengths is that we don't need (or want) the raw data. We only need the profiles we generate from it with whylogs. The only reason we're accessing data here is that we have a strong signal that some users would rather we "just do it" if it means they don't have to do any dev work. That said, we still have no interest in retaining the raw data, and we drop it as soon as we profile it. In the future, we'll have better self-hosting/on-prem options for this system for people who can't share any data and who don't want to integrate whylogs into their architecture manually.
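
The "profile it, then drop it" idea can be sketched in a few lines. The example below is a toy stand-in written with only the Python standard library. It is not whylogs and computes far simpler statistics than a whylogs profile, and the function names are hypothetical. What it shows is the pattern: the raw file exists only long enough to be summarized, and only the summary is returned.

```python
import csv
import os
import tempfile
from typing import Dict


def profile_csv(path: str) -> Dict[str, Dict[str, float]]:
    """Toy profile: per-column count, null count, and numeric min/max."""
    profile: Dict[str, Dict[str, float]] = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            for column, value in row.items():
                if column is None:
                    continue  # skip extra cells beyond the header
                stats = profile.setdefault(column, {"count": 0, "nulls": 0})
                stats["count"] += 1
                if value is None or value == "":
                    stats["nulls"] += 1
                    continue
                try:
                    number = float(value)
                except ValueError:
                    continue  # non-numeric values only contribute to counts
                stats["min"] = min(stats.get("min", number), number)
                stats["max"] = max(stats.get("max", number), number)
    return profile


def profile_and_discard(raw_bytes: bytes) -> Dict[str, Dict[str, float]]:
    """Write raw data to a temp file, profile it, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw_bytes)
        return profile_csv(path)
    finally:
        # The raw data is removed whether or not profiling succeeded.
        os.remove(path)


if __name__ == "__main__":
    print(profile_and_discard(b"age,name\n30,ann\n,bob\n41,cy\n"))
```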

Wrapping up

We also have several other integration options, depending on your needs. Integrating our open source whylogs library manually will always give you the most flexibility, and our whylogs container is a happy middle ground between manual and no-code if you're willing to host it.

Early access and feedback

If you're interested in early access or have any feedback on the feature, reach out to us on Slack or by email and mention this post, or fill out this Google Form. We'll be working on releasing this over the coming weeks, and we plan to follow up by adding support for Google Cloud and Azure.

