
Production-Ready Models with Databricks and WhyLabs

Every AI practitioner turns to Databricks when they need to build massively scalable AI and data pipelines. But once in production, these pipelines are prone to silent failures caused by missing values, schema changes, and broken feature transformations. Troubleshooting these failures in terabyte-sized pipelines is like looking for a needle in a needlestack: it takes weeks and often causes significant financial losses through degraded customer experience. That's where WhyLabs comes in, enabling AI and data monitoring that works seamlessly in the Databricks environment. The WhyLabs AI Observatory is built to monitor data at any scale, in a distributed environment, without the need to sample or move data around.

Unlocking capabilities that are almost too good to be true

WhyLabs has partnered with Databricks to enable a unique integration that computes all of the key telemetry needed for AI monitoring directly in Apache Spark. The telemetry is collected in parallel with the AI pipeline using the WhyLabs-MLflow integration. With this integration, every data science and machine learning team can monitor data quality at a rate of 1M rows per second, while the computation remains fully distributed.

With WhyLabs, teams can answer questions about data of any size:

  • How do I know if the data my model depends on is always high quality? A spike in null values or a sudden change in schema can cause your model to produce unexpected results; continuously monitoring your data is essential for a production-ready model.
  • How can I identify data drift before it causes my model to degrade? Data drift can be gradual or abrupt, and either way it causes your model to produce less accurate results. Without a way to continually compare datasets against a baseline over time, identifying drift in your data is a very manual process.
  • How can I gauge the performance of my models over time and know when it degrades? Understanding whether a model is healthy requires more than what application or infrastructure tools can show. You need visibility into predictions as well as ground truth to understand how your model is performing over time.
  • How can I identify bias in my predictions and correlate it to specific segments of my dataset? Bias in your predictions can be challenging to identify unless you monitor predictions at scale and look for segments that are performing poorly.

One solution for observability across the entire Databricks ecosystem

WhyLabs AI Observatory enables out-of-the-box monitoring for your data pipelines and models running in Databricks across all data types: structured, semi-structured, unstructured, and streaming. The Lakehouse architecture is incredibly flexible, covering use cases across business intelligence, data streaming, machine learning, and generative AI, and WhyLabs provides an integration for every one of them. We briefly introduce each of these use cases below; if you are looking for advice on the best place to plug in WhyLabs for your unique setup, ping us on the community Slack channel.

Big data and streaming data pipelines with Spark and Delta Live Tables

Data is the lifeblood of any AI and BI application, so ensuring that this data is high quality and observable is critical to the health of these applications. Across our customers, the most common performance degradation in AI models is caused by data quality bugs: missing values, changes in distribution, or the introduction of new categories. WhyLabs provides an easy-to-integrate and cost-effective solution for monitoring all key data quality metrics, alerting team members about issues, and helping with root cause analysis. WhyLabs integrates with Delta Live Tables and Spark, making it easy to set up observability for any data pipeline, as shown in the examples below.

from pyspark.sql import SparkSession
from pyspark import SparkFiles
from whylogs.api.pyspark.experimental import collect_dataset_profile_view
import whylogs as why
from whylogs.api.writer.whylabs import WhyLabsWriter

spark = SparkSession.builder.appName('whylogs-testing').getOrCreate()
# Enable Apache Arrow for efficient data exchange between Spark and the
# pandas-based profiling that whylogs performs on each partition.
arrow_config_key = "spark.sql.execution.arrow.pyspark.enabled"
spark.conf.set(arrow_config_key, "true")

# Example dataset: UCI wine quality (red wine).
data_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
spark.sparkContext.addFile(data_url)

spark_dataframe = spark.read.option("delimiter", ";") \
  .option("inferSchema", "true") \
  .csv(SparkFiles.get("winequality-red.csv"), header=True)

# Profile the Spark DataFrame in a distributed fashion; only the resulting
# statistical profile is collected back to the driver.
dataset_profile_view = collect_dataset_profile_view(input_df=spark_dataframe)

# Send the profile to the WhyLabs AI Observatory (assumes WhyLabs credentials
# are configured, e.g. through environment variables).
writer = WhyLabsWriter()
writer.write(file=dataset_profile_view)
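
The same profiling call can run inside a Delta Live Tables pipeline. Below is a minimal sketch, assuming the Delta Live Tables Python API (import dlt) and a hypothetical upstream table named raw_wine_quality; only the whylogs profile leaves the pipeline.

import dlt
from whylogs.api.pyspark.experimental import collect_dataset_profile_view
from whylogs.api.writer.whylabs import WhyLabsWriter

@dlt.table(comment="Wine quality records, profiled with whylogs")
def wine_quality_profiled():
    # Hypothetical upstream table; replace with your own source.
    df = spark.read.table("raw_wine_quality")

    # Profile the batch where it lives and ship only the telemetry to WhyLabs.
    profile_view = collect_dataset_profile_view(input_df=df)
    WhyLabsWriter().write(file=profile_view)

    return df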

ML model experimentation and serving with MLflow

ML models power the most important applications across enterprises, from marketing to core product experiences. The MLflow toolchain makes it simple to track experiments during model development as well as to serve models for inference in production. Once a model is in production, monitoring its performance and the health of its inputs is crucial to ensuring ROI; without monitoring, ML models fail silently because of data drift, bias, and data quality issues. WhyLabs integrates with MLflow experiment tracking to build a training data baseline and assess training data for quality issues. Once in production, WhyLabs integrates with MLflow serving to continuously monitor model performance and ensure that model inputs are not drifting and causing training-serving skew.
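
As an illustration, here is a minimal sketch of capturing a training data baseline inside an MLflow run. It assumes the whylogs v1 API, WhyLabs credentials configured through environment variables, and a hypothetical training_features.csv file; adapt the names to your own pipeline.

import mlflow
import pandas as pd
import whylogs as why
from whylogs.api.writer.whylabs import WhyLabsWriter

# Hypothetical training data; substitute your own feature set.
training_df = pd.read_csv("training_features.csv")

with mlflow.start_run():
    # ... train and log your model with MLflow as usual ...

    # Profile the training data to establish the baseline that production
    # inputs will be compared against.
    baseline = why.log(training_df)

    # Send the baseline profile to the WhyLabs AI Observatory.
    WhyLabsWriter().write(file=baseline.view())

    # Tag the run so the baseline profile can be traced back to this model.
    mlflow.set_tag("whylogs_baseline_logged", "true")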

AI applications built on Dolly

Databricks' Dolly is an instruction-following large language model trained on the Databricks machine learning platform. Large language models (LLMs) like Dolly are transforming the landscape of AI applications, from genuinely helpful chatbots to nearly autonomous code generation tools. But this incredible technology doesn't come without deployment challenges: these models are prone to hallucinations, biases, and privacy and security loopholes. WhyLabs makes it easy to monitor and safeguard Dolly and other LLMs hosted on Databricks using LangKit, our industry standard for LLM monitoring. LangKit detects and prevents malicious prompts, toxicity, hallucinations, and jailbreak attempts. Here is an example of integrating LangKit with LangChain to build AI applications.

from langkit import llm_metrics  # LangKit's out-of-the-box text quality and safety metrics
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
from langchain.callbacks.whylabs_callback import WhyLabsCallbackHandler
# InstructionTextGenerationPipeline and load_model_tokenizer_for_generate
# come from the databricks/dolly repository (training/generate.py).
from training.generate import InstructionTextGenerationPipeline, load_model_tokenizer_for_generate

model, tokenizer = load_model_tokenizer_for_generate("databricks/dolly-v2-3b")

prompt = PromptTemplate(input_variables=["instruction"], template="{instruction}")

# The WhyLabs callback sends prompt and response telemetry to your WhyLabs project.
whylabs = WhyLabsCallbackHandler.from_params(
    org_id="<your-org>", api_key="<your-key>", dataset_id="<your-dataset>"
)

hf_pipeline = HuggingFacePipeline(
    pipeline=InstructionTextGenerationPipeline(
        model=model, tokenizer=tokenizer, return_full_text=True, task="text-generation"
    ),
    callbacks=[whylabs],
)

llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)

response = llm_chain.run(instruction="Explain to me the difference between nuclear fission and fusion.")

Privacy-preserving and cost-effective integration

WhyLabs integrates in a unique privacy-preserving way. The integration does not require users to move data outside of their existing environment: all data processing happens within the Databricks Lakehouse and involves no data duplication or sampling. This approach significantly reduces security risks and ensures that data is handled with the highest level of privacy and confidentiality. The processing is done by highly optimized telemetry agents (whylogs), which enable telemetry collection in a fully distributed manner. Telemetry agents are also very cost-effective, since computing the telemetry adds minimal overhead to existing pipelines (see benchmarks). The WhyLabs AI telemetry agents have been battle-tested at massive data scale by organizations like Lyft, StitchFix, Square, and Yahoo Japan.
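
To make this concrete, here is a minimal sketch of what a telemetry agent produces, assuming the whylogs v1 API and the same wine quality file used above: the profile contains only aggregate statistics, never raw rows.

import pandas as pd
import whylogs as why

# Load data where it already lives; nothing is copied elsewhere.
df = pd.read_csv("winequality-red.csv", sep=";")

# The telemetry agent condenses the data into a statistical profile
# (counts, types, distribution sketches) without retaining raw records.
profile_view = why.log(df).view()

# Inspect the aggregate telemetry locally before (or instead of) sending it anywhere.
print(profile_view.to_pandas())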

Take the guesswork out of AI/ML and data health today

Enabling WhyLabs on any Databricks pipeline is quick and simple. Get started with this easy example and see the power of WhyLabs observability in a matter of minutes. If you’d like to talk about your specific use case, reach out over Slack or schedule time with our solution architects - we’d be happy to help!
