Simplifying ML Deployment: A Conversation with BentoML's Founder & CEO Chaoyu Yang
Apr 4, 2023
In this live stream, we welcomed Chaoyu Yang, Founder & CEO of BentoML, to the R2AI Podcast to discuss what it takes to put machine learning models in production and BentoML's role in simplifying deployment. Before founding BentoML, Chaoyu worked at Databricks and studied human-computer interaction at the University of Washington.
Machine Learning in Production - Chaoyu Yang, CEO of BentoML
Original Live Stream Date: Oct 20, 2022
Note: Below is a list of questions asked during the interview, along with summaries of the answers. Please listen to the recording for more in-depth answers from Chaoyu Yang, CEO of BentoML!
What does machine learning in production mean?
Putting machine learning in production means exposing a trained model as an ML service that the application layer can call; good ML services are crucial for scalability and real-world application integration. After analyzing data and training a model, data scientists must build and deploy this service so applications can consume predictions at scale. While at Databricks, Chaoyu observed that data scientists often struggled with this production step and resorted to suboptimal workarounds.
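As a concrete illustration, here is a minimal sketch of such a service using BentoML's 1.0-style Python API. The `iris_clf:latest` model tag is a placeholder, assuming a scikit-learn model was saved to the local model store beforehand with `bentoml.sklearn.save_model`:

```python
import bentoml
from bentoml.io import NumpyNdarray

# Load a previously saved model from the local model store as a runner.
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

# Define the ML service that the application layer will call.
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_array):
    # Applications call this HTTP endpoint rather than loading the model themselves.
    return iris_clf_runner.predict.run(input_array)
```

Running `bentoml serve service:svc` starts a local HTTP server for this endpoint, so the data scientist doesn't have to hand-roll web-serving code.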
What are some of the challenges with putting ML models in production?
Putting ML models into production is hard: business requirements keep changing, and services must scale reliably. Downtime in critical applications is costly, so production deployments often demand complex architecture and significant engineering resources. Well-built ML serving infrastructure can automate much of this operational work, and tools like BentoML simplify production deployment by addressing scaling and performance issues.
Once a model is already deployed in production, what are some of the challenges?
Challenges after deployment include service complexity, model monitoring, and model retraining. Maintaining and optimizing multiple models within a single service can be difficult, and monitoring performance requires joining prediction data with downstream ground-truth information. Retraining models to prevent staleness and running online experiments, such as A/B tests or shadow deployments, are crucial steps for a successful ML service.
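To make the ground-truth challenge concrete, here is a small, self-contained sketch (the request IDs and column names are hypothetical) of joining logged predictions with labels that arrive later from a downstream system:

```python
import pandas as pd

# Predictions captured at serving time, keyed by a request ID.
predictions = pd.DataFrame({
    "request_id": ["a1", "a2", "a3"],
    "prediction": [1, 0, 1],
})

# Ground-truth labels that arrive later; labels often lag predictions.
ground_truth = pd.DataFrame({
    "request_id": ["a1", "a2"],
    "label": [1, 1],
})

# A left join keeps predictions whose labels haven't arrived yet (NaN).
joined = predictions.merge(ground_truth, on="request_id", how="left")

# Score only the rows where ground truth is available.
scored = joined.dropna(subset=["label"])
accuracy = (scored["prediction"] == scored["label"]).mean()
print(f"Accuracy on labeled subset: {accuracy:.2f}")
```

In production this join typically happens in a data warehouse or a monitoring platform such as WhyLabs, but the underlying logic is the same.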
How is BentoML making ML model serving and deployments easier?
BentoML is an open-source tool that simplifies machine learning model deployment and management. Its fundamental design principles are consistency, reproducibility, and adaptability. It packages models into a standardized, versioned format (a "Bento") for reproducibility, and it supports batch and online inference across various cloud platforms. Its microservice architecture makes optimization and scaling straightforward, its user-friendly interface helps monitor and debug models in real time, and integration with Kubernetes resources further enhances scalability.
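To illustrate the standardized format, here is a minimal sketch of a bentofile.yaml, the build configuration BentoML uses to package a service; the file list and dependencies below are illustrative and depend on your own project:

```yaml
# bentofile.yaml -- a minimal, illustrative sketch
service: "service:svc"   # import path of the Service object to package
include:
  - "*.py"               # source files bundled alongside the model
python:
  packages:
    - scikit-learn       # runtime dependencies captured in the Bento
```

Running `bentoml build` packages the service, its model, and its dependencies into a versioned Bento, and `bentoml containerize` turns that Bento into a Docker image ready for Kubernetes or a cloud platform.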
Learn how to get started with BentoML and WhyLabs at the upcoming workshop!
What AI trends are currently exciting to you?
The BentoML team is excited about the trend of accessible pre-trained models and infrastructure for large-scale machine learning projects. This makes applying ML models to interesting business problems easier, shifting focus from model training to MLOps infrastructure.
Interview wrap-up
In this interview, we learned about the challenges of putting machine learning models in production and how BentoML makes it easier to deploy models at scale. BentoML is a product that helps standardize the deployment of machine learning models by offering a microservice architecture that can easily integrate into existing Kubernetes infrastructure.
BentoML also provides a user-friendly product that allows data scientists and machine learning engineers to keep track of their model deployments and model versions.
With tools like BentoML, it's becoming easier for teams to quickly apply machine learning models to solve real-world business problems and bring their ideas to life!
Learn more about BentoML and Chaoyu Yang:
- BentoML’s GitHub (give them a star!) ⭐
- Chaoyu Yang’s LinkedIn & Twitter
- See the BentoML & WhyLabs integration
- Join the workshop: Model Serving & Monitoring with BentoML + WhyLabs
Build Robust & Responsible AI:
The Robust & Responsible AI (R2AI) Community is a group of AI professionals who work to build responsible and robust artificial intelligence applications. Say hello in our Slack community!
The community is organized by WhyLabs, the market leader in ML monitoring and observability, helping teams reduce manual operations by over 80% and cut down time-to-resolution of ML incidents by 20x. Learn more about the WhyLabs AI Observatory and open source library, whylogs: https://whylabs.ai/