Simplifying ML Deployment: A Conversation with BentoML's Founder & CEO Chaoyu Yang

In this live stream, we welcomed Chaoyu Yang, Founder & CEO at BentoML, to the R2AI Podcast to discuss what it takes to put machine learning models in production and BentoML's role in simplifying deployment. Before founding BentoML, Chaoyu worked at Databricks and studied human-computer interaction at the University of Washington.

Machine Learning in Production - Chaoyu Yang, CEO of BentoML

Original Live Stream Date: Oct 20, 2022

Note: Below is a list of questions asked during the interview, and a summary of their answers. Please listen to the recording for more in-depth answers from Chaoyu Yang, CEO of BentoML!

What does machine learning in production mean?

Good ML services are crucial for scalability and for integrating models into real-world applications. To use a machine learning model in an application, a team has to build an ML service that the application layer can call. Data scientists analyze data and train models first, then create and deploy that service so applications can use it at scale. While working at Databricks, Chaoyu saw data scientists who needed help with production deployment and resorted to suboptimal workarounds.
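To make the idea of an "ML service for application layer access" concrete, here is a minimal sketch in plain Python. It is not how BentoML or any particular framework implements serving. The `WEIGHTS`, `BIAS`, `predict`, and `handle_request` names and the `{"features": [...]}` request shape are assumptions made up for this illustration, and the "model" is a hard-coded linear scorer standing in for a trained one.

```python
import json

# Hypothetical stand-in for a trained model: a fixed linear scorer,
# hard-coded so the sketch is self-contained.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Return a score for one feature vector."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_request(body: bytes):
    """Service layer: parse JSON, validate input, call the model.

    Returns a (status_code, response_bytes) pair so any HTTP framework
    could wrap it.
    """
    try:
        payload = json.loads(body)
        features = payload["features"]
        if not isinstance(features, list) or len(features) != len(WEIGHTS):
            raise ValueError("expected %d features" % len(WEIGHTS))
        for x in features:
            if isinstance(x, bool) or not isinstance(x, (int, float)):
                raise ValueError("features must be numbers")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed input is the caller's error, not a model failure.
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps({"score": predict(features)}).encode()
```

The point of the split is that the application only ever sees the request and response contract, while the model behind `predict` can be retrained or replaced without changing callers.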

What are some of the challenges with putting ML models in production?

Putting ML models into production is hard because business needs can change, and scalability needs to be ensured. Downtime in critical applications is costly, so complex architecture and engineering resources are necessary. However, some ML services can automate and scale operations. Tools like BentoML can simplify production deployment, addressing scaling and performance issues.

Once a model is already deployed in production, what are some of the challenges?

Post-deployment ML model challenges include service complexity, ML model monitoring, and model retraining. Maintaining and optimizing multiple models within a service can be difficult, and monitoring performance requires joining prediction data with downstream ground truth information. Retraining models to prevent staleness and conducting online experiments, such as A/B tests or shadow deployments, are crucial steps for successful ML services.
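The monitoring step described above, joining prediction data with ground truth that arrives later, can be sketched as follows. This is an illustrative example rather than the method used by BentoML or WhyLabs. The `request_id`, `prediction`, and `label` field names and the sample records are assumptions invented for the sketch.

```python
def join_with_ground_truth(predictions, labels):
    """Pair each logged prediction with its later-arriving label, by request ID."""
    label_by_id = {row["request_id"]: row["label"] for row in labels}
    return [
        (p["prediction"], label_by_id[p["request_id"]])
        for p in predictions
        if p["request_id"] in label_by_id  # skip predictions with no label yet
    ]


def accuracy(pairs):
    """Fraction of joined pairs where prediction equals label; None if no pairs."""
    if not pairs:
        return None
    return sum(1 for pred, label in pairs if pred == label) / len(pairs)


# Hypothetical prediction log written by the serving layer.
predictions = [
    {"request_id": "a", "prediction": 1},
    {"request_id": "b", "prediction": 0},
    {"request_id": "c", "prediction": 1},
    {"request_id": "d", "prediction": 0},
]
# Hypothetical ground truth from a downstream system; "d" has not arrived yet.
labels = [
    {"request_id": "a", "label": 1},
    {"request_id": "b", "label": 1},
    {"request_id": "c", "label": 1},
]

pairs = join_with_ground_truth(predictions, labels)
coverage = len(pairs) / len(predictions)  # share of predictions that have labels
```

Tracking coverage alongside accuracy matters because labels often lag predictions, so an accuracy figure computed over a small labeled share can be misleading.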

How is BentoML making ML model serving and deployments easier?

BentoML is an open-source tool that simplifies machine learning model deployment and management. Fundamental design principles include consistency, reproducibility, and adaptability. It offers a standardized format for reproducibility and supports various cloud platforms for batch and online inference. Its microservice architecture enables easy optimization and scalability. BentoML's user-friendly interface helps monitor and debug models in real-time. Integration with Kubernetes resources further enhances scalability.
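One way to picture what a standardized, reproducible format buys you is a manifest that records what went into a model bundle. The sketch below is not BentoML's actual format or API. The `build_manifest` and `verify` functions, the manifest fields, and the pinned dependency strings are all assumptions chosen for illustration.

```python
import hashlib
import platform


def build_manifest(name, model_bytes, dependencies):
    """Describe a model artifact so the same bundle can be rebuilt and checked later."""
    digest = hashlib.sha256(model_bytes).hexdigest()
    return {
        "name": name,
        # Content-derived tag: identical bytes always produce the same tag.
        "version": digest[:12],
        "sha256": digest,
        "python": platform.python_version(),
        # Sorted so the manifest does not depend on input order.
        "dependencies": sorted(dependencies),
    }


def verify(manifest, model_bytes):
    """Check that the artifact being deployed is the one the manifest describes."""
    return hashlib.sha256(model_bytes).hexdigest() == manifest["sha256"]
```

For example, `build_manifest("fraud", b"weights-v1", ["scikit-learn==1.1.2", "numpy==1.23.4"])` yields the same manifest on every run in the same Python environment, and `verify` fails if the model bytes change between packaging and deployment.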

Learn how to get started with BentoML and WhyLabs at the upcoming workshop!

Model Serving & Monitoring with BentoML + WhyLabs Workshop

The BentoML team is excited about the trend of accessible pre-trained models and infrastructure for large-scale machine learning projects. This makes applying ML models to interesting business problems easier, shifting focus from model training to MLOps infrastructure.

Interview wrap-up

In this interview, we learned about the challenges of putting machine learning models in production and how BentoML makes it easier to deploy models at scale. BentoML is a product that helps standardize the deployment of machine learning models by offering a microservice architecture that can easily integrate into existing Kubernetes infrastructure.

BentoML also provides a user-friendly product that allows data scientists and machine learning engineers to keep track of their model deployments and model versions.

With tools like BentoML, it's becoming easier for teams to quickly apply machine learning models to solve real-world business problems and bring their ideas to life!

Build Robust & Responsible AI:

The Robust & Responsible AI (R2AI) Community is a group of AI professionals who work to build responsible and robust artificial intelligence applications. Say hello in our Slack community!

The community is organized by WhyLabs, the market leader in ML monitoring and observability, helping teams reduce manual operations by over 80% and cut down time-to-resolution of ML incidents by 20x. Learn more about the WhyLabs AI Observatory and the open source library, whylogs.
