MLOps, ML Monitoring and Data Science Glossary
To help you better understand MLOps, Machine Learning (ML) Monitoring and Data Science, we have created a glossary of common terms.
AI explainability
AI explainability refers to the ability of artificial intelligence (AI) models to provide understandable and interpretable explanations for their predictions or decisions. It involves techniques and methods that aim to make AI models transparent and accountable, allowing humans to understand how and why a particular prediction or decision was made. AI explainability is crucial for building trust in AI systems, ensuring fairness, accountability, and transparency, and enabling humans to understand, validate, and interpret the outcomes of AI models in a meaningful way.
Bias
Bias refers to a systematic error in a model's predictions or decisions, caused by the model's inability to capture the true underlying relationship between the input variables and the output variable. This can lead to inaccuracies or discrimination against certain groups or individuals. Bias can be caused by various factors such as a biased training dataset, inadequate feature selection, or an inappropriate choice of algorithm. To mitigate bias, it is important to carefully select and preprocess data, use appropriate evaluation metrics, and regularly monitor the model's performance on diverse data.
Concept drift
Two common challenges that can impact ML models in production are data drift and concept drift. Concept drift refers to changes in the relationships between the input features and the target variable that a model is trying to predict. This means that even if the input data remains the same, the underlying relationships between the variables may change over time, which can impact the model's accuracy.
Data drift
Data drift refers to changes in the distribution of the input data a model receives. These changes can be caused by shifts in the data sources, the data collection process, or the underlying data distribution itself over time.
Data logging
Data logging is the capture, storage, and presentation of one or more datasets for analysis. These logs are then used to identify trends and correlations and to analyze the data for future predictions.
Data quality
Data quality refers to the consistency, accuracy, and relevancy of a data set. As data pipelines handle larger volumes of data from a variety of sources and increase in complexity, data quality becomes one of the most important factors to overall model health.
Data-centric AI
Data-centric AI refers to an approach in artificial intelligence (AI) where the focus is on leveraging data as the primary driver of model development and decision-making. In data-centric AI, the quality, quantity, and diversity of data are prioritized, and performance improvements come chiefly from systematically improving the dataset, through better labeling, cleaning, and coverage, rather than from iterating on the model architecture alone.
Deep Learning
Deep Learning is a subfield of machine learning that uses artificial neural networks to train models that can learn from and make predictions or decisions based on large amounts of data. These neural networks, organized in multiple layers, can extract complex features from raw data, allowing for highly accurate and sophisticated pattern recognition.
Distribution shifts
Distribution shift refers to a change in the statistical distribution of the input data used to train a model compared to the distribution of the input data the model encounters in its real-world application. This can occur when the data used to train the model is collected from a different source or time period than the data it will be applied to. As a result, the model may fail to generalize well to new data, leading to decreased accuracy and performance.
Embeddings
Embeddings are a way to represent data in a lower-dimensional space, while preserving the relationships between the different data points. Embeddings are widely used in natural language processing (NLP), computer vision, and other areas of machine learning.
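Similarity between embeddings is commonly measured with cosine similarity. The sketch below uses hypothetical 4-dimensional vectors for illustration; real embeddings typically have hundreds of dimensions and are produced by a trained model.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors: 1.0 means the
    vectors point in the same direction, 0.0 means they are orthogonal."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings: semantically related items ("cat", "kitten") end up
# closer together than unrelated ones ("cat", "car").
cat = [0.9, 0.8, 0.1, 0.0]
kitten = [0.85, 0.75, 0.2, 0.05]
car = [0.1, 0.0, 0.9, 0.8]
print(cosine_similarity(cat, kitten))  # high (close to 1)
print(cosine_similarity(cat, car))     # low
```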
Feature Store
A feature store is a centralized repository for storing, managing, and sharing machine learning features across an organization. Features are the inputs to a machine learning model that the model uses to make predictions. A feature store makes it easy for data scientists and machine learning engineers to discover, share, and reuse features, allowing them to build models more efficiently and effectively.
Hellinger distance
Hellinger distance, also known as Hellinger divergence, is a measure of similarity or dissimilarity between probability distributions. It is bounded between 0 (identical distributions) and 1 (distributions with disjoint support) and is commonly used in machine learning and statistics to quantify the difference between two probability distributions.
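For two discrete distributions, the Hellinger distance has a simple closed form, sketched here with NumPy:

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    H(P, Q) = (1 / sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2),
    bounded between 0 (identical) and 1 (disjoint support).
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2))

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]
print(hellinger_distance(uniform, uniform))  # 0.0
print(hellinger_distance(uniform, skewed))   # strictly between 0 and 1
```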
Kullback-Leibler (KL) Divergence
Kullback-Leibler (KL) Divergence, also known as Kullback-Leibler Distance, is a measure of the difference between two probability distributions. It measures how much information is lost when approximating one distribution with another. KL Divergence is asymmetric, meaning that the distance from distribution A to distribution B is not necessarily the same as the distance from distribution B to distribution A.
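A minimal sketch of KL divergence for discrete distributions, which also demonstrates the asymmetry described above:

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(P || Q) for discrete distributions, in nats.

    Assumes q_i > 0 wherever p_i > 0; terms where p_i = 0 contribute
    nothing by convention.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
# Asymmetry: D(P||Q) differs from D(Q||P).
print(kl_divergence(p, q))
print(kl_divergence(q, p))
```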
Kolmogorov-Smirnov (KS) Tests
The Kolmogorov-Smirnov (KS) test is a statistical test used to compare the similarity or difference between two probability distributions. It measures the maximum difference between the cumulative distribution functions (CDFs) of the two distributions, quantifying their level of similarity or dissimilarity.
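The KS statistic itself is straightforward to compute by hand, as in this sketch (in practice, `scipy.stats.ks_2samp` returns both the statistic and a p-value):

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum vertical distance between
    the empirical CDFs of the two samples."""
    a, b = np.sort(sample_a), np.sort(sample_b)
    grid = np.concatenate([a, b])  # evaluate both CDFs at every observed point
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Identical samples give 0; samples with no overlap give 1.
print(ks_statistic([1, 2, 3], [1, 2, 3]))     # 0.0
print(ks_statistic([1, 2, 3], [10, 11, 12]))  # 1.0
```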
LLM Hallucinations
In the world of Large Language Models (LLMs), the term hallucination refers to a phenomenon where the model generates text that is incorrect, nonsensical, or not real. LLMs can sometimes fabricate facts, invent details that were never prompted, and deliver false statements with complete confidence.
LLMOps
Large language models operations (LLMOps) is a specific type of machine learning operations (MLOps) that delivers necessary infrastructure and tools to make it easy to build and deploy LLMs. LLMOps addresses the lifecycle management of LLMs, including training, evaluating, fine-tuning, deployment, and monitoring.
Large Language Models (LLMs)
Large Language Models (LLMs) are a type of artificial intelligence model that use deep learning techniques to analyze and understand natural language text. LLMs are trained on massive amounts of text data, allowing them to generate human-like responses to text-based prompts. These models have been used in a variety of applications, including language translation, chatbots, and text-based content generation.
ML observability
ML observability is the ability of an organization to monitor and understand a model's performance across all stages of the model development cycle.
ML observability involves collecting, analyzing, and interpreting various data points and metrics from the ML system to gain visibility and understanding of its performance, reliability, and robustness. ML observability aims to provide practitioners with actionable insights and contextual understanding of the ML model's behavior and performance, facilitating effective monitoring, debugging, and optimization of ML systems.
MLOps
MLOps (Machine Learning Operations) is a paradigm that includes best practices, sets of concepts, and a development culture to facilitate the machine learning process. MLOps aims to provide faster experimentation and development of models, faster deployment of models into production, quality assurance, and end-to-end lineage tracking.
Machine Learning (ML)
Machine Learning (ML) is a field of computer science that involves training models or algorithms to make predictions based on data inputs. These algorithms “learn” from training data instead of being explicitly programmed to complete a certain task.
Model drift
Model drift is a term used to describe how the performance of a machine learning model in production changes, or slowly gets worse, over time.
When a model is trained on a certain dataset, it learns the patterns and relationships in that data. However, over time, the underlying data distribution may change, which can result in the model becoming less accurate or less reliable in its predictions. Model drift can be caused by various factors such as changes in the data collection process, shifts in user behavior, or updates in the environment where the model is deployed.
Model performance
Model performance is an assessment of how well an ML model is doing, not only with training data but also in real-time once the model has been deployed to production. It describes the accuracy of the model's predictions, and how effectively it can perform its tasks with the data it has been trained on.
High-performing models mean accurate and trustworthy predictions for your respective use cases.
Model performance degradation
Model performance degradation refers to the decline in the performance of a machine learning model over time. It can occur due to various factors, such as changes in the data distribution, feature drift, or model aging.
Degradation can be a result of data quality issues, and real-world data differing from the baseline data the model was trained on, as well as a myriad of other factors like statistical anomalies and an accumulation of unseen errors within the system.
Natural Language Processing (NLP)
Natural Language Processing (NLP) is a field of AI that centers on enabling computers to understand, interpret, and generate human language. Some of the most popular uses of NLP include translation, sentiment analysis, text summarization, and voice recognition.
Open source software
Open Source Software (OSS) is any software whose source code is made freely available and can be inspected, modified, and redistributed by any independent party.
Population Stability Index (PSI)
Population stability index (PSI) is a metric that measures how much a variable's distribution has shifted between two samples or over time. It is commonly used for monitoring changes in the characteristics of a population and for diagnosing possible problems in model performance. Using this metric, you can compare the distribution of current production scores against the score distribution observed on the training data set.
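A minimal PSI sketch for a continuous variable, binned against the baseline sample (the floor value and the rule-of-thumb thresholds in the comments are common conventions, not universal standards):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline ('expected') sample
    and a current ('actual') sample.

    PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%).
    A common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant shift (thresholds vary by team).
    """
    # Bin edges come from the baseline sample; in this simple sketch,
    # actual values outside the baseline range fall outside all bins.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Small floor avoids division by zero / log of zero in empty bins.
    e_pct = np.maximum(e_counts / len(expected), 1e-6)
    a_pct = np.maximum(a_counts / len(actual), 1e-6)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(size=2000)
print(psi(baseline, baseline))        # 0: no shift
print(psi(baseline, baseline + 1.0))  # large: mean shifted by one std dev
```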
Profiling
Profiling collects statistical measurements of the data. In contrast to sampling, which retains only a subset of raw records, a profile summarizes all of the data. In the case of whylogs, the metrics produced come with mathematically derived uncertainty bounds. These profiles are scalable, lightweight, flexible, and configurable; rare events and outlier-dependent metrics can be accurately captured. The results are statistical summaries in a standard, portable data format that is directly interpretable.
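A minimal sketch of what a profile for one numeric column might contain. Real profiling libraries such as whylogs additionally track approximate quantiles, cardinality, and type counts using streaming sketches; the function name and fields here are illustrative.

```python
import math

def profile_column(values):
    """Summarize a numeric column as statistics instead of raw records.
    None entries are counted as nulls and excluded from the numeric stats."""
    nulls = sum(1 for v in values if v is None)
    nums = [v for v in values if v is not None]
    n = len(nums)
    mean = sum(nums) / n
    variance = sum((v - mean) ** 2 for v in nums) / n  # population variance
    return {
        "count": len(values),
        "null_count": nulls,
        "min": min(nums),
        "max": max(nums),
        "mean": mean,
        "stddev": math.sqrt(variance),
    }

print(profile_column([1.0, 2.0, None, 4.0, 3.0]))
```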
Prompt injection
A prompt injection attack is a type of cyberattack in which an attacker crafts a text prompt for a large language model (LLM) or chatbot that is designed to override the model's instructions and cause it to perform unintended or unauthorized actions.
Regression models
Regression models are algorithms that are used to predict a continuous numerical output variable based on one or more input variables. The goal of a regression model is to find the relationship between the input variables and the output variable, and use that relationship to make predictions about the output variable for new input data. The most common types are linear regression and polynomial regression; logistic regression, despite its name, outputs probabilities and is typically used for classification.
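Linear regression, the simplest case, can be fit in closed form with ordinary least squares. A sketch with NumPy:

```python
import numpy as np

def fit_linear_regression(x, y):
    """Ordinary least squares: find weights w minimizing ||Xw - y||^2,
    where X is the input with a prepended column of ones for the intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w  # w[0] is the intercept, w[1] the slope

def predict(w, x):
    X = np.column_stack([np.ones(len(x)), x])
    return X @ w

# Noiseless data generated from y = 2x + 1 is recovered exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
w = fit_linear_regression(x, y)
print(w)  # approximately [1.0, 2.0]
```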
Responsible AI
Responsible AI is the idea that AI should be developed, designed, and implemented with good intentions. Its core principles are that AI should be developed in a fair, transparent, accountable, and, most importantly, non-discriminatory fashion.
Sampling
Sampling is the practice of recording only a subset of the data flowing through a system, rather than every record, in order to reduce storage and processing costs. Because only a fraction of the data is retained, sampling can miss rare events and outliers, and statistics computed from a sample carry sampling error. It is often contrasted with profiling, which summarizes all of the data using lightweight statistical metrics.
Shapley values
SHAP values (SHapley Additive exPlanations) are based on game theory and assign an importance value to each feature in a model. They are used to increase the transparency and interpretability of machine learning models. Features with positive SHAP values push the prediction higher, and those with negative values push it lower.
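For a model with only a few features, exact Shapley values can be computed by enumerating feature coalitions. This brute-force sketch (the function names and the baseline convention are illustrative) shows the game-theoretic definition; the SHAP library uses fast approximations instead, since the exact computation is exponential in the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, baseline, instance):
    """Exact Shapley values for one prediction. Features absent from a
    coalition are set to their baseline value; each coalition is weighted
    by |S|! * (n - |S| - 1)! / n! per the Shapley formula."""
    n = len(instance)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = list(baseline)
                for j in subset:
                    with_i[j] = instance[j]
                without_i = list(with_i)
                with_i[i] = instance[i]
                phi += weight * (predict(with_i) - predict(without_i))
        values.append(phi)
    return values

# For a linear model with a zero baseline, each feature's Shapley value
# is exactly its additive contribution to the prediction.
model = lambda x: 3 * x[0] + 2 * x[1]
print(shapley_values(model, baseline=[0, 0], instance=[1, 1]))  # [3.0, 2.0]
```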
Tracing
Tracing is the process of tracking the flow of data through a machine learning system, including the input data, the models used, and the output results. Tracing can be used to identify bottlenecks or errors in the system, and to debug issues that may arise during development or deployment. It can also help with performance analysis and optimization, by identifying which parts of the system are taking the most time or resources. Tracing is typically done through the use of specialized software tools that can monitor and log the flow of data through the system in real-time.
Vector database
A vector database stores data as high-dimensional vectors and is used to conduct similarity searches through techniques such as Approximate Nearest Neighbor (ANN) algorithms. The vectors are usually generated by applying some kind of transformation or embedding function to the raw data, such as text, images, audio, or video.
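The core operation can be sketched as an exact nearest-neighbor scan over a small in-memory set of vectors. Vector databases replace this linear scan with ANN index structures (such as HNSW graphs or inverted files) to stay fast at millions of vectors; the function name and toy embeddings below are illustrative.

```python
import numpy as np

def nearest_neighbors(index, query, k=2):
    """Return the indices of the k vectors in 'index' most similar to
    'query', ranked by cosine similarity (exact brute-force search)."""
    index = np.asarray(index, dtype=float)
    q = np.asarray(query, dtype=float)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:k]  # indices sorted by descending similarity

# Hypothetical document embeddings; docs 0 and 2 point in similar directions.
docs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.9, 0.1, 0.2]]
print(nearest_neighbors(docs, query=[1.0, 0.0, 0.0], k=2))
```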