Show your love for LangKit with a star!
Open source tool for monitoring large language models (LLMs)
Out-of-the-box telemetry from LLM prompts and responses to track quality, relevance, sentiment, and security metrics.
Get started in seconds:
pip install langkit[all]
Then, run this code:
from langkit import llm_metrics, extract
results = extract({"prompt": "hello", "response": "world"}, schema=llm_metrics.init())
Available in Python
LangKit lets you:
Evaluate whether LLM behavior complies with your policies
Compare and A/B test across different LLM and prompt versions
Validate and safeguard individual prompts and responses
Monitor user interactions inside LLM-powered applications
LangKit is an open-source text metrics toolkit that helps enterprises monitor and safeguard their large language models (LLMs).
With LangKit, you can extract critical telemetry from prompts and responses to detect and prevent risks and issues in LLMs, such as toxic language, data leakage, hallucinations, and jailbreaks. You can analyze these metrics on your own, or send them to the WhyLabs platform for monitoring and observability!
“With LangKit, WhyLabs provides an extensible and scalable approach for solving challenges that many AI practitioners will face when deploying LLMs in production.”
Andrew Ng
Managing General Partner of AI Fund
Whether you're integrating with a public API or running a proprietary model, LangKit empowers you to implement guardrails, evaluations, and observability for your LLMs.
Works with any LLM:
LangKit integrates with several popular platforms and frameworks, including OpenAI GPT-4, Hugging Face Transformers, AWS Boto3, and more.
Easy to use:
LangKit's out-of-the-box telemetry from LLM prompts and responses helps users track critical metrics about quality, relevance, sentiment, and security with just a few lines of Python code. Users can also customize and extend LangKit with their own models and metrics to suit their specific use cases.
Built for enterprises:
LangKit is specifically designed for production scenarios and automated systems that require a wide range of metrics and alerts to track LLM behavior and performance. Using a metrics-based approach, LangKit is suitable for scalable and operational use cases.

LangKit allows you to extract actionable insights about prompts and responses to detect and prevent risks and issues such as toxic language, data leakage, hallucinations, and jailbreaks.
Out-of-the-box metrics include:
Text Quality
Evaluate the quality and appropriateness of generated responses with metrics like readability, complexity, and grade level to ensure LLM outputs are clear, concise, and suitable for the intended audience.
- Assess sentence structure, vocabulary choice, and domain-specific requirements to ensure the LLM produces responses that align with the intended reading level and professional context.
- Incorporate metrics such as syllable count, word count, and character count to closely monitor the length and composition of generated text, ensuring that responses remain concise, focused, and easily digestible for users.
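As a rough illustration of how such metrics can be derived from raw text, the sketch below computes word, sentence, and character counts plus a Flesch reading-ease score in plain Python. This is not LangKit's actual implementation (which relies on dedicated readability libraries); the syllable heuristic here is deliberately crude:

```python
import re

def text_quality_metrics(text: str) -> dict:
    """Rough sketch of text-quality telemetry (illustrative only)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    def syllables(word: str) -> int:
        # Crude heuristic: count groups of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    words_per_sentence = len(words) / max(1, len(sentences))
    syllables_per_word = total_syllables / max(1, len(words))
    # Flesch reading ease, using the standard published coefficients.
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return {
        "word_count": len(words),
        "char_count": len(text),
        "sentence_count": len(sentences),
        "flesch_reading_ease": round(flesch, 1),
    }

metrics = text_quality_metrics("Hello there. This is a short, simple reply.")
print(metrics["word_count"])  # 8
```

A monitoring pipeline would compute these values per response and alert when, say, the reading-ease score drifts outside the range expected for the target audience.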
Text Relevance
Evaluate the quality of generated responses and establish guardrails to minimize the risk of generating inappropriate or harmful content.
- Compute similarity scores between prompt and response embeddings to evaluate their relevance and identify issues such as irrelevant or off-topic responses.
- Calculate the similarity of prompts and responses against specific topics or known examples, such as jailbreaks or controversial subjects, to detect potentially dangerous or unwanted responses.
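The comparisons above reduce to cosine similarity between embedding vectors. The sketch below uses tiny hand-written vectors to stand in for real sentence embeddings (LangKit's actual relevance metrics run an embedding model, which this sketch omits):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy 3-d vectors standing in for real sentence-embedding outputs.
prompt_vec = [0.9, 0.1, 0.0]
on_topic_response = [0.8, 0.2, 0.1]
off_topic_response = [0.0, 0.1, 0.9]

# The on-topic response scores much closer to the prompt.
print(cosine_similarity(prompt_vec, on_topic_response) >
      cosine_similarity(prompt_vec, off_topic_response))  # True
```

In practice, a low prompt-response similarity score is a signal to flag the exchange for review or to trigger a guardrail.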
Security and Privacy
Ensure the protection of user data and prevent malicious activities by strengthening the security and privacy measures within LLM systems.
- Measure text similarity between prompts and responses against known examples of jailbreak attempts, prompt injections, and LLM refusals of service to identify potential security vulnerabilities and unauthorized access attempts.
- Check prompts and responses against regex patterns to detect and flag sensitive information like credit card numbers, telephone numbers, or other types of personally identifiable information (PII).
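The regex check can be sketched as below. These patterns and names are illustrative only, not LangKit's own rule set; production PII detection needs broader, locale-aware patterns:

```python
import re

# Illustrative patterns only -- not an exhaustive or locale-aware PII rule set.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_pii(text: str) -> list:
    """Return the name of every PII pattern that matches the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

print(detect_pii("Call me at 555-867-5309 or mail jane@example.com"))
# ['us_phone', 'email']
```

A flagged prompt or response can then be redacted, blocked, or logged for audit before it leaves the application boundary.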
Sentiment and Toxicity
Detect potentially harmful or inappropriate content within LLM outputs with sentiment and toxicity classifiers.
- Analyze sentiment scores to gauge the overall tone and emotional impact of responses and ensure that the LLM is consistently generating appropriate and contextually relevant responses.
- Monitor toxicity scores to identify offensive, disrespectful, or harmful language in LLM outputs and take necessary actions to mitigate any negative impact.
- Identify potential biases or controversial opinions to address concerns related to fairness, inclusivity, and ethical considerations.
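Once classifier scores are attached to each response, acting on them is a simple threshold check. The scores below are hard-coded stand-ins (a real pipeline would obtain them from sentiment/toxicity models), and the cut-off is illustrative:

```python
# Hard-coded stand-ins for real classifier outputs; in production these
# scores would come from sentiment and toxicity models, not literals.
scored_responses = [
    {"text": "Happy to help with that!", "sentiment": 0.9, "toxicity": 0.02},
    {"text": "That is a terrible idea.", "sentiment": -0.6, "toxicity": 0.35},
    {"text": "<offensive reply>", "sentiment": -0.9, "toxicity": 0.92},
]

TOXICITY_THRESHOLD = 0.8  # illustrative cut-off; tune per application

# Flag any response whose toxicity score exceeds the threshold.
flagged = [r["text"] for r in scored_responses if r["toxicity"] > TOXICITY_THRESHOLD]
print(flagged)  # ['<offensive reply>']
```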
How do I use LangKit?
Getting started with LangKit takes just a few lines of code, which apply a number of metrics by default:
from langkit import llm_metrics, extract
import pandas as pd
df = pd.DataFrame({"prompt":["Hello!","I'm fine! Thanks for asking"], "response":["Hi! How are you?","Glad to hear!"]})
enhanced_df = extract(df, schema=llm_metrics.init())
The extracted metrics can be analyzed on their own or sent to the WhyLabs platform for monitoring and observability. LangKit integrates seamlessly with whylogs, the open-source data profiling package: profile the enhanced DataFrame and upload it to WhyLabs with a few more lines of code:
import whylogs as why
why.init()
profile = why.log(enhanced_df)
With WhyLabs, you can establish thresholds and baselines for malicious prompts, sensitive data leakage, toxicity, problematic topics, hallucinations, and jailbreak attempts. These alerts and guardrails enable any application developer to prevent inappropriate prompts, unwanted LLM responses, and violations of LLM usage policies.
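A guardrail built on such thresholds can be sketched as below. The metric names and limits here are illustrative assumptions, not WhyLabs defaults; in practice, thresholds are tuned against baselines established from your own traffic:

```python
# Illustrative per-metric limits -- not WhyLabs defaults.
THRESHOLDS = {
    "toxicity": 0.8,              # block highly toxic responses
    "jailbreak_similarity": 0.7,  # block prompts resembling known jailbreaks
}

def violations(metrics: dict) -> list:
    """Return the name of every metric that exceeds its threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0.0) > limit]

# Example: a benign-looking response whose prompt resembles a jailbreak.
response_metrics = {"toxicity": 0.05, "jailbreak_similarity": 0.91}
print(violations(response_metrics))  # ['jailbreak_similarity']
```

An application can block the exchange, substitute a safe fallback response, or page an operator whenever this list is non-empty.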
Where do I find other LangKit users and get help?
Join the WhyLabs Community on Slack!
The WhyLabs Community is a forum for you to connect with other practitioners, share ideas, and learn about exciting new techniques.