Opik: A Comprehensive Guide to the Open-Source LLM Evaluation Framework

Large language models (LLMs) now power a wide range of applications, from RAG chatbots and code assistants to complex agent pipelines. Evaluating, testing, and monitoring these applications, however, remains a significant challenge for developers. Opik, an open-source platform, addresses this problem. This article introduces Opik: its features, installation methods, quick start steps, and how to contribute to it.

What is Opik?

Opik is an open-source platform built by Comet for evaluating, testing, and monitoring LLM applications. Its tracking, evaluation, and dashboard features help developers build efficient, fast, and cost-effective LLM systems, both during development and in production.

Use Cases of Opik

Opik can help developers handle LLM applications in multiple ways, as detailed below:

During Development
- Tracking: Opik can track all LLM calls and traces during development and production. Developers can refer to the Quickstart and Integrations documentation for detailed instructions. This helps developers understand the operation of LLMs, identify issues promptly, and optimize accordingly.
- Annotation: Developers can annotate LLM calls by recording feedback scores through the Python SDK or the UI. This makes it convenient for developers to evaluate and analyze LLM outputs.
- Sandbox: In the Prompt Sandbox, developers can try out different prompts and models to find the most suitable combination for their applications.

For Evaluation
- Datasets and Experiments: Opik allows developers to store test cases and run experiments. They can learn how to manage datasets and run experiments through the Datasets and Evaluate Your LLM App documentation.
- LLM Evaluation Metrics: Opik provides LLM evaluation metrics for complex tasks such as hallucination detection, content moderation, and RAG evaluation. Examples include Hallucination Detection, Moderation, Answer Relevance, and Context Precision.
- CI/CD Integration: Developers can integrate the evaluation process into the CI/CD pipeline using Opik’s PyTest Integration for automated evaluation.

Installing Opik

Opik can be obtained in two ways: as a fully open-source local installation or through a hosted solution provided by Comet.com. The following are the installation methods for each.

Using the Comet.com Hosted Solution

The easiest way to get started is to create a free Comet account on comet.com. After registration, you can start using the Opik service provided by Comet.

Local Installation of Opik

If developers prefer to self-host Opik, they can clone the repository and use Docker Compose to launch the platform. The installation steps vary for different operating systems:

For Linux or Mac

# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the repository directory
cd opik

# Launch the Opik platform
./opik.sh

For Windows

# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the repository directory
cd opik

# Launch the Opik platform
powershell -ExecutionPolicy ByPass -c ".\opik.ps1"

If you encounter any issues during the installation process, you can use the --help or --info options for troubleshooting. After launching, you can access the Opik platform in your browser at localhost:5173.
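As a quick sanity check after launch, the local UI can be probed from a script. The sketch below uses only the Python standard library. The helper name is_reachable is hypothetical and not part of Opik, and the URL assumes the default localhost:5173 address mentioned above.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` (hypothetical helper, not part of Opik)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status, so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    # Assumes the default address from the installation steps above.
    url = "http://localhost:5173"
    print(f"{url} reachable: {is_reachable(url)}")
```

If the check reports that the address is not reachable, the containers may still be starting, so it can be worth retrying after a short wait before troubleshooting further.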

More Installation Options

Besides the two methods above, Opik supports the following installation options, each covered in the Opik documentation:

- Local Deployment
- Kubernetes

Developers can choose the appropriate installation method based on their requirements.

Getting Started with Opik

After installing Opik, you can start using it quickly. The following steps will guide you through installing the Python SDK, configuring Opik, and recording traces.

Installing the Python SDK

First, install the Python SDK using the following command:

pip install opik

Configuring Opik

After installing the SDK, run opik configure for configuration:

opik configure

Developers can also configure a local installation by calling opik.configure(use_local=True) in their Python code.

Recording Traces

After configuration, you can start recording traces using the Python SDK. Opik supports multiple integration methods, as shown below:

Integration Method	Description	Documentation Link	Online Trial
OpenAI	Records all OpenAI LLM call traces	Documentation	Online Trial
LiteLLM	Calls any LLM model using the OpenAI standard format	Documentation	Online Trial
LangChain	Records all LangChain LLM call traces	Documentation	Online Trial
…	…	…	…

If the framework you are using is not on the list, you can open an issue or add integration support through a pull request.

If you do not use any of the above frameworks, you can record traces with the track decorator instead. Here is an example:

import opik

opik.configure(use_local=True) # For local operation

@opik.track
def my_llm_function(user_question: str) -> str:
    # Write LLM code here
    return "Hello"

Note that the track decorator can be used with any integration and for tracking nested function calls.
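To illustrate nested function calls, the sketch below decorates three functions, with the outer one calling the other two. It assumes the bare @track form shown above. The function names and their canned return values are hypothetical, and the try/except fallback to a no-op decorator is only there so the sketch still runs where the SDK is not installed.

```python
from typing import List

try:
    from opik import track  # the decorator used in the example above
except ImportError:
    def track(func):
        """Hypothetical no-op stand-in so the sketch runs without the SDK."""
        return func


@track
def retrieve_context(question: str) -> List[str]:
    # Placeholder retrieval step; a real app would query a document store.
    return ["France is a country located in Europe."]


@track
def generate_answer(question: str, context: List[str]) -> str:
    # Placeholder generation step; a real app would call an LLM here.
    return f"Answer to {question!r} based on {len(context)} context item(s)"


@track
def answer_question(question: str) -> str:
    # Outer function: the two calls below are the nested tracked calls.
    context = retrieve_context(question)
    return generate_answer(question, context)


if __name__ == "__main__":
    print(answer_question("What is the capital of France?"))
```

With the SDK installed and configured, the inner calls are expected to appear nested under the outer call in the recorded trace, which makes it easier to see which step of a pipeline produced a given output.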

LLM Evaluation Metrics

Opik’s Python SDK includes various LLM evaluation metrics to help developers assess LLM applications. For more details, refer to the Evaluation Metrics Documentation.

For example, the following code uses the hallucination detection metric. This metric relies on an LLM judge, so a model provider must be configured before it runs:

from opik.evaluation.metrics import Hallucination

metric = Hallucination()
score = metric.score(
    input="What is the capital of France?",
    output="Paris",
    context=["France is a country located in Europe."]
)
print(score)

Opik also includes many pre-built heuristic metrics and supports the creation of custom metrics. For more information, see the Evaluation Metrics Documentation.
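To make the idea of a heuristic metric concrete: unlike an LLM-judged metric, it is a deterministic, rule-based check on the output. The sketch below is plain Python and does not use Opik's metric classes. ScoreSketch and contains_score are hypothetical names chosen for illustration, and the Evaluation Metrics Documentation remains the reference for writing a real custom metric.

```python
from dataclasses import dataclass


@dataclass
class ScoreSketch:
    """Hypothetical result container for illustration; not an Opik class."""
    name: str
    value: float
    reason: str


def contains_score(output: str, expected: str) -> ScoreSketch:
    """Score 1.0 if `expected` appears in `output` (case-insensitive), else 0.0."""
    found = expected.lower() in output.lower()
    return ScoreSketch(
        name="contains_sketch",
        value=1.0 if found else 0.0,
        reason=f"{expected!r} {'found' if found else 'not found'} in output",
    )


if __name__ == "__main__":
    print(contains_score("The capital of France is Paris.", "paris"))
```

Because the check involves no model call, it is cheap and repeatable, which makes this kind of metric a reasonable first filter before more expensive LLM-judged metrics.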

Evaluating LLM Applications

Opik allows developers to evaluate their LLM applications during development through Datasets and Experiments.

Developers can also run evaluations in the CI/CD pipeline through Opik’s PyTest Integration for an automated evaluation process.
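The idea behind datasets and experiments can be illustrated without the SDK: a dataset holds stored test cases, and an experiment runs the application over them and aggregates the scores. The following is a plain-Python sketch under that assumption. The names my_llm_app, exact_match, and run_experiment are hypothetical stand-ins, not Opik functions, and the 0.9 threshold is an arbitrary example value.

```python
from typing import Callable, Dict, List

# A tiny "dataset": stored test cases with an input and an expected output.
DATASET: List[Dict[str, str]] = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "What is the capital of Germany?", "expected": "Berlin"},
]


def my_llm_app(question: str) -> str:
    """Hypothetical stand-in for an LLM application; replace with a real model call."""
    canned = {"France": "Paris", "Germany": "Berlin"}
    for country, capital in canned.items():
        if country in question:
            return capital
    return "I don't know"


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output equals the expected answer, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_experiment(app: Callable[[str], str], dataset: List[Dict[str, str]]) -> float:
    """Run the app over every test case and return the mean score."""
    scores = [exact_match(app(case["input"]), case["expected"]) for case in dataset]
    return sum(scores) / len(scores)


def test_app_meets_threshold() -> None:
    """pytest-style check: fail the run if quality drops below the threshold."""
    assert run_experiment(my_llm_app, DATASET) >= 0.9


if __name__ == "__main__":
    print(f"mean score: {run_experiment(my_llm_app, DATASET):.2f}")
```

A test function like test_app_meets_threshold is the kind of gate an automated pipeline can enforce, so that a drop in evaluation quality fails the build instead of going unnoticed.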

Follow Opik on GitHub

If you find Opik useful, consider giving it a Star on its GitHub repository. Your support helps Opik grow its community and continuously improve the product. You can also view Opik’s GitHub Star history and other information through the relevant links on GitHub.

Contributing to Opik

Opik is an open-source project, and developers are welcome to contribute in the following common ways:

- Submit Bug Reports and Feature Requests: If you find a bug or think of a new feature while using Opik, report it to the project.
- Improve Documentation: Review the documentation and submit Pull Requests to make it more complete and user-friendly for other developers.
- Write Articles: Write and publish articles related to Opik and share them with the Opik team via Contact Us.
- Support Popular Feature Requests: Show your support for the project by backing Popular Feature Requests.

For detailed contribution methods, refer to the Contribution Guide.

Conclusion

As an open-source LLM evaluation framework, Opik gives developers a rich feature set and a convenient user experience. By tracking, evaluating, and monitoring LLM applications, it helps developers build efficient, fast, and cost-effective LLM systems, whether they are debugging and optimizing during development or monitoring and evaluating in production. Opik also supports multiple installation and integration methods, so developers can choose what fits their needs. As an open-source project, it welcomes contributions from developers to advance LLM evaluation technology together.

We hope this article helps developers better understand Opik and make the most of it in their own projects. If you run into problems, you can refer to the relevant documentation or open an issue, and the Opik team will provide support and assistance in a timely manner.
