Evaluate LLM responses

Use Label Studio UI for LLM evaluation

Connect to Label Studio

Let’s connect to the running Label Studio instance. You need an API key, which you can find in Label Studio under Account & Settings -> API Key.

from label_studio_sdk.client import LabelStudio

ls = LabelStudio(base_url='http://localhost:8080', api_key='your-api-key')

Different LLM Evaluation Strategies

There are several strategies for evaluating LLM responses, depending on the complexity of your system and your specific evaluation goals.
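As an illustration, a simple single-response rating strategy can be expressed as a labeling config. The config below is a sketch, not an official template: the `prompt` and `response` variables and the choice values are assumptions you should adapt to your chosen strategy. The snippet validates the config as XML before it would be sent to Label Studio.

import xml.etree.ElementTree as ET

# A hypothetical labeling config for rating a single LLM response.
# $prompt and $response are placeholder task fields (assumptions, not fixed names).
label_config = """
<View>
  <Text name="prompt" value="$prompt"/>
  <Text name="response" value="$response"/>
  <Choices name="evaluation" toName="response">
    <Choice value="Good"/>
    <Choice value="Bad"/>
  </Choices>
</View>
"""

# Sanity-check that the config is well-formed XML before creating a project with it
root = ET.fromstring(label_config)
print(root.tag)  # -> View

# With a running instance connected as `ls`, the project could then be created:
# project = ls.projects.create(title='LLM Evaluation', label_config=label_config)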

Create Evaluation Task

After picking one of the evaluation strategies above, you can upload your task to the Label Studio project you created:

ls.tasks.create(
    data=task,
    project=project.id
)
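The shape of `task` depends on your labeling config: each key supplies a `$variable` referenced in the config. A minimal sketch, where the `prompt` and `response` field names are assumptions:

# A hypothetical task payload; each key must match a $variable in the labeling config
task = {
    'prompt': 'What is the capital of France?',
    'response': 'The capital of France is Paris.',
}

print(sorted(task))  # -> ['prompt', 'response']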

Now open the Label Studio UI and navigate to http://localhost:8080/projects/{project.id}/data?labeling=1 to start LLM evaluation.

Collect Annotated Data

The final step is to collect the annotated data from the Label Studio project. You can export the annotations in various formats like JSON, CSV, or directly to cloud storage providers.

You can also use the Python SDK to retrieve the annotations. For example, to collect and display all user choices from the project:

from collections import Counter

annotated_tasks = ls.tasks.list(project=project.id, fields='all')
evals = []
for annotated_task in annotated_tasks:
    evals.append(str(annotated_task.annotations[0].result[0]['value']['choices']))

# display statistics
print(Counter(evals))