
Version 2.0.0 of the Label Studio SDK is coming soon, and will contain breaking changes. If you use the Label Studio SDK package in any automated pipelines, we strongly recommend pinning your SDK version to <2.0.0.

You can use the Label Studio Python SDK to make annotating data a more integrated part of your data science and machine learning pipelines. This software development kit (SDK) lets you call the Label Studio API directly from scripts using predefined classes and methods.

With the Label Studio Python SDK, you can perform the following tasks in a Python script:

Authenticate to the Label Studio API.
Create a Label Studio project, including setting up a labeling configuration.
Import tasks.
Manage pre-annotated tasks and model predictions.
Connect to a cloud storage provider, such as Amazon S3, Microsoft Azure, or Google Cloud Storage (GCS), to retrieve unlabeled tasks and store annotated tasks.
Manage labeling jobs by creating custom filters and ordering for tasks based on parameters that you specify.
Export annotations from Label Studio.

For additional guidance on using our SDK, see 5 Tips and Tricks for Label Studio’s API and SDK.

Install

Install the Label Studio SDK using pip or Poetry:

pip install label-studio-sdk

poetry add label-studio-sdk
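
Because version 2.0.0 is expected to contain breaking changes, an automated pipeline may want to fail fast when an unexpected SDK version is installed. The following is a minimal sketch that uses only the Python standard library. The helper names is_below_major and installed_sdk_version are hypothetical and not part of the SDK, and the parsing assumes version strings that start with an integer major component.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def is_below_major(version_string: str, major_limit: int = 2) -> bool:
    """Return True if the major version number is below major_limit."""
    # Assumes the string starts with an integer major component, e.g. "1.0.8".
    major = int(version_string.split(".")[0])
    return major < major_limit


def installed_sdk_version(package: str = "label-studio-sdk") -> Optional[str]:
    """Return the installed version of the package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

A pipeline could call installed_sdk_version() at startup and stop if is_below_major() returns False. To pin the version at install time, a specifier such as pip install "label-studio-sdk<2.0.0" can be used.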

Authentication

In your Python script, do the following:

Import the SDK.
Define your Label Studio URL and API key (the API key is available on the Account & Settings > Access Tokens page).
Connect to the API.

# Import the client class from the SDK
from label_studio_sdk.client import LabelStudio

# Define the URL where Label Studio is accessible and the API key for your user account
LABEL_STUDIO_URL = 'http://localhost:8080'
# API key is available at the Account & Settings > Access Tokens page in Label Studio UI
API_KEY = 'd6f8a2622d39e9d89ff0dfef1a80ad877f4ee9e3'

# Connect to the Label Studio API
ls = LabelStudio(base_url=LABEL_STUDIO_URL, api_key=API_KEY)
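
In shared scripts it is usually safer to keep the API key out of the source code. The following is a minimal sketch of reading the connection settings from environment variables. The function name load_credentials and the variable names LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY are assumptions made for this example.

```python
import os
from typing import Tuple


def load_credentials() -> Tuple[str, str]:
    """Read the Label Studio URL and API key from environment variables."""
    # The environment variable names are assumptions made for this sketch.
    url = os.environ.get("LABEL_STUDIO_URL", "http://localhost:8080")
    api_key = os.environ.get("LABEL_STUDIO_API_KEY")
    if not api_key:
        raise RuntimeError("Set LABEL_STUDIO_API_KEY before running this script")
    # Drop a trailing slash so the URL has a consistent form.
    return url.rstrip("/"), api_key
```

The returned values can then be passed to the client, for example LabelStudio(base_url=url, api_key=api_key).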

Create a Project

After you connect to the Label Studio API, you can create a project in Label Studio using the SDK. Specify the project title and the labeling configuration. Choose your labeling configuration based on the type of labeling that you wish to perform. See the available templates for Label Studio projects, or set a blank configuration with <View></View>.

For example, create a text classification project in your Python code:

from label_studio_sdk.label_interface import LabelInterface
from label_studio_sdk.label_interface.create import choices

# Define labeling interface
label_config = LabelInterface.create({
    'text': 'Text',
    'label': choices(['Positive', 'Negative'])
})

# Create a project with the specified title and labeling configuration
project = ls.projects.create(
    title='Text Classification',
    label_config=label_config
)

label_config is an XML string that represents the labeling interface with object and control tags.
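
To make the structure of such an XML string concrete, the following sketch builds a comparable configuration by hand with the standard library. It assumes an object tag named text and a control tag named label, and the helper name build_text_classification_config is hypothetical. The exact XML that LabelInterface.create generates may differ in detail.

```python
import xml.etree.ElementTree as ET


def build_text_classification_config(labels):
    """Build a labeling configuration XML string for text classification."""
    view = ET.Element("View")
    # Object tag: displays the "text" field of each task.
    ET.SubElement(view, "Text", name="text", value="$text")
    # Control tag: a choice list named "label" attached to the "text" object tag.
    choices = ET.SubElement(view, "Choices", name="label", toName="text")
    for label in labels:
        ET.SubElement(choices, "Choice", value=label)
    return ET.tostring(view, encoding="unicode")


label_config_xml = build_text_classification_config(["Positive", "Negative"])
print(label_config_xml)
```

The object tag (Text) tells Label Studio which task field to display, and the control tag (Choices) defines what annotators can select.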

For more about what you can do with the project module of the SDK, see the project module SDK reference.

Import Tasks

You can import tasks from your script using the Label Studio Python SDK client.

For a project that you created, you can import tasks in Label Studio JSON format or connect to cloud storage providers and import image, audio, or video files directly.

Create a single task

You can import a single labeling task into a Label Studio project:

ls.tasks.create(
    project=project.id,
    data={'text': 'Hello world'}
)

Create multiple tasks

To create multiple tasks at once in a project, use the import_tasks method:

ls.projects.import_tasks(
    id=project.id,
    request=[
        {"text": "Hello world"},
        {"text": "Hello Label Studio"},
        {"text": "What a beautiful day"},
    ]
)
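
For larger datasets, it can help to prepare the task list programmatically and send it in smaller groups. The following is a minimal sketch under that assumption. The helper names to_tasks and in_batches and the batch size of 2 are illustrative choices, not SDK features or recommended limits.

```python
from typing import Dict, Iterator, List


def to_tasks(texts: List[str]) -> List[Dict[str, str]]:
    """Wrap each non-empty string in a task dictionary with a "text" key."""
    return [{"text": t.strip()} for t in texts if t.strip()]


def in_batches(tasks: List[Dict[str, str]], batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most batch_size tasks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tasks), batch_size):
        yield tasks[start:start + batch_size]


tasks = to_tasks(["Hello world", "  ", "Hello Label Studio", "What a beautiful day"])
batches = list(in_batches(tasks, batch_size=2))
```

Each batch could then be passed as the request argument, for example ls.projects.import_tasks(id=project.id, request=batch).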

Create multiple tasks with pre-annotations

You can also import predictions together with tasks as pre-annotated tasks. The SDK offers several ways that you can import pre-annotations into Label Studio.

One way is to import tasks in a simple JSON format, where one key in the JSON identifies the data object being labeled, and the other is the key containing the prediction.

In this example, import predictions for a text classification task:

ls.projects.import_tasks(
    id=project.id,
    request=[
        {"text": "Hello world", "sentiment": "Positive"},
        {"text": "Goodbye Label Studio", "sentiment": "Negative"},
        {"text": "What a beautiful day", "sentiment": "Positive"},
    ],
    preannotated_from_fields=['sentiment']
)
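
For comparison, the same information can be written out in the full Label Studio JSON format, where each task carries a data object and a list of predictions. The sketch below converts one simple row into that shape. It assumes a control tag named label and an object tag named text, as in the project created earlier, and the helper name to_preannotated_task is hypothetical.

```python
from typing import Any, Dict


def to_preannotated_task(row: Dict[str, str], model_version: str = "my_model_v1") -> Dict[str, Any]:
    """Convert a simple {"text", "sentiment"} row into a task with one prediction."""
    return {
        "data": {"text": row["text"]},
        "predictions": [{
            "model_version": model_version,
            "result": [{
                "from_name": "label",  # assumed name of the control tag
                "to_name": "text",     # assumed name of the object tag
                "type": "choices",
                "value": {"choices": [row["sentiment"]]},
            }],
        }],
    }


task = to_preannotated_task({"text": "Hello world", "sentiment": "Positive"})
```

The simple format is shorter to write, while the full format makes the link between the prediction and the labeling configuration explicit.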

For a more customizable way to import pre-annotations, create a prediction for each task:

from label_studio_sdk.label_interface.objects import PredictionValue

# this returns the same `LabelInterface` object as above
li = ls.projects.get(id=project.id).get_label_interface()

# by specifying what fields to `include` we can speed up task loading
for task in ls.tasks.list(project=project.id, include=["id"]):
    task_id = task.id
    prediction = PredictionValue(
        # tag predictions with specific model version string
        # it can help managing multiple models in Label Studio UI
        model_version='my_model_v1',
        # define your labels here
        result=[
            li.get_control('label').label(['Positive']),
            # ... add more labels if needed
        ]
    )
    ls.predictions.create(task=task_id, **prediction.model_dump())

Read more about importing pre-annotations in the Label Studio SDK documentation.

Add Model Predictions

You can add predictions to existing tasks in Label Studio in your Python script.

For an existing simple text classification project, you can do the following to add predictions of “Positive” for the tasks that you retrieve:

from label_studio_sdk.label_interface import LabelInterface
from label_studio_sdk.label_interface.objects import PredictionValue

project = ls.projects.get(id=123)

# LabelInterface provides a handy way to validate Label Studio JSON format for annotations and predictions
li = project.get_label_interface()

tasks = ls.tasks.list(project=project.id, include='id')
for task in tasks:
    # create predicted label per task, using `label` control tag name
    predicted_label = li.get_control('label').label(choices=['Positive'])
    prediction = PredictionValue(
        model_version='my-super-ai',
        score=0.99,
        result=[predicted_label]
    )
    ls.predictions.create(task=task.id, **prediction.model_dump())

For another example, see the Jupyter notebook example of importing pre-annotated data.

Managing Labeling Jobs

You can also use the SDK to control how tasks appear in the data manager to annotators or reviewers. You can create custom filters and ordering for the tasks based on parameters that you specify with the SDK. This lets you have more granular control over which tasks in your dataset get labeled or reviewed, and in which order.

Create a batch of tasks to annotate

You can create a filter to prepare tasks to be annotated. For example, if you want annotators to focus on tasks in the first 1000 tasks in a dataset that contain the word “Hello” in the field “text” in the task data, do the following:

from label_studio_sdk.data_manager import Filters, Column, Type, Operator

filters = Filters.create(Filters.AND, [
    Filters.item(
        Column.id,
        Operator.GREATER_OR_EQUAL,
        Type.Number,
        Filters.value(1)
    ),
    Filters.item(
        Column.id,
        Operator.LESS_OR_EQUAL,
        Type.Number,
        Filters.value(1000)
    ),
    Filters.item(
        Column.data("text"),
        Operator.CONTAINS,
        Type.String,
        Filters.value("Hello")
    )
])
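
The three filter items are combined with AND, so a task must satisfy all of them. The plain-Python sketch below expresses the same logic locally, which can be useful for checking expectations on a small sample before creating a view. The function name matches_batch_filter and the sample tasks are hypothetical, and the case-sensitive substring match is an assumption. The Data Manager may apply different matching rules.

```python
from typing import Any, Dict, List


def matches_batch_filter(task: Dict[str, Any], low: int = 1, high: int = 1000, word: str = "Hello") -> bool:
    """Return True if the task satisfies every condition (logical AND)."""
    in_id_range = low <= task["id"] <= high
    # Case-sensitive substring match; an assumption made for this sketch.
    contains_word = word in task["data"].get("text", "")
    return in_id_range and contains_word


# Hypothetical sample tasks, for illustration only
sample_tasks: List[Dict[str, Any]] = [
    {"id": 1, "data": {"text": "Hello world"}},
    {"id": 2, "data": {"text": "What a beautiful day"}},
    {"id": 1001, "data": {"text": "Hello Label Studio"}},
]
selected_ids = [t["id"] for t in sample_tasks if matches_batch_filter(t)]
```

Only the first sample task passes: the second lacks the word, and the third falls outside the ID range.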

It is often useful to create a view that contains only annotated tasks for review:

from label_studio_sdk.data_manager import Filters, Column, Type, Operator

filters = Filters.create(Filters.AND, [
    Filters.item(
        Column.completed_at,
        Operator.EMPTY,
        Type.Boolean,
        Filters.value(False)
    )
])

To create a filtered tasks view, use the following code:

view = ls.views.create(
    project=project.id,
    data={
        'title': 'Tasks Sample',
        'filters': filters
    }
)
tab = ls.views.get(id=view.id)

The view is displayed in the data manager as a tab named Tasks Sample.

Export Annotations

Run the following code to export annotations from the project tab that you created in the previous step:

tasks = ls.tasks.list(view=tab.id, fields='all')
for task in tasks:
    # You can access annotations in Label Studio JSON format
    print(task.annotations)
    # And also annotation drafts and predictions
    print(task.predictions)
    print(task.drafts)

Export in bulk

data = ls.projects.exports.as_json(project.id)

Read more about export formats in the Label Studio SDK documentation.
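
Once annotations are exported, a common next step is to summarize them. The following sketch counts the selected choices across a list of tasks, assuming the export is a list of dictionaries in Label Studio JSON format in which each annotation has a result list of regions. The helper name count_choice_labels and the sample data are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def count_choice_labels(tasks: List[Dict[str, Any]]) -> Counter:
    """Count every selected choice across all annotations of all tasks."""
    counts: Counter = Counter()
    for task in tasks:
        for annotation in task.get("annotations", []):
            for region in annotation.get("result", []):
                # Only regions produced by a Choices control tag are counted.
                if region.get("type") == "choices":
                    counts.update(region["value"]["choices"])
    return counts


# Hypothetical exported data, for illustration only
exported = [
    {"id": 1, "annotations": [{"result": [{"type": "choices", "value": {"choices": ["Positive"]}}]}]},
    {"id": 2, "annotations": [{"result": [{"type": "choices", "value": {"choices": ["Negative"]}}]}]},
    {"id": 3, "annotations": [{"result": [{"type": "choices", "value": {"choices": ["Positive"]}}]}]},
    {"id": 4, "annotations": []},
]
label_counts = count_choice_labels(exported)
```

Tasks without annotations, such as the last sample task, contribute nothing to the counts.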

SDK versions and compatibility

In June 2024, we released SDK 1.0. The previous SDK (version < 1) is deprecated and no longer supported. We recommend upgrading to the latest version.

If you still want to use the older version, you can install it using pip install "label-studio-sdk<1".

You can also check out an older branch version in the GitHub repository:

git clone https://github.com/HumanSignal/label-studio-sdk.git
cd label-studio-sdk
git fetch origin
git checkout release/0.0.34

Or you can modify your code to change the imports as follows:

from label_studio_sdk import Client
from label_studio_sdk.data_manager import Filters, Column, Operator, Type
from label_studio_sdk._legacy import Project

If you’re looking for the documentation for the older version, you can find it here.

Advanced

Handling Errors

If you encounter an error while using the Label Studio Python SDK, you can catch the error and handle it in your script.

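from label_studio_sdk.core.api_error import ApiError

try:
    # Make the API request inside the try block so that failures are caught
    annotated_tasks = ls.tasks.list(view=tab.id, fields='all')
    for annotated_task in annotated_tasks:
        print(annotated_task.annotations)
except ApiError as e:
    print(e)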

Annotations are returned in the Label Studio JSON format.
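
Some errors, such as temporary network problems, may succeed on a second attempt. The following is a minimal sketch of a generic retry helper with exponential backoff. The function name call_with_retries and its defaults are hypothetical and not part of the SDK, and whether a retry is appropriate depends on the kind of error.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call func, retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

For example, a request could be wrapped as call_with_retries(lambda: ls.projects.get(id=123), (ApiError,)).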

Timeouts

By default, requests time out after 60 seconds. You can configure this with a timeout option at the client or request level.

from label_studio_sdk.client import LabelStudio

ls = LabelStudio(
    base_url=LABEL_STUDIO_URL,
    api_key=API_KEY,
    # All timeouts set to 20 seconds
    timeout=20.0
)

# Override the timeout for a specific method call
ls.projects.create(
    title='Text Classification',
    request_options={'timeout_in_seconds': 20}
)

Custom HTTP client

You can override the httpx client to customize it for your use-case. Some common use-cases include support for proxies and transports.

import httpx

from label_studio_sdk.client import LabelStudio

ls = LabelStudio(
    http_client=httpx.Client(
        # Recent httpx releases use `proxy`; releases before 0.26 use `proxies`
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)