Introduction

Getting started

You can use the Label Studio Python SDK to make annotating data a more integrated part of your data science and machine learning pipelines. This software development kit (SDK) lets you call the Label Studio API directly from scripts using predefined classes and methods.

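With the Label Studio Python SDK, you can perform the following tasks in a Python script:

  • Authenticate to the Label Studio API.
  • Create a Label Studio project, including its labeling configuration.
  • Import tasks and add model predictions.
  • Manage labeling jobs by filtering and ordering tasks.
  • Export annotations.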

For additional guidance on using the SDK, see 5 Tips and Tricks for Label Studio’s API and SDK.

Install

Install the Label Studio SDK using pip:

pip install label-studio-sdk

Authentication

In your Python script, do the following:

  • Import the SDK.
  • Define your API key and Label Studio URL (the API key is available on the Account page).
  • Connect to the API.

# Define the URL where Label Studio is accessible and the API key for your user account
LABEL_STUDIO_URL = 'http://localhost:8080'
API_KEY = 'd6f8a2622d39e9d89ff0dfef1a80ad877f4ee9e3'

# Import the LabelStudio client from the SDK
from label_studio_sdk.client import LabelStudio

# Connect to the Label Studio API
ls = LabelStudio(base_url=LABEL_STUDIO_URL, api_key=API_KEY)
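
Hardcoding an API key in a script makes it easy to leak. The following is a minimal sketch of one alternative that reads both values from the environment. The helper name load_credentials and the variable names LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY are assumptions chosen for illustration, not something the steps above require.

```python
import os

def load_credentials(env=os.environ):
    """Return (url, api_key) read from an environment-style mapping."""
    # Fall back to the local default URL used in the example above
    url = env.get("LABEL_STUDIO_URL", "http://localhost:8080")
    api_key = env.get("LABEL_STUDIO_API_KEY")
    if not api_key:
        raise RuntimeError("Set LABEL_STUDIO_API_KEY before running this script")
    # Drop a trailing slash so the URL has one consistent form
    return url.rstrip("/"), api_key
```

The returned pair can then be passed as base_url and api_key when you construct the LabelStudio client.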

Create a Project

After you connect to the Label Studio API, you can create a project in Label Studio using the SDK. Specify the project title and the labeling configuration. Choose your labeling configuration based on the type of labeling that you wish to perform. See the available templates for Label Studio projects, or set a blank configuration with <View></View>.

For example, create a text classification project in your Python code:

project = ls.projects.create(
    title='Text Classification',
    label_config='''
    <View>
      <Text name="text" value="$text" />
      <Choices name="label" toName="text" choice="single">
        <Choice value="Positive" />
        <Choice value="Negative" />
      </Choices>
    </View>'''
)
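
When the set of classes changes often, it can help to generate the labeling configuration instead of editing the XML by hand. The sketch below builds the same kind of text classification configuration from a list of labels and parses it once to confirm that it is well-formed XML. The helper name build_choices_config and the extra “Neutral” label are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def build_choices_config(choices, data_key="text"):
    """Build a single-choice text classification config for the given labels."""
    lines = [
        "<View>",
        f'  <Text name="text" value="${data_key}" />',
        '  <Choices name="label" toName="text" choice="single">',
    ]
    # quoteattr escapes characters such as & and adds the surrounding quotes
    lines += [f"    <Choice value={quoteattr(choice)} />" for choice in choices]
    lines += ["  </Choices>", "</View>"]
    return "\n".join(lines)

config = build_choices_config(["Positive", "Negative", "Neutral"])
root = ET.fromstring(config)  # raises ParseError if the XML is malformed
```

The resulting string can be passed as label_config when you create the project.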

For more about what you can do with the project module of the SDK, see the project module SDK reference.

Import Tasks

You can import tasks from your script using the Label Studio Python SDK client.

For a project that you have created, you can import tasks in Label Studio JSON format, or connect to a cloud storage provider and import image, audio, or video files directly.
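
As a minimal sketch of the JSON route, the helper below wraps raw strings in task dictionaries keyed by text, which matches the $text variable in the text classification configuration. The helper name texts_to_tasks and the sample sentences are illustrative. The final call assumes the projects.import_tasks method of the SDK client together with the ls and project objects from the earlier steps, so it is left commented out.

```python
def texts_to_tasks(texts, key="text"):
    """Wrap each non-empty string in a task dictionary such as {'text': ...}."""
    return [{key: text} for text in texts if text and text.strip()]

tasks_payload = texts_to_tasks([
    "Hello world",
    "Hello Label Studio",
    "   ",  # whitespace-only entries are skipped
    "What a beautiful day",
])

# With a live connection (assumes `ls` and `project` from the earlier steps):
# ls.projects.import_tasks(id=project.id, request=tasks_payload)
```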

Add Model Predictions

You can add predictions to existing tasks in Label Studio in your Python script.

For an existing simple text classification project, such as the one created above, you can do the following to add a prediction of “Positive” to each task that you retrieve:

tasks = ls.tasks.list(project=project.id, include='id')
for task in tasks:
    ls.predictions.create(
        task=task.id,
        result=[{
            'from_name': 'label',
            'to_name': 'text',
            'type': 'choices',
            'value': {'choices': ['Positive']}
        }],
        score=0.99,
        model_version='my-super-ai'
    )
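
In practice the label and score usually come from a model rather than being fixed. The sketch below turns a dictionary of per-class scores into the result and score values used in the call above. The function names and the sample scores are hypothetical, and the from_name and to_name defaults assume the text classification configuration shown earlier.

```python
def choice_result(label, from_name="label", to_name="text"):
    """Build a one-item result list for a single-choice prediction."""
    return [{
        "from_name": from_name,
        "to_name": to_name,
        "type": "choices",
        "value": {"choices": [label]},
    }]

def top_prediction(scores):
    """Pick the highest-scoring class and return (result, score)."""
    label = max(scores, key=scores.get)
    return choice_result(label), scores[label]

# Made-up model output for one task
result, score = top_prediction({"Positive": 0.91, "Negative": 0.09})
```

The result and score values can then be passed to ls.predictions.create in place of the fixed values.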

When you import pre-annotated tasks instead, the prediction can travel with the task data. For example, for image tasks the image is specified in the image key using a public URL, and the prediction is referenced in an arbitrary pet key, which is then specified in the preannotated_from_fields option.

For more examples, see the Jupyter notebook example of importing pre-annotated data.

Managing Labeling Jobs

You can also use the SDK to control how tasks appear in the data manager to annotators or reviewers. You can create custom filters and ordering for the tasks based on parameters that you specify with the SDK. This lets you have more granular control over which tasks in your dataset get labeled or reviewed, and in which order.

Create a batch of tasks to annotate

For example, you can create a filter to prepare tasks to be annotated. If you want annotators to focus on those of the first 1000 tasks in a dataset that contain the word “possum” in the “text” field of the task data, do the following:

from label_studio_sdk.data_manager import Filters, Column, Type, Operator

filters = Filters.create(Filters.AND, [
    # Task ID between 1 and 1000, inclusive
    Filters.item(
        Column.id,
        Operator.GREATER_OR_EQUAL,
        Type.Number,
        Filters.value(1)
    ),
    Filters.item(
        Column.id,
        Operator.LESS_OR_EQUAL,
        Type.Number,
        Filters.value(1000)
    ),
    # The "text" field of the task data contains the word "possum"
    Filters.item(
        Column.data("text"),
        Operator.CONTAINS,
        Type.String,
        Filters.value("possum")
    )
])
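
To make the intent of this AND conjunction concrete, the same three conditions can be written as a plain Python predicate. This is an offline illustration with made-up task dictionaries. It does not call the SDK or reproduce how the data manager evaluates filters, and the case-sensitive substring match is an assumption.

```python
def matches(task, word="possum", first_id=1, last_id=1000):
    """Return True when the task ID is in range and the text contains the word."""
    in_range = first_id <= task["id"] <= last_id
    has_word = word in task["data"].get("text", "")
    return in_range and has_word

# Hypothetical tasks, shaped as an ID plus a data dictionary
sample_tasks = [
    {"id": 7, "data": {"text": "A possum crossed the road"}},
    {"id": 8, "data": {"text": "No marsupials here"}},
    {"id": 1500, "data": {"text": "possum, but out of range"}},
]

selected_ids = [task["id"] for task in sample_tasks if matches(task)]
```

Only the first sample task satisfies all three conditions, which is the behavior the AND conjunction is meant to express.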

As another example, to create a filter that displays only tasks with an ID greater than 42 or tasks that were annotated between November 1, 2021, and now, do the following:

from datetime import datetime

from label_studio_sdk.data_manager import Filters, Column, Type, Operator

filters = Filters.create(Filters.OR, [
    # Task ID greater than 42
    Filters.item(
        Column.id,
        Operator.GREATER,
        Type.Number,
        Filters.value(42)
    ),
    # Completed between November 1, 2021, and now
    Filters.item(
        Column.completed_at,
        Operator.IN,
        Type.Datetime,
        Filters.value(
            datetime(2021, 11, 1),
            datetime.now()
        )
    )
])

Now you can create a view with the filter you created:

view = ls.views.create(
    project=project.id,
    data={
        'title': 'New tasks to annotate',
        'filters': filters
    }
)
tab = ls.views.get(id=view.id)

The view is displayed in the data manager as a tab named “New tasks to annotate”.

You can use this example filter to prepare completed tasks for review in Label Studio Enterprise.

Export Annotations

Run the following code to export annotations from the project tab that you created in the previous step:

tasks = ls.tasks.list(view=tab.id, fields='all')
for task in tasks:
    # You can access annotations in Label Studio JSON format
    print(task.annotations)
    # And also annotation drafts and predictions
    print(task.predictions)
    print(task.drafts)
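
Once annotations are available, a common next step is to summarize them. The sketch below counts the chosen labels across a list of annotations. It assumes that each annotation is a dictionary with a result list whose items carry the selected labels under value.choices, as in the prediction example earlier. The helper name count_choices and the sample data are made up for illustration.

```python
from collections import Counter

def count_choices(annotations):
    """Count selected choice labels across a list of annotation dictionaries."""
    counts = Counter()
    for annotation in annotations:
        for item in annotation.get("result", []):
            # Items without a "choices" value (other control types) add nothing
            counts.update(item.get("value", {}).get("choices", []))
    return counts

# Hand-written annotations standing in for exported data
sample_annotations = [
    {"result": [{"type": "choices", "value": {"choices": ["Positive"]}}]},
    {"result": [{"type": "choices", "value": {"choices": ["Negative"]}}]},
    {"result": [{"type": "choices", "value": {"choices": ["Positive"]}}]},
    {"result": []},  # an annotation submitted without a selection
]

label_counts = count_choices(sample_annotations)
```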

Read more about export formats in the Label Studio SDK documentation.

Handling Errors

If you encounter an error while using the Label Studio Python SDK, you can catch the error and handle it in your script.

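from label_studio_sdk.core.api_error import ApiError

try:
    # Fetch tasks from the tab created earlier; a failed API call raises ApiError
    annotated_tasks = ls.tasks.list(view=tab.id, fields='all')
    for annotated_task in annotated_tasks:
        print(annotated_task.annotations)
except ApiError as e:
    print(e)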

Annotations are exported in the Label Studio JSON format.
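
Some failures are transient, so a script may want to retry a call before giving up. The following is a generic sketch of retrying with exponential backoff. It is not part of the SDK: call_with_retry and its parameters are hypothetical, and the demonstration uses a stand-in function instead of a real API call so that it runs offline.

```python
import time

def call_with_retry(fn, retry_on=(Exception,), attempts=3, delay=0.5, sleep=time.sleep):
    """Call fn(); on a retry_on exception, wait and try again with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...

# Offline demonstration with a stand-in for a flaky call
calls = {"count": 0}
waits = []

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

# Record the waits in a list instead of actually sleeping
outcome = call_with_retry(flaky, retry_on=(ConnectionError,), sleep=waits.append)
```

In a real script, retry_on could be set to (ApiError,) using the import shown above. Whether a particular failure is worth retrying depends on its cause.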