Data Slices

Data Slices are subsets of your data defined by metadata filters. They let you focus on specific segments of your document repository without affecting the rest of the dataset.

Benefits

Benefit	Description
Targeted Processing	Run extraction on only relevant documents (e.g., just 2024 contracts)
Focused Analysis	View and analyze specific document categories
Efficient Workflows	Avoid reprocessing already-enriched documents
Team Collaboration	Share slices scoped to a team's documents with that team's members
Controllability	Run downstream applications only on an explicitly defined subset of data

Data Slice Properties

Property	Description
Name	User-defined identifier
Description	Optional notes about what’s included
Data Connector	Which Data Connector it’s derived from
Document Count	Number of files matching the conditions
Conditions	Filter rules that define the slice based on metadata

How Data Slices Work

Example Data Slices

Slice Name	Filter Conditions	Document Count
"2024 Contracts"	Document Type = Contract AND Year = 2024	1,247
"High-Value Invoices"	Invoice Amount > $10,000	89
"Documents with PII"	SSN_Detected = Yes OR Email_Detected = Yes	3,521
"Unprocessed Files"	Document Type = null	5,892

Use Data Slices to create focused workflows. For example, process only untagged documents to avoid reprocessing, or create a slice of high-priority documents for immediate attention.
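
To make the AND/OR conditions in the table above concrete, here is a minimal sketch that applies three of them to a handful of in-memory metadata records. It does not use the SDK. The sample records, the lowercase field names, and the helper functions are assumptions made up for illustration, so the counts do not correspond to the table.

```python
# Illustrative metadata records; names and values are invented for this sketch.
documents = [
    {"name": "a.pdf", "document_type": "Contract", "year": 2024,
     "ssn_detected": False, "email_detected": True},
    {"name": "b.pdf", "document_type": "Contract", "year": 2023,
     "ssn_detected": False, "email_detected": False},
    {"name": "c.pdf", "document_type": "Invoice", "year": 2024,
     "ssn_detected": True, "email_detected": False},
    {"name": "d.pdf", "document_type": None, "year": None,
     "ssn_detected": False, "email_detected": False},
]


def is_2024_contract(doc):
    # AND: both conditions must hold.
    return doc["document_type"] == "Contract" and doc["year"] == 2024


def has_pii(doc):
    # OR: either condition is enough.
    return doc["ssn_detected"] or doc["email_detected"]


def is_unprocessed(doc):
    # A null document type marks a file that has not been tagged yet.
    return doc["document_type"] is None


predicates = {
    "2024 Contracts": is_2024_contract,
    "Documents with PII": has_pii,
    "Unprocessed Files": is_unprocessed,
}

# Each slice is the list of documents that satisfy its predicate.
slices = {
    slice_name: [doc["name"] for doc in documents if predicate(doc)]
    for slice_name, predicate in predicates.items()
}

for slice_name, members in slices.items():
    print(f"{slice_name}: {len(members)} document(s)")
```

Note that a document can belong to several slices at once: `a.pdf` is both a 2024 contract and a document with PII.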

Creating Effective Data Slices

Define Your Goal

Identify what subset of documents you need to work with.

Choose Filter Conditions

Select metadata fields and values that define your target documents.

Combine Conditions

Use AND/OR logic to create precise filters.

Verify Document Count

Check that the slice captures the expected number of documents.

Apply to Workflows

Use the slice in Projects or for targeted exports.
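
The steps above can be rehearsed locally before creating a slice. The sketch below evaluates condition dictionaries in the same `field` / `operator` / `value` shape used in the Python SDK examples on this page against sample metadata, then compares the resulting count with an expected value. The evaluator is a local stand-in, not part of the SDK. It assumes that a list of conditions is combined with AND, and it covers only the three operators that appear on this page (`eq`, `gte`, `is_null`).

```python
import operator

# Local stand-ins for the operators seen in this page's examples (assumption:
# the platform may define more operators and may treat nulls differently).
OPERATORS = {
    "eq": operator.eq,
    "gte": lambda actual, wanted: actual is not None and actual >= wanted,
    "is_null": lambda actual, wanted: (actual is None) == wanted,
}


def matches(doc, conditions):
    """Return True when the document satisfies every condition (assumed AND)."""
    return all(
        OPERATORS[cond["operator"]](doc.get(cond["field"]), cond["value"])
        for cond in conditions
    )


def preview_count(docs, conditions):
    """Count the documents a set of conditions would select."""
    return sum(1 for doc in docs if matches(doc, conditions))


# Sample metadata invented for this sketch.
docs = [
    {"document_type": "Contract", "year": 2024, "contract_value": 250000},
    {"document_type": "Contract", "year": 2024, "contract_value": 40000},
    {"document_type": "Contract", "year": 2023, "contract_value": 120000},
    {"document_type": None},
]

# Steps 2 and 3: choose and combine conditions.
contract_conditions = [
    {"field": "document_type", "operator": "eq", "value": "Contract"},
    {"field": "year", "operator": "eq", "value": 2024},
]

# Step 4: verify the count before using the slice downstream.
expected = 2
actual = preview_count(docs, contract_conditions)
if actual != expected:
    print(f"Unexpected count: got {actual}, expected {expected}")
else:
    print(f"Slice would contain {actual} documents, as expected")
```

A count that is far higher or lower than expected usually points to a condition that is too loose, too strict, or filtering on a field that is not populated.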

Common Use Cases

Incremental Processing

Filter to documents that haven’t been processed yet.

Compliance Review

Focus on documents containing sensitive information.

Time-Based Analysis

Analyze documents from specific time periods.

Category Deep-Dive

Examine all documents of a particular type.
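
As an illustration of the Time-Based Analysis use case, the following sketch selects documents whose date falls in a given period. It runs on in-memory records rather than the SDK. The `document_date` field, the sample records, and the `in_period` helper are hypothetical, and the half-open range is a design choice of this sketch, not documented platform behavior.

```python
from datetime import date


def in_period(doc, start, end):
    """True when the document's date falls in the half-open range [start, end)."""
    doc_date = doc.get("document_date")
    return doc_date is not None and start <= doc_date < end


# Sample metadata invented for this sketch.
docs = [
    {"name": "q1-report.pdf", "document_date": date(2024, 2, 14)},
    {"name": "q2-report.pdf", "document_date": date(2024, 5, 3)},
    {"name": "undated.pdf", "document_date": None},
    {"name": "old.pdf", "document_date": date(2023, 12, 31)},
]

# First quarter of 2024: 1 January inclusive to 1 April exclusive.
q1_2024 = [
    doc["name"]
    for doc in docs
    if in_period(doc, date(2024, 1, 1), date(2024, 4, 1))
]
print(q1_2024)
```

Using a half-open range means adjacent periods (Q1, Q2, and so on) never overlap, so each dated document lands in exactly one period. Undated documents are excluded from every period and may deserve a slice of their own.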

Python SDK

from unstructured import UnstructuredClient

client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Create a data slice with filter conditions
# (the result is not named `slice`, to avoid shadowing the Python built-in)
contracts_slice = client.dataslice.create(
    dataslice_name="2024-contracts",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "document_type", "operator": "eq", "value": "Contract"},
        {"field": "year", "operator": "eq", "value": 2024},
    ],
)
print(f"Created slice with {contracts_slice.document_count} documents")

# Filter by metadata values
high_value = client.dataslice.create(
    dataslice_name="high-value-contracts",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "contract_value", "operator": "gte", "value": 100000},
    ],
)

# Filter unprocessed documents
unprocessed = client.dataslice.create(
    dataslice_name="needs-processing",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "document_type", "operator": "is_null", "value": True},
    ],
)

# Filter documents with PII
sensitive = client.dataslice.create(
    dataslice_name="contains-pii",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "has_ssn", "operator": "eq", "value": True},
    ],
)

# Export a data slice's metadata as a CSV file
result = client.dataslice.export_metadata(
    dataslice_name="2024-contracts",
    export_format="csv",
)
print(f"Exported to: {result.file_path}")

# Export to a vector database
client.destination.export(
    destination_name="my-qdrant",
    dataslice_name="2024-contracts",
    export_level="chunk",
    export_nodes=True,
)

# List all data slices
slices = client.dataslice.list()
for s in slices.dataslices:
    print(f"{s.dataslice_name}: {s.document_count} docs")

# Delete a data slice
client.dataslice.delete(dataslice_name="old-slice")
print("Data slice deleted")
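
Because the condition dictionaries in the examples above are written by hand, a typo in a key or operator name is easy to miss. The helper below is a hypothetical convenience, not part of the SDK. It builds dictionaries in the same shape and rejects operators other than the three that appear on this page. The SDK may well support additional operators, in which case the allowed set would need extending.

```python
# Operators that appear in the examples on this page (assumption: not exhaustive).
ALLOWED_OPERATORS = {"eq", "gte", "is_null"}


def condition(field, operator, value):
    """Build one filter condition dict, failing fast on obvious mistakes."""
    if not field:
        raise ValueError("field must be a non-empty string")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(
            f"unknown operator {operator!r}; expected one of {sorted(ALLOWED_OPERATORS)}"
        )
    if operator == "is_null" and not isinstance(value, bool):
        raise TypeError("is_null expects a boolean value")
    return {"field": field, "operator": operator, "value": value}


conditions = [
    condition("document_type", "eq", "Contract"),
    condition("year", "eq", 2024),
]
print(conditions)

# The list can then be passed to the call shown earlier on this page, e.g.:
# client.dataslice.create(
#     dataslice_name="2024-contracts",
#     connector_name="my-s3-bucket",
#     conditions=conditions,
# )
```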

API Reference

Create Data Slice

Create a new data slice.

List Data Slices

List all your data slices.

Delete Data Slice

Remove a data slice.

Export Data Slice

Export data from a slice.
