Data Slices are filtered subsets of your data based on metadata. They allow you to focus on specific segments of your document repository without affecting the entire dataset.

Benefits

| Benefit | Description |
|---|---|
| Targeted Processing | Run extraction on only relevant documents (e.g., just 2024 contracts) |
| Focused Analysis | View and analyze specific document categories |
| Efficient Workflows | Avoid reprocessing already-enriched documents |
| Team Collaboration | Share team-specific data slices with team members |
| Controllability | Run downstream applications on controlled data |

Data Slice Properties

| Property | Description |
|---|---|
| Name | User-defined identifier |
| Description | Optional notes about what’s included |
| Data Connector | Which Data Connector it’s derived from |
| Document Count | Number of files matching the conditions |
| Conditions | Filter rules that define the slice based on metadata |

How Data Slices Work

Example Data Slices

| Slice Name | Filter Conditions | Document Count |
|---|---|---|
| "2024 Contracts" | Document Type = Contract AND Year = 2024 | 1,247 |
| "High-Value Invoices" | Invoice Amount > $10,000 | 89 |
| "Documents with PII" | SSN_Detected = Yes OR Email_Detected = Yes | 3,521 |
| "Unprocessed Files" | Document Type = null | 5,892 |

Use Data Slices to create focused workflows. For example, process only untagged documents to avoid reprocessing, or create a slice of high-priority documents for immediate attention.
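
One way to picture the example slices above is as predicates evaluated against each document's metadata. The following is a minimal sketch in plain Python, not the product's implementation; the records and field names (`document_type`, `invoice_amount`, `ssn_detected`, `email_detected`) are made up for illustration.

```python
# Hypothetical metadata records; field names and values are illustrative only.
documents = [
    {"id": 1, "document_type": "Contract", "year": 2024, "invoice_amount": None,
     "ssn_detected": False, "email_detected": True},
    {"id": 2, "document_type": "Contract", "year": 2023, "invoice_amount": None,
     "ssn_detected": False, "email_detected": False},
    {"id": 3, "document_type": "Invoice", "year": 2024, "invoice_amount": 12500.0,
     "ssn_detected": False, "email_detected": False},
    {"id": 4, "document_type": "Invoice", "year": 2024, "invoice_amount": 800.0,
     "ssn_detected": True, "email_detected": False},
    {"id": 5, "document_type": None, "year": None, "invoice_amount": None,
     "ssn_detected": False, "email_detected": False},
]

# Each slice is modeled as a predicate over one document's metadata.
slices = {
    "2024 Contracts": lambda d: d["document_type"] == "Contract" and d["year"] == 2024,
    "High-Value Invoices": lambda d: (d["invoice_amount"] or 0) > 10_000,
    "Documents with PII": lambda d: d["ssn_detected"] or d["email_detected"],
    "Unprocessed Files": lambda d: d["document_type"] is None,
}


def slice_counts(docs, slice_defs):
    """Count how many documents match each slice without modifying the dataset."""
    return {
        name: sum(1 for d in docs if predicate(d))
        for name, predicate in slice_defs.items()
    }


counts = slice_counts(documents, slices)
for name, count in counts.items():
    print(f"{name}: {count}")
```

A document can belong to several slices at once (record 1 is both a 2024 contract and a PII match), and evaluating a slice leaves the underlying list untouched.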

Creating Effective Data Slices

1. Define Your Goal
   Identify what subset of documents you need to work with.

2. Choose Filter Conditions
   Select metadata fields and values that define your target documents.

3. Combine Conditions
   Use AND/OR logic to create precise filters.

4. Verify Document Count
   Check that the slice captures the expected number of documents.

5. Apply to Workflows
   Use the slice in Projects or for targeted exports.
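
The steps above can be rehearsed locally before creating a real slice. This is a rough sketch under stated assumptions: conditions reuse the `{"field", "operator", "value"}` shape from the SDK example on this page, but the operator names other than `eq`, the `mode` argument, and the helper functions `matches` and `select` are hypothetical and not part of any documented API.

```python
import operator

# Operator names are assumptions modeled on the "eq" used in the SDK example.
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
}


def matches(doc, condition):
    """Return True if a single condition holds for one document's metadata."""
    actual = doc.get(condition["field"])
    op = condition["operator"]
    if actual is None and op in ("gt", "lt"):
        # Ordering comparisons against missing metadata never match.
        return False
    return OPERATORS[op](actual, condition["value"])


def select(docs, conditions, mode="and"):
    """Combine conditions with AND (all must hold) or OR (any may hold)."""
    combine = all if mode == "and" else any
    return [d for d in docs if combine(matches(d, c) for c in conditions)]


# Hypothetical sample metadata used to sanity-check the filter.
docs = [
    {"document_type": "Contract", "year": 2024},
    {"document_type": "Contract", "year": 2023},
    {"document_type": "Invoice", "year": 2024, "invoice_amount": 12_500},
    {"year": 2022},
]

conditions = [
    {"field": "document_type", "operator": "eq", "value": "Contract"},
    {"field": "year", "operator": "eq", "value": 2024},
]

and_slice = select(docs, conditions, mode="and")
or_slice = select(docs, conditions, mode="or")

# Step 4: compare the count against what you expect before using the slice.
expected_count = 1
if len(and_slice) != expected_count:
    raise ValueError(f"expected {expected_count} documents, got {len(and_slice)}")
print(len(and_slice), len(or_slice))
```

Switching from AND to OR widens the slice considerably on the same data, which is why checking the document count is worth doing before the slice feeds a workflow.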

Common Use Cases

Incremental Processing

Filter to documents that haven’t been processed yet

Compliance Review

Focus on documents containing sensitive information

Time-Based Analysis

Analyze documents from specific time periods

Category Deep-Dive

Examine all documents of a particular type
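
As an illustration of the incremental-processing use case, the sketch below filters to documents with no `document_type`, enriches them, and shows that a second run finds nothing left to do. It assumes the filter is re-evaluated against current metadata on each run; `classify` is a placeholder for a real enrichment step, and all names and records are hypothetical.

```python
def unprocessed(docs):
    """Select documents whose document_type is missing or null."""
    return [d for d in docs if d.get("document_type") is None]


def classify(doc):
    """Placeholder enrichment step: guess a type from the filename."""
    return "Invoice" if "invoice" in doc["filename"].lower() else "Other"


def run_incremental(docs):
    """Enrich only unprocessed documents and return how many were handled."""
    pending = unprocessed(docs)
    for doc in pending:
        doc["document_type"] = classify(doc)
    return len(pending)


# Hypothetical repository state: one enriched document, two untagged ones.
docs = [
    {"filename": "msa_2024.pdf", "document_type": "Contract"},
    {"filename": "invoice_0042.pdf", "document_type": None},
    {"filename": "notes.txt"},
]

first_run = run_incremental(docs)
second_run = run_incremental(docs)
print(first_run, second_run)
```

The already-tagged contract is never touched, and the second run processes zero documents because nothing matches the filter anymore.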

Python SDK

from unstructured import UnstructuredClient

client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Create a data slice with filter conditions
# Named data_slice to avoid shadowing Python's built-in slice type
data_slice = client.dataslice.create(
    dataslice_name="2024-contracts",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "document_type", "operator": "eq", "value": "Contract"},
        {"field": "year", "operator": "eq", "value": 2024},
    ],
)
print(f"Created slice with {data_slice.document_count} documents")

API Reference