Data Slices are filtered subsets of your data based on metadata. They allow you to focus on specific segments of your document repository without affecting the entire dataset.
Benefits
| Benefit | Description |
| --- | --- |
| Targeted Processing | Run extraction on only relevant documents (e.g., just 2024 contracts) |
| Focused Analysis | View and analyze specific document categories |
| Efficient Workflows | Avoid reprocessing already-enriched documents |
| Team Collaboration | Share team-specific data slices with team members |
| Controllability | Run downstream applications on controlled data |
Data Slice Properties
| Property | Description |
| --- | --- |
| Name | User-defined identifier |
| Description | Optional notes about what’s included |
| Data Connector | Which Data Connector it’s derived from |
| Document Count | Number of files matching the conditions |
| Conditions | Filter rules that define the slice based on metadata |
How Data Slices Work
Example Data Slices
| Slice Name | Filter Conditions | Document Count |
| --- | --- | --- |
| "2024 Contracts" | Document Type = Contract AND Year = 2024 | 1,247 |
| "High-Value Invoices" | Invoice Amount > $10,000 | 89 |
| "Documents with PII" | SSN_Detected = Yes OR Email_Detected = Yes | 3,521 |
| "Unprocessed Files" | Document Type = null | 5,892 |
Use Data Slices to create focused workflows. For example, process only untagged documents to avoid reprocessing, or create a slice of high-priority documents for immediate attention.
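As a rough illustration of the idea, the sketch below filters a small in-memory list of metadata records the way a slice selects documents. It is plain Python, not the SDK: the `DOCS` records, the field names, and the `matches` helper are all hypothetical, and a real slice is evaluated by the platform against a Data Connector.

```python
from typing import Any

# Hypothetical metadata records; real documents live behind a Data Connector.
DOCS = [
    {"name": "a.pdf", "document_type": "Contract", "year": 2024},
    {"name": "b.pdf", "document_type": "Contract", "year": 2023},
    {"name": "c.pdf", "document_type": "Invoice", "year": 2024},
    {"name": "d.pdf", "document_type": None, "year": None},
]


def matches(doc: dict[str, Any], field: str, value: Any) -> bool:
    """Return True when the document's metadata field equals the given value."""
    return doc.get(field) == value


# "2024 Contracts": Document Type = Contract AND Year = 2024
contracts_2024 = [
    d for d in DOCS
    if matches(d, "document_type", "Contract") and matches(d, "year", 2024)
]

# "Unprocessed Files": Document Type = null
unprocessed = [d for d in DOCS if d.get("document_type") is None]

print([d["name"] for d in contracts_2024])
print([d["name"] for d in unprocessed])
```

Both results are filtered views over `DOCS`; the underlying list is left unchanged, which mirrors how a slice narrows the working set without altering the full dataset.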
Creating Effective Data Slices
Define Your Goal
Identify what subset of documents you need to work with.
Choose Filter Conditions
Select metadata fields and values that define your target documents.
Combine Conditions
Use AND/OR logic to create precise filters.
Verify Document Count
Check that the slice captures the expected number of documents.
Apply to Workflows
Use the slice in Projects or for targeted exports.
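The steps above can be rehearsed locally before committing to a slice definition. The example below is a minimal model under stated assumptions: conditions reuse the `field` / `operator` / `value` dictionary shape from the SDK examples on this page, but `check`, `select`, the `combine` argument, and the sample records are hypothetical and say nothing about how the platform evaluates filters internally.

```python
import operator
from typing import Any, Callable

# Local stand-ins for two operator names; the platform's full operator set is not assumed.
OPS: dict[str, Callable[[Any, Any], bool]] = {"eq": operator.eq, "gte": operator.ge}


def check(doc: dict[str, Any], cond: dict[str, Any]) -> bool:
    """Evaluate one condition; a missing or null field never matches."""
    value = doc.get(cond["field"])
    if value is None:
        return False
    return OPS[cond["operator"]](value, cond["value"])


def select(docs, conditions, combine="AND"):
    """Return documents matching all (AND) or any (OR) of the conditions."""
    if combine not in ("AND", "OR"):
        raise ValueError(f"unknown combinator: {combine}")
    joiner = all if combine == "AND" else any
    return [d for d in docs if joiner(check(d, c) for c in conditions)]


docs = [
    {"name": "a.pdf", "ssn_detected": True, "email_detected": False},
    {"name": "b.pdf", "ssn_detected": False, "email_detected": True},
    {"name": "c.pdf", "ssn_detected": False, "email_detected": False},
]
pii_conditions = [
    {"field": "ssn_detected", "operator": "eq", "value": True},
    {"field": "email_detected", "operator": "eq", "value": True},
]

either = select(docs, pii_conditions, combine="OR")   # SSN or email detected
both = select(docs, pii_conditions, combine="AND")    # SSN and email detected

# Verify the document count before using the slice downstream.
expected = 2
if len(either) != expected:
    raise RuntimeError(f"expected {expected} documents, got {len(either)}")
```

Switching `combine` from OR to AND drops the result to zero documents in this sample, which is the kind of surprise the count check is meant to catch.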
Common Use Cases
Incremental Processing: Filter to documents that haven’t been processed yet.
Compliance Review: Focus on documents containing sensitive information.
Time-Based Analysis: Analyze documents from specific time periods.
Category Deep-Dive: Examine all documents of a particular type.
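Each use case above maps onto a small set of filter conditions. The dictionary below is an illustrative sketch only: it reuses the condition shape and the `eq` and `is_null` operators from the SDK examples on this page, while the `USE_CASE_CONDITIONS` name and the specific field names and values are assumptions that depend on the metadata your own pipeline produces.

```python
# Hypothetical starting points; adjust fields and values to your own metadata schema.
USE_CASE_CONDITIONS = {
    "incremental_processing": [
        {"field": "document_type", "operator": "is_null", "value": True},
    ],
    "compliance_review": [
        {"field": "has_ssn", "operator": "eq", "value": True},
    ],
    "time_based_analysis": [
        {"field": "year", "operator": "eq", "value": 2024},
    ],
    "category_deep_dive": [
        {"field": "document_type", "operator": "eq", "value": "Contract"},
    ],
}

# Summarize which metadata fields each use case filters on.
for use_case, conditions in USE_CASE_CONDITIONS.items():
    fields = ", ".join(c["field"] for c in conditions)
    print(f"{use_case}: filters on {fields}")
```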
Python SDK
The examples below cover four tasks in order: creating a data slice, filter examples, exporting a slice, and listing and deleting slices.
from unstructured import UnstructuredClient

client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Create a data slice with filter conditions
data_slice = client.dataslice.create(
    dataslice_name="2024-contracts",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "document_type", "operator": "eq", "value": "Contract"},
        {"field": "year", "operator": "eq", "value": 2024},
    ],
)
# Named data_slice rather than slice to avoid shadowing Python's built-in slice type
print(f"Created slice with {data_slice.document_count} documents")

# Filter by metadata values
high_value = client.dataslice.create(
    dataslice_name="high-value-contracts",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "contract_value", "operator": "gte", "value": 100000},
    ],
)

# Filter unprocessed documents
unprocessed = client.dataslice.create(
    dataslice_name="needs-processing",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "document_type", "operator": "is_null", "value": True},
    ],
)

# Filter documents with PII
sensitive = client.dataslice.create(
    dataslice_name="contains-pii",
    connector_name="my-s3-bucket",
    conditions=[
        {"field": "has_ssn", "operator": "eq", "value": True},
    ],
)

# Export a data slice's metadata as CSV
result = client.dataslice.export_metadata(
    dataslice_name="2024-contracts",
    export_format="csv",
)
print(f"Exported to: {result.file_path}")

# Export to a vector database
client.destination.export(
    destination_name="my-qdrant",
    dataslice_name="2024-contracts",
    export_level="chunk",
    export_nodes=True,
)

# List all data slices
slices = client.dataslice.list()
for s in slices.dataslices:
    print(f"{s.dataslice_name}: {s.document_count} docs")

# Delete a data slice
client.dataslice.delete(dataslice_name="old-slice")
print("Data slice deleted")
API Reference
Create Data Slice: Create a new data slice.
List Data Slices: List all your data slices.
Delete Data Slice: Remove a data slice.
Export Data Slice: Export data from a slice.