Quickstart

Get started with Unstructured by Collibra by running this complete example. You’ll connect to a data source, define what metadata to extract, and see results in minutes.

Installation

Install the Python SDK:

pip install unstructured-sdk
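
To confirm the install succeeded before running the full example, a quick check along these lines may help. It uses only the Python standard library and assumes the distribution name is `unstructured-sdk`, as in the `pip` command above; the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution name taken from the pip command in this quickstart.
version = installed_version("unstructured-sdk")
if version is None:
    print("unstructured-sdk is not installed in this environment")
else:
    print(f"unstructured-sdk {version} is installed")
```

If the check reports that the package is missing, make sure `pip` and `python` point at the same environment.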

Complete Example

Copy this script, replace the placeholder credentials and bucket name with your own, and run it to extract metadata from your documents:

from unstructured import UnstructuredClient

# 1. Initialize the client
client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# 2. Create a data connector (S3 example)
connector = client.data_source.create(
    connector_name="my-s3-bucket",
    connector_body={
        "vector_db_type": "s3",
        "bucket_name": "my-documents",
        "aws_access_key_id": "YOUR_ACCESS_KEY",
        "aws_secret_access_key": "YOUR_SECRET_KEY",
        "region": "us-east-1",
    },
)
print(f"✓ Created connector: {connector.profile_id}")

# 3. Define a taxonomy with tags
taxonomy = client.taxonomy.upsert(
    taxonomy_name="document-classification",
    taxonomy_description="Classify and extract key info from documents",
    tags=[
        {
            "name": "document_type",
            "description": "Type of document (invoice, contract, report, etc.)",
            "output_type": "word",
        },
        {
            "name": "summary",
            "description": "A brief 2-3 sentence summary of the document",
            "output_type": "string",
        },
        {
            "name": "key_dates",
            "description": "Important dates mentioned in the document",
            "output_type": "list[date]",
        },
    ],
)
print(f"✓ Created taxonomy: {taxonomy.taxonomy_id}")

# 4. Extract metadata from documents
results = client.classify.generate_batch(
    connector_name="my-s3-bucket",
    taxonomy_name="document-classification",
)

# 5. View the results
for result in results.metadata:
    print(f"\nFile: {result.file_name}")
    print(f"  Type: {result.tags.get('document_type')}")
    print(f"  Summary: {result.tags.get('summary')}")
    print(f"  Key Dates: {result.tags.get('key_dates')}")

What Just Happened?

Connected to Your Data

The Data Connector established a secure connection to your S3 bucket, allowing the platform to read your documents.
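
The example hardcodes AWS keys for brevity. One way to keep secrets out of source files is to build the connector body from environment variables, sketched below. The helper `s3_connector_body` is hypothetical and not part of the SDK; the dictionary keys are copied from the example above, and `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` are the standard AWS environment variable names.

```python
import os
from typing import Mapping, Optional


def s3_connector_body(
    bucket_name: str,
    region: str,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build the S3 connector body from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Keys mirror the connector_body shown in the quickstart example.
    return {
        "vector_db_type": "s3",
        "bucket_name": bucket_name,
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region": region,
    }
```

The returned dictionary could then be passed as `connector_body` to `client.data_source.create(...)` in place of the inline literal.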

Defined What to Extract

The Taxonomy and Tags told the platform what information to look for: document type, summary, and key dates.
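
Because each tag is a plain dictionary, it can be worth checking the list locally before sending it to the platform. The sketch below is a hypothetical client-side check, not an SDK feature. `KNOWN_OUTPUT_TYPES` contains only the three output types that appear in this quickstart, which is an assumption; the platform may accept others.

```python
from typing import Dict, List

# Assumption: only the output types seen in this quickstart are listed here.
KNOWN_OUTPUT_TYPES = {"word", "string", "list[date]"}
REQUIRED_KEYS = ("name", "description", "output_type")


def validate_tags(tags: List[Dict[str, str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    seen_names = set()
    for index, tag in enumerate(tags):
        for key in REQUIRED_KEYS:
            if not tag.get(key):
                problems.append(f"tag {index}: missing '{key}'")
        name = tag.get("name")
        if name:
            if name in seen_names:
                problems.append(f"tag {index}: duplicate name '{name}'")
            seen_names.add(name)
        output_type = tag.get("output_type")
        if output_type and output_type not in KNOWN_OUTPUT_TYPES:
            problems.append(f"tag {index}: unrecognized output_type '{output_type}'")
    return problems
```

Calling `validate_tags(tags)` before `client.taxonomy.upsert(...)` and stopping on a non-empty result catches typos such as a misspelled `output_type` early.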

Extracted Metadata

The platform’s AI analyzed each document and extracted the structured metadata you defined.
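
Once results come back, a common next step is to summarize them. The sketch below counts documents per tag value. It uses stand-in objects rather than real SDK responses and assumes only what the example above shows: each result has a `file_name` and a `tags` object with a dictionary-style `.get()`. The function name `count_by_tag` is made up for illustration.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_tag(results, tag_name: str) -> Counter:
    """Count results per value of one tag; missing or empty values count as 'unknown'."""
    return Counter(str(r.tags.get(tag_name) or "unknown") for r in results)


# Stand-in data shaped like the results printed in the quickstart (not real output).
sample_results = [
    SimpleNamespace(file_name="a.pdf", tags={"document_type": "invoice"}),
    SimpleNamespace(file_name="b.pdf", tags={"document_type": "contract"}),
    SimpleNamespace(file_name="c.pdf", tags={"document_type": "invoice"}),
    SimpleNamespace(file_name="d.pdf", tags={}),
]

for value, count in count_by_tag(sample_results, "document_type").most_common():
    print(f"{value}: {count}")
```

With real data, `results.metadata` from step 4 would take the place of `sample_results`, provided the response objects behave as assumed here.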

Next Steps

Explore Concepts

Learn how Data Connectors, Taxonomies, and Metadata work together.

S3 to SharePoint

Export enriched metadata to SharePoint.

S3 to Qdrant

Build a RAG pipeline with vector search.

PII Detection

Set up sensitive data detection for compliance.

Development Setup

Want to contribute to the docs? Here’s how to run them locally:

Local Preview

Install the Mintlify CLI:

pnpm add -g mintlify

Run the development server:

mintlify dev
