Data Connectors establish secure connections between the platform and your document storage. Once connected, the platform can discover documents, read content for metadata extraction, and write enriched metadata back to the source.
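The three operations named above (discover documents, read content, write metadata back) can be pictured as a small interface. This sketch is illustrative only. `DataConnector`, `InMemoryConnector`, and `enrich` are hypothetical names that are not part of the platform's SDK, and the "extraction" step is a stand-in word count, not real metadata extraction.

```python
from abc import ABC, abstractmethod


class DataConnector(ABC):
    """Hypothetical interface for the three operations a connector performs."""

    @abstractmethod
    def discover(self) -> list[str]:
        """Return identifiers of documents available in the source."""

    @abstractmethod
    def read(self, doc_id: str) -> str:
        """Return the text content of one document."""

    @abstractmethod
    def write_metadata(self, doc_id: str, metadata: dict) -> None:
        """Write enriched metadata back to the source."""


class InMemoryConnector(DataConnector):
    """Toy connector backed by a dict, standing in for S3, SharePoint, etc."""

    def __init__(self, documents: dict[str, str]) -> None:
        self._documents = documents
        self.metadata: dict[str, dict] = {}

    def discover(self) -> list[str]:
        return sorted(self._documents)

    def read(self, doc_id: str) -> str:
        return self._documents[doc_id]

    def write_metadata(self, doc_id: str, metadata: dict) -> None:
        # Merge with any metadata already stored for this document.
        self.metadata.setdefault(doc_id, {}).update(metadata)


def enrich(connector: DataConnector) -> None:
    """Discover, read, 'extract' (here: a word count), and write back."""
    for doc_id in connector.discover():
        text = connector.read(doc_id)
        connector.write_metadata(doc_id, {"word_count": len(text.split())})


conn = InMemoryConnector({"a.txt": "hello world", "b.txt": "one two three"})
enrich(conn)
print(conn.metadata)
```

Because `enrich` depends only on the interface, the same loop would work unchanged for any storage type that implements the three methods.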

Supported Data Connectors

Amazon S3

Connect to AWS S3 buckets for scalable cloud storage.

SharePoint

Integrate with Microsoft 365 document libraries.

PostgreSQL

Connect to PostgreSQL databases with pgvector support.

Qdrant

Vector database integration for semantic search.

Configuration Details

| Source Type | Description | Key Configuration | Ideal Use Case |
|---|---|---|---|
| Amazon S3 | AWS cloud object storage | Bucket name, Access Key, Secret Key | Large-scale document archives, cloud-native workflows |
| SharePoint | Microsoft 365 document management | Client ID, Client Secret, Tenant ID, Site Name | Enterprise document libraries, Office 365 environments |
| PostgreSQL | Relational database with pgvector extension | Host URL, Database name, User credentials, Port | Structured + unstructured hybrid data, existing database workflows |
| Qdrant | Purpose-built vector database for AI | API Key, Collection name, URL | Semantic search applications, RAG pipelines |
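Here is a rough illustration of how the Key Configuration column might be checked before a connector is saved. The sketch lists the required fields per source type and reports which ones are missing or empty. The field names (`bucket_name`, `tenant_id`, `collection_name`, and so on) are hypothetical translations of the table's labels, and the platform's actual configuration keys may differ.

```python
# Hypothetical field names derived from the "Key Configuration" column;
# the platform's real configuration keys may differ.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "s3": ("bucket_name", "access_key", "secret_key"),
    "sharepoint": ("client_id", "client_secret", "tenant_id", "site_name"),
    "postgresql": ("host", "database", "user", "password", "port"),
    "qdrant": ("api_key", "collection_name", "url"),
}


def missing_fields(source_type: str, config: dict) -> list[str]:
    """Return required fields that are absent or empty in config."""
    try:
        required = REQUIRED_FIELDS[source_type]
    except KeyError:
        raise ValueError(f"unsupported source type: {source_type!r}") from None
    # Treat empty strings and None as missing, not just absent keys.
    return [name for name in required if not config.get(name)]


print(missing_fields("qdrant", {"url": "http://localhost:6333", "api_key": "k"}))
# → ['collection_name']
```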

Key Features

Multiple Profiles

Create and manage multiple Data Connector connections

Connection Testing

Validate credentials before saving

Active Profile Selection

Switch between Data Connectors with one click

Schema Configuration

Customize field mappings (filename key, text key, tags key)
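Multiple Profiles, Connection Testing, and Active Profile Selection fit together. A profile is saved only if it passes a connection test, and exactly one saved profile is active at a time. The minimal sketch below shows that behavior. It assumes a hypothetical `ProfileRegistry` class and a caller-supplied `tester` function, and neither is the platform's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Profile:
    name: str
    source_type: str
    config: dict


@dataclass
class ProfileRegistry:
    """Hypothetical store of connector profiles with one active profile."""

    # Caller-supplied check that returns True when a profile's credentials work.
    tester: Callable[[Profile], bool]
    profiles: dict[str, Profile] = field(default_factory=dict)
    active: Optional[str] = None

    def save(self, profile: Profile) -> None:
        # Connection testing: refuse to store credentials that fail the check.
        if not self.tester(profile):
            raise ValueError(f"connection test failed for {profile.name!r}")
        self.profiles[profile.name] = profile
        if self.active is None:
            self.active = profile.name  # the first saved profile becomes active

    def activate(self, name: str) -> Profile:
        # Active profile selection: switch which connector is in use.
        if name not in self.profiles:
            raise KeyError(name)
        self.active = name
        return self.profiles[name]


# Stand-in tester: treats any profile with a non-empty config as reachable.
registry = ProfileRegistry(tester=lambda p: bool(p.config))
registry.save(Profile("archive", "s3", {"bucket_name": "my-documents"}))
registry.save(Profile("intranet", "sharepoint", {"site_name": "Docs"}))
print(registry.active)  # → archive
registry.activate("intranet")
print(registry.active)  # → intranet
```

Injecting `tester` keeps the registry independent of any storage backend. A real check would attempt an authenticated request against S3, SharePoint, PostgreSQL, or Qdrant.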

How Data Connectors Work

1. Create a Connector
Select your storage type and provide the required credentials.

2. Test the Connection
Validate that the platform can access your documents before saving.

3. Configure Schema Mapping
Map your data fields (filename, text content, tags) to the platform’s expected format.

4. Start Processing
Your documents are now available for metadata extraction.
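Step 3 amounts to renaming fields. The source's own column or key names are mapped onto the filename, text, and tags fields the platform expects. The minimal sketch below assumes a hypothetical `SchemaMapping` class. Its defaults and output field names (`filename`, `text`, `tags`) are illustrative, not the platform's documented schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaMapping:
    """Hypothetical mapping from source field names to the platform's fields."""

    filename_key: str = "filename"
    text_key: str = "text"
    tags_key: str = "tags"

    def apply(self, record: dict) -> dict:
        """Rename a source record's fields into the platform's expected format."""
        return {
            "filename": record[self.filename_key],
            "text": record[self.text_key],
            # Tags may be absent in some sources; default to an empty list.
            "tags": list(record.get(self.tags_key, [])),
        }


# A source whose rows use custom column names, e.g. a PostgreSQL table.
mapping = SchemaMapping(filename_key="doc_name", text_key="body", tags_key="labels")
row = {"doc_name": "q3-report.pdf", "body": "Revenue grew...", "labels": ["finance"]}
print(mapping.apply(row))
```

A frozen dataclass keeps the mapping immutable once configured. The same instance can therefore be reused for every record a connector reads.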

Python SDK

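from unstructured import UnstructuredClient

# Authenticate with your platform account credentials.
client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Create an S3 connector; "vector_db_type" selects the connector type.
connector = client.data_source.create(
    connector_name="my-s3-bucket",  # name for this connector
    connector_body={
        "vector_db_type": "s3",
        "bucket_name": "my-documents",
        "aws_access_key_id": "YOUR_ACCESS_KEY",
        "aws_secret_access_key": "YOUR_SECRET_KEY",
        "region": "us-east-1",  # AWS region where the bucket lives
    },
)

# The returned object carries the ID of the new connector profile.
print(f"Created connector: {connector.profile_id}")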
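Hardcoding credentials as in the example above is convenient for a quick test, but in practice they are usually read from the environment. The helper below builds the same `connector_body` dictionary from environment variables and reports every missing variable at once. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` follow the standard AWS naming convention. `S3_BUCKET_NAME`, the `AWS_REGION` fallback, and the helper name `s3_connector_body` are choices made for this sketch. Only the dictionary keys come from the example above.

```python
import os
from collections.abc import Mapping

# Maps connector_body keys (from the SDK example) to environment variable
# names; S3_BUCKET_NAME is a hypothetical name chosen for this sketch.
_S3_ENV_VARS = {
    "bucket_name": "S3_BUCKET_NAME",
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
}


def s3_connector_body(env: Mapping[str, str] = os.environ) -> dict:
    """Build an S3 connector_body, reporting every missing variable at once."""
    missing = [var for var in _S3_ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    body = {"vector_db_type": "s3"}
    body.update({key: env[var] for key, var in _S3_ENV_VARS.items()})
    # Fall back to the region used in the SDK example when AWS_REGION is unset.
    body["region"] = env.get("AWS_REGION", "us-east-1")
    return body


# Example with an explicit mapping instead of the real environment.
example = s3_connector_body({
    "S3_BUCKET_NAME": "my-documents",
    "AWS_ACCESS_KEY_ID": "YOUR_ACCESS_KEY",
    "AWS_SECRET_ACCESS_KEY": "YOUR_SECRET_KEY",
})
print(example["region"])  # → us-east-1
```

In a real script, calling `s3_connector_body()` with no argument reads `os.environ`. The result can then be passed as `connector_body` to `client.data_source.create`.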

API Reference