# Quickstart

> Go from zero to extracting metadata in under 5 minutes

Get started with Unstructured by Collibra by running this complete example. You'll connect to a data source, define what metadata to extract, and see results in minutes.

## Installation

The Python SDK is distributed as a wheel file. Contact us to get the SDK wheel file, then install it with pip:

```bash
pip install unstructured_sdk-*.whl
```

## Supported Filetypes

Unstructured by Collibra can process a wide variety of document formats:

* `.pdf`
* `.docx`
* `.xls`, `.xlsx`, `.xlsm`, `.xlsb`
* `.ppt`, `.pptx`
* `.odf`, `.ods`, `.odt`
* `.json`, `.csv`
* Any other UTF-8 encoded file

## Language Support

| Capability              | Supported Languages                            |
| :---------------------- | :--------------------------------------------- |
| **Document Processing** | Multilingual (any language encoded as UTF-8)   |
| **LLM Classification**  | Multilingual (depends on the model)            |
| **PII Detection**       | English only                                   |

## Complete Example

Copy and run this script to extract metadata from your documents:

```python
from unstructured import UnstructuredClient

# 1. Initialize the client (basic auth)
client = UnstructuredClient(
    base_url="https://unstructured.your-company.com/rest/unstructured",
    username="your-username",
    password="your-password",
)

# Alternatively, authenticate with an API token issued from the web UI:
# client = UnstructuredClient(
#     base_url="https://unstructured.your-company.com/rest/unstructured",
#     api_token="your-api-token",
#     user_id="your-username",  # sent as the X-User-ID header
# )

# 2. Create a data connector (S3 example)
connector = client.data_source.create(
    connector_name="my-s3-bucket",
    connector_body={
        "vector_db_type": "s3",
        "bucket_name": "my-documents",
        "aws_access_key_id": "YOUR_ACCESS_KEY",
        "aws_secret_access_key": "YOUR_SECRET_KEY",
        "region": "us-east-1",
    },
)
print(f"✓ Created connector: {connector.profile_id}")

# 3. Define a taxonomy with tags
taxonomy = client.taxonomy.upsert(
    taxonomy_name="document-classification",
    taxonomy_description="Classify and extract key info from documents",
    tags=[
        {
            "name": "document_type",
            "description": "Type of document (invoice, contract, report, etc.)",
            "output_type": "word",
        },
        {
            "name": "summary",
            "description": "A brief 2-3 sentence summary of the document",
            "output_type": "string",
        },
        {
            "name": "key_dates",
            "description": "Important dates mentioned in the document",
            "output_type": "list[date]",
        },
    ],
)
print(f"✓ Created taxonomy: {taxonomy.taxonomy_id}")

# 4. Extract metadata from the documents reachable through the connector
results = client.classify.generate_batch(
    connector_name="my-s3-bucket",
    taxonomy_name="document-classification",
)

# 5. View the results (one entry per processed file, keyed by tag name)
for result in results.metadata:
    print(f"\nFile: {result.file_name}")
    print(f"  Type: {result.tags.get('document_type')}")
    print(f"  Summary: {result.tags.get('summary')}")
    print(f"  Key Dates: {result.tags.get('key_dates')}")
```

**About client configuration**

* **`base_url`** is required and points to your Unstructured deployment (e.g. `https://unstructured.your-company.com/rest/unstructured`). There is no default.
* Each constructor argument can instead be set through an environment variable: `UNSTRUCTURED_CLIENT_BASE_URL`, `UNSTRUCTURED_USERNAME`, `UNSTRUCTURED_PASSWORD`, `UNSTRUCTURED_API_TOKEN`, `UNSTRUCTURED_USER_ID`.
* **API tokens** are long-lived and issued from your Unstructured deployment's web UI. The SDK does not create, refresh, or revoke them.

## What Just Happened?
1. The Data Connector established a secure connection to your S3 bucket, allowing the platform to read your documents.
2. The Taxonomy and its Tags told the platform what information to look for: document type, summary, and key dates.
3. The platform's AI analyzed each document and extracted the structured metadata you defined.

## Next Steps

* Learn how Data Connectors, Taxonomies, and Metadata work together.
* Export enriched metadata to SharePoint.
* Explore all available endpoints and SDK methods.
* Set up sensitive data detection for compliance.
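Before pointing a connector at your own bucket, you may want a rough local check of which files fall under the Supported Filetypes list. The sketch below is not part of the SDK: `LISTED_EXTENSIONS`, `is_utf8_text`, `is_probably_supported`, and `partition_files` are hypothetical helpers. It assumes that "any other UTF-8 encoded file" means the file's bytes decode as UTF-8, and it only inspects the first 64 KiB of each unlisted file, so it approximates, rather than reproduces, how the platform decides what it can process.

```python
import codecs
from pathlib import Path
from typing import List, Tuple

# Extensions copied from the "Supported Filetypes" list in this quickstart.
LISTED_EXTENSIONS = frozenset({
    ".pdf", ".docx",
    ".xls", ".xlsx", ".xlsm", ".xlsb",
    ".ppt", ".pptx",
    ".odf", ".ods", ".odt",
    ".json", ".csv",
})


def is_utf8_text(path: Path, sample_size: int = 64 * 1024) -> bool:
    """Hypothetical helper: True if the first `sample_size` bytes decode as UTF-8."""
    with path.open("rb") as fh:
        sample = fh.read(sample_size)
    decoder = codecs.getincrementaldecoder("utf-8")()
    # If the sample filled the buffer, the file may continue past it, so a
    # multi-byte character cut at the boundary is tolerated (final=False).
    # A shorter sample is the whole file and must decode completely.
    final = len(sample) < sample_size
    try:
        decoder.decode(sample, final=final)
    except UnicodeDecodeError:
        return False
    return True


def is_probably_supported(path: Path) -> bool:
    """Hypothetical helper: listed extension (case-insensitive), else UTF-8 text."""
    if path.suffix.lower() in LISTED_EXTENSIONS:
        return True
    return is_utf8_text(path)


def partition_files(root: Path) -> Tuple[List[Path], List[Path]]:
    """Hypothetical helper: split files under `root` into (supported, skipped)."""
    supported: List[Path] = []
    skipped: List[Path] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if is_probably_supported(path):
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped
```

For example, `supported, skipped = partition_files(Path("my-documents"))` splits a local folder into files that look processable and files worth reviewing. A listed extension is trusted without reading the file; the UTF-8 probe is only a fallback for everything else.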