Installation
Download and install the Python SDK wheel file:
Request Access
Contact us to get the SDK wheel file
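Once you have the wheel file, install it with pip into the same environment that will run your scripts. The sketch below assumes a standard pip setup. `build_install_command` and `install_wheel` are hypothetical helpers, not part of the SDK, and the filename in the usage line is a placeholder for whatever wheel you receive.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_install_command(wheel_path: str) -> List[str]:
    """Return the pip command that installs a local wheel file."""
    path = Path(wheel_path)
    if path.suffix != ".whl":
        raise ValueError(f"expected a .whl file, got {path.name!r}")
    # Running pip as `<current interpreter> -m pip` guarantees the wheel is
    # installed into the same environment as the Python running this code.
    return [sys.executable, "-m", "pip", "install", str(path)]


def install_wheel(wheel_path: str) -> None:
    """Install the wheel; raises CalledProcessError if pip exits non-zero."""
    subprocess.run(build_install_command(wheel_path), check=True)
```

Usage: `install_wheel("path/to/your-sdk-wheel.whl")`. The shell equivalent is `python -m pip install path/to/your-sdk-wheel.whl`.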
Supported Filetypes
Unstructured by Collibra can process a wide variety of document formats:

| Format | Extensions |
|---|---|
| PDF | .pdf |
| Microsoft Word | .docx |
| Microsoft Excel | .xls, .xlsx, .xlsm, .xlsb |
| Microsoft PowerPoint | .ppt, .pptx |
| OpenDocument | .odf, .ods, .odt |
| Data Formats | .json, .csv |
| Plain Text | Any other file (UTF-8) |
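To see in advance which of your files fall into which category, you can run a quick local pre-check. This sketch does not reproduce the platform's own detection logic. It assumes that files are routed by extension as listed above, and that any unlisted file is accepted as plain text only when its bytes are valid UTF-8. Case-insensitive extension matching is also an assumption. `classify` is a hypothetical helper.

```python
import codecs
from pathlib import Path
from typing import Optional

# Extension-to-format mapping transcribed from the supported filetypes list.
FORMAT_BY_EXTENSION = {
    ".pdf": "PDF",
    ".docx": "Microsoft Word",
    ".xls": "Microsoft Excel", ".xlsx": "Microsoft Excel",
    ".xlsm": "Microsoft Excel", ".xlsb": "Microsoft Excel",
    ".ppt": "Microsoft PowerPoint", ".pptx": "Microsoft PowerPoint",
    ".odf": "OpenDocument", ".ods": "OpenDocument", ".odt": "OpenDocument",
    ".json": "Data Formats", ".csv": "Data Formats",
}


def classify(path: str, head: bytes) -> Optional[str]:
    """Return the format category for `path`, or None if it looks unprocessable.

    `head` is the first chunk of the file's bytes. It is only inspected when the
    extension is not listed, to decide whether the plain-text fallback applies.
    """
    ext = Path(path).suffix.lower()  # assumption: matching ignores case
    if ext in FORMAT_BY_EXTENSION:
        return FORMAT_BY_EXTENSION[ext]
    # An incremental decoder with final=False tolerates a multi-byte character
    # cut off at the end of the chunk, but still rejects invalid UTF-8 bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return None
    return "Plain Text"
```

Usage: read the first few kilobytes of each file with `open(path, "rb").read(4096)` and pass them in as `head`. Checking only a prefix keeps the scan fast but can miss invalid bytes later in a large file.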
Language Support
| Capability | Supported Languages |
|---|---|
| Document Processing | Multilingual (all UTF-8 supported languages) |
| LLM Classification | Multilingual (dependent on model) |
| PII Detection | English only |
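If your corpus mixes languages, it can help to decide per document which capabilities from the Language Support table will apply before you schedule a PII detection job. This is a minimal sketch that encodes the table as data. It assumes documents are tagged with ISO 639-1 language codes, optionally with a region suffix such as `en-GB`. `capabilities_for` is a hypothetical helper, not an SDK function.

```python
from typing import Dict


def capabilities_for(language: str) -> Dict[str, str]:
    """Map a language code to the support level of each capability.

    Values are transcribed from the Language Support table.
    Assumption: codes look like "en", "en-GB", or "en_US".
    """
    base = language.lower().replace("_", "-").split("-")[0]
    return {
        "Document Processing": "supported",  # any language expressible in UTF-8
        "LLM Classification": "depends on model",
        "PII Detection": "supported" if base == "en" else "not supported",
    }
```

Usage: before a PII run, skip or flag documents where `capabilities_for(lang)["PII Detection"] != "supported"`, since results for non-English text are outside the documented scope.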
Complete Example
Copy and run this script to extract metadata from your documents:
What Just Happened?
Connected to Your Data
The Data Connector established a secure connection to your S3 bucket, allowing the platform to read your documents.
Defined What to Extract
The Taxonomy and Tags told the platform what information to look for: document type, summary, and key dates.
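The Data Connector's internals are not described on this page. If a run finds no documents, a reasonable first check is whether your AWS credentials can list the S3 bucket at all. The following minimal sketch assumes you use boto3. `iter_object_keys` is a hypothetical helper, not part of the SDK. It pages through `list_objects_v2` results, which return at most 1,000 keys per call. The client is passed in, so the function itself has no AWS dependency.

```python
from typing import Any, Dict, Iterator


def iter_object_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix`, following ListObjectsV2 pagination."""
    kwargs: Dict[str, str] = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when the prefix matches no objects.
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        # Resume the listing from where the previous page stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

Usage: `import boto3`, then `keys = list(iter_object_keys(boto3.client("s3"), "your-bucket", "docs/"))`. The bucket name and prefix are placeholders. An `AccessDenied` `ClientError` suggests the credentials lack `s3:ListBucket`. Note that listing succeeds without `s3:GetObject`, which reading the documents also requires. boto3's `get_paginator("list_objects_v2")` handles the continuation tokens for you if you prefer it.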
Next Steps
Explore Concepts
Learn how Data Connectors, Taxonomies, and Metadata work together.
S3 to SharePoint
Export enriched metadata to SharePoint.
API Reference
Explore all available endpoints and SDK methods.
PII Detection
Set up sensitive data detection for compliance.

