Tags are the metadata attributes you want to extract or classify from your documents. Taxonomies organize tags into hierarchical structures that define parent-child relationships.
What is a Tag?
A Tag defines a specific piece of information you want to capture from documents:
| Property | Description | Example |
|---|---|---|
| Name | The tag identifier | Contract Type |
| Description | Instructions for the AI on what to extract | "Identify the type of legal agreement (NDA, MSA, SOW, etc.)" |
| Output Type | How values are returned | Word, Number, Date |
| Max Values | Maximum number of values returned | 1, or as many relevant values as the AI finds |
| Available Values | Predefined options (for classification) | ["NDA", "MSA", "SOW", "Employment Agreement"] |
| Strategy | Extraction method | LLM (AI), Regex (Pattern), Rule-based |
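The properties in the table can be pictured as fields on a single record. The sketch below is illustrative only: `TagDefinition`, its field names, and its defaults are hypothetical and are not part of the SDK; they simply mirror the table rows.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TagDefinition:
    """Hypothetical container mirroring the tag properties in the table."""

    name: str                      # Name: the tag identifier
    description: str               # Description: instructions for the AI
    output_type: str = "word"      # Output Type: e.g. "word", "number", "date"
    max_values: Optional[int] = 1  # Max Values: None means no fixed upper limit
    available_values: List[str] = field(default_factory=list)  # Available Values
    strategy: str = "llm"          # Strategy: "llm", "regex", or "rule"

    def is_classification(self) -> bool:
        # A tag with predefined options behaves as a classification tag.
        return bool(self.available_values)


contract_type = TagDefinition(
    name="Contract Type",
    description="Identify the type of legal agreement (NDA, MSA, SOW, etc.)",
    available_values=["NDA", "MSA", "SOW", "Employment Agreement"],
)
print(contract_type.is_classification())  # True
```

Leaving `available_values` empty in this sketch corresponds to an open-ended tag, where the AI is not restricted to a fixed list.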
Tag Types
| Tag Type | How It Works | When to Use | Example |
|---|---|---|---|
| Classification Tags | AI chooses from predefined list of values | When you have a known set of categories | Document Type: Contract, Invoice, Report |
| Extraction Tags | AI extracts open-ended values from text | When the value is unpredictable | Contract Value: $1,500,000 |
| Pattern Tags | Regex pattern matching or NLP detection | For compliance, keyword search, etc. | SSN: XXX-XX-XXXX, Email: name@example.com |
Classification is best when you have a known set of categories. Extraction is best for unpredictable values like names, dates, or amounts. Pattern is best for structured data like phone numbers or SSNs.
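To make the Pattern case concrete, here is a minimal sketch of regex matching with Python's standard `re` module. The pattern and the `find_ssns` helper are assumptions written for illustration, not what the platform uses internally, and the pattern checks only the XXX-XX-XXXX layout, not whether a number is valid.

```python
import re
from typing import List

# Hypothetical pattern for the XXX-XX-XXXX layout; it checks format only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssns(text: str) -> List[str]:
    """Return every substring shaped like a US Social Security Number."""
    return SSN_PATTERN.findall(text)


sample = "Applicant SSN: 123-45-6789. Phone: 555-0100."
print(find_ssns(sample))  # ['123-45-6789']
```

The word boundaries (`\b`) keep the pattern from matching inside longer digit runs, which is why the phone number in the sample is ignored.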
Taxonomy Structure
A Taxonomy enables hierarchical organization and conditional extraction. Child tags are generated only when their parent condition is met:
Document Type (Classification: Contract | Invoice | Report)
├── Contract
│   ├── Contract Value (Extraction)
│   ├── Parties Involved (Extraction)
│   ├── Effective Date (Extraction)
│   └── Termination Date (Extraction)
├── Invoice
│   ├── Invoice Amount (Extraction)
│   ├── Due Date (Extraction)
│   └── Vendor Name (Extraction)
└── Report
    ├── Report Category (Classification: Financial | Operational | Compliance)
    └── Report Period (Extraction)
In this taxonomy, the AI first classifies the document type, then extracts only the relevant child tags. A Contract won’t have “Invoice Amount” extracted, which saves time and cost.
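The conditional step can be sketched in plain Python. The `CHILD_TAGS` mapping and the `tags_to_extract` helper are hypothetical names used only to illustrate the idea; the platform's actual representation may differ.

```python
from typing import Dict, List

# Hypothetical in-memory form of the example taxonomy: each value of the
# parent classification tag maps to the child tags that apply to it.
CHILD_TAGS: Dict[str, List[str]] = {
    "Contract": [
        "Contract Value",
        "Parties Involved",
        "Effective Date",
        "Termination Date",
    ],
    "Invoice": ["Invoice Amount", "Due Date", "Vendor Name"],
    "Report": ["Report Category", "Report Period"],
}


def tags_to_extract(document_type: str) -> List[str]:
    """Return only the child tags whose parent condition is met."""
    return CHILD_TAGS.get(document_type, [])


print(tags_to_extract("Contract"))
```

A document type with no entry in the mapping yields an empty list, so no child extraction would run for it in this sketch.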
Define Clear Descriptions
Give the AI specific instructions about what to extract. The better your description, the more accurate the extraction.
Choose the Right Type
Use Classification for known categories, Extraction for open-ended values, and Pattern for structured formats.
Set Available Values
For Classification tags, provide a complete list of possible values to improve accuracy.
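One way to see why completeness matters: a closed list can only ever return values that are on it. The sketch below assumes a simple case-insensitive match; `match_available_value` is a hypothetical helper, not an SDK function, and it does not describe how the platform resolves labels.

```python
from typing import List, Optional


def match_available_value(raw: str, available_values: List[str]) -> Optional[str]:
    """Map a raw label to its canonical available value, or None if unknown."""
    normalized = raw.strip().casefold()
    for value in available_values:
        if value.casefold() == normalized:
            return value
    return None


DOCUMENT_TYPES = ["Contract", "Invoice", "Report", "Letter"]
print(match_available_value(" invoice ", DOCUMENT_TYPES))  # Invoice
print(match_available_value("Memo", DOCUMENT_TYPES))       # None
```

In this sketch a memo has no valid label, so it would either go unclassified or be forced into a poorly fitting category, which is the failure a complete list avoids.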
Organize into Taxonomies
Group related tags hierarchically to enable conditional extraction and reduce unnecessary processing.
Python SDK
Create Taxonomy
from unstructured import UnstructuredClient

client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Create a taxonomy with tags
taxonomy = client.taxonomy.upsert(
    taxonomy_name="contract-analysis",
    taxonomy_description="Extract key data from legal contracts",
    tags=[
        {
            "name": "contract_type",
            "description": "Type of contract (NDA, MSA, SLA, etc.)",
            "output_type": "word",
        },
        {
            "name": "parties",
            "description": "Names of all parties involved",
            "output_type": "list[string]",
        },
        {
            "name": "effective_date",
            "description": "When the contract becomes effective",
            "output_type": "date",
        },
        {
            "name": "total_value",
            "description": "Total monetary value in USD",
            "output_type": "float",
        },
    ],
)
print(f"Created taxonomy: {taxonomy.taxonomy_id}")

Create Tag
# Create or update a single tag
tag = client.tags.upsert(
    tag_name="document_type",
    tag_description="Classify document type",
    output_type="word",
    available_values=["Contract", "Invoice", "Report", "Letter"],
)
print(f"Created tag: {tag.tag_id}")

AI Suggestions
# Get AI-suggested taxonomy for your domain
suggestions = client.suggest.schema(
    description="Medical patient records with diagnoses and prescriptions",
)
print("Suggested tags:")
for tag in suggestions.tags:
    print(f" - {tag.name}: {tag.description}")

# Get AI-suggested regex patterns
pattern = client.suggest.patterns(
    description="US Social Security Number (XXX-XX-XXXX)",
)
print(f"Suggested pattern: {pattern.regex}")

List & Delete
# List all taxonomies
taxonomies = client.taxonomy.list()
for t in taxonomies.taxonomies:
    print(f"{t.taxonomy_name}: {t.tag_count} tags")

# List all tags
tags = client.tags.list()
for tag in tags.tags:
    print(f"{tag.tag_name} ({tag.output_type})")

# Delete a taxonomy
client.taxonomy.delete(taxonomy_name="old-taxonomy")

# Delete a tag
client.tags.delete(tag_name="unused-tag")
API Reference