Taxonomies & Tags

Tags are the metadata attributes you want to extract or classify from your documents. Taxonomies organize tags into hierarchical structures that define parent-child relationships.

What is a Tag?

A Tag defines a specific piece of information you want to capture from documents:

Property	Description	Example
Name	The tag identifier	`Contract Type`
Description	Instructions for the AI on what to extract	"Identify the type of legal agreement (NDA, MSA, SOW, etc.)"
Output Type	How values are returned	Word, Number, Date
Max Values	Maximum number of values returned	From 1 up to every relevant value the AI finds
Available Values	Predefined options (for classification)	`["NDA", "MSA", "SOW", "Employment Agreement"]`
Strategy	Extraction method	LLM (AI), Regex (Pattern), Rule-based
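
As a rough illustration of how the properties in the table fit together, the sketch below models a tag as a plain Python dataclass. The class name `TagSpec`, its field names, and the default values are assumptions made for this example; they are not part of the SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagSpec:
    """Hypothetical local model of the tag properties listed above."""
    name: str
    description: str
    output_type: str = "word"                     # e.g. "word", "number", "date"
    max_values: int = 1                           # upper bound on returned values
    available_values: Optional[List[str]] = None  # set only for classification
    strategy: str = "llm"                         # e.g. "llm", "regex", "rule"

    def is_classification(self) -> bool:
        # A tag with predefined options behaves as a classification tag.
        return bool(self.available_values)


contract_type = TagSpec(
    name="Contract Type",
    description="Identify the type of legal agreement (NDA, MSA, SOW, etc.)",
    available_values=["NDA", "MSA", "SOW", "Employment Agreement"],
)
print(contract_type.is_classification())
```

Leaving `available_values` unset in this model corresponds to an open-ended extraction tag.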

Tag Types

Tag Type	How It Works	When to Use	Example
Classification Tags	AI chooses from predefined list of values	When you have a known set of categories	Document Type: Contract, Invoice, Report
Extraction Tags	AI extracts open-ended values from text	When the value is unpredictable	Contract Value: $1,500,000
Pattern Tags	Regex pattern matching or NLP detection	For compliance, keyword search, and similar checks	SSN: XXX-XX-XXXX, Email: user@example.com

Classification is best when you have a known set of categories. Extraction is best for unpredictable values like names, dates, or amounts. Pattern is best for structured data like phone numbers or SSNs.
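
To make the Pattern case concrete, here is a minimal sketch using Python's standard `re` module. The two patterns and the helper `find_pattern_values` are illustrative assumptions, not the patterns the platform uses; real compliance patterns usually need stricter validation.

```python
import re
from typing import List

# Illustrative patterns only (assumed for this example).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_pattern_values(text: str, pattern: "re.Pattern") -> List[str]:
    """Return every non-overlapping match of the pattern in the text."""
    return pattern.findall(text)


sample = "Employee SSN 123-45-6789, contact jane.doe@example.com."
print(find_pattern_values(sample, SSN_PATTERN))
print(find_pattern_values(sample, EMAIL_PATTERN))
```

Because the format is fixed, a deterministic pattern like this needs no model call, which is why structured values such as SSNs suit Pattern tags.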

Taxonomy Structure

A Taxonomy enables hierarchical organization and conditional extraction. Child tags are extracted only when the parent's condition is met:

Document Type (Classification: Contract | Invoice | Report)
├── Contract
│   ├── Contract Value (Extraction)
│   ├── Parties Involved (Extraction)
│   ├── Effective Date (Extraction)
│   └── Termination Date (Extraction)
├── Invoice
│   ├── Invoice Amount (Extraction)
│   ├── Due Date (Extraction)
│   └── Vendor Name (Extraction)
└── Report
    ├── Report Category (Classification: Financial | Operational | Compliance)
    └── Report Period (Extraction)

In this taxonomy, the AI first classifies the document type, then extracts only the child tags under that type. A Contract won’t have “Invoice Amount” extracted, which saves time and cost.
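
The conditional behavior can be sketched in a few lines of plain Python. The `TAXONOMY` dictionary and the `tags_to_extract` helper are hypothetical stand-ins that mirror the tree above; they do not reflect how the service stores or evaluates taxonomies.

```python
from typing import Dict, List

# Hypothetical in-memory mirror of the example taxonomy.
TAXONOMY: Dict[str, List[str]] = {
    "Contract": ["Contract Value", "Parties Involved",
                 "Effective Date", "Termination Date"],
    "Invoice": ["Invoice Amount", "Due Date", "Vendor Name"],
    "Report": ["Report Category", "Report Period"],
}


def tags_to_extract(document_type: str) -> List[str]:
    """Return only the child tags under the classified document type."""
    return TAXONOMY.get(document_type, [])


print(tags_to_extract("Contract"))
print(tags_to_extract("Memo"))  # unknown type: no child tags are extracted
```

Only one branch of the tree is visited per document, so the number of extraction calls scales with the size of that branch rather than the whole taxonomy.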

Creating Effective Tags

Define Clear Descriptions

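Give the AI specific instructions about what to extract. The more precise the description, the more accurate the extraction tends to be. For example, "Identify the type of legal agreement (NDA, MSA, SOW, etc.)" gives the AI more to work with than "Contract type".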

Choose the Right Type

Use Classification for known categories, Extraction for open-ended values, and Pattern for structured formats.

Set Available Values

For Classification tags, provide a complete list of possible values to improve accuracy.
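
One way to see why completeness matters: if a returned value has to map onto the predefined list, anything missing from the list cannot be represented. The sketch below assumes a hypothetical helper, `match_available_value`, that normalizes a raw answer against the list; it is not part of the SDK.

```python
from typing import List, Optional


def match_available_value(raw: str, available_values: List[str]) -> Optional[str]:
    """Return the canonical value matching raw, ignoring case and
    surrounding whitespace, or None when raw is not in the list."""
    lookup = {value.strip().casefold(): value for value in available_values}
    return lookup.get(raw.strip().casefold())


values = ["NDA", "MSA", "SOW", "Employment Agreement"]
print(match_available_value(" nda ", values))
print(match_available_value("Lease", values))  # not in the list
```

A document whose true category is absent from the list can only end up unmatched or forced into a wrong category, so it helps to list every value you expect to encounter.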

Organize into Taxonomies

Group related tags hierarchically to enable conditional extraction and reduce unnecessary processing.

Python SDK

from unstructured import UnstructuredClient

client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Create a taxonomy with tags
taxonomy = client.taxonomy.upsert(
    taxonomy_name="contract-analysis",
    taxonomy_description="Extract key data from legal contracts",
    tags=[
        {
            "name": "contract_type",
            "description": "Type of contract (NDA, MSA, SLA, etc.)",
            "output_type": "word",
        },
        {
            "name": "parties",
            "description": "Names of all parties involved",
            "output_type": "list[string]",
        },
        {
            "name": "effective_date",
            "description": "When the contract becomes effective",
            "output_type": "date",
        },
        {
            "name": "total_value",
            "description": "Total monetary value in USD",
            "output_type": "float",
        },
    ],
)
print(f"Created taxonomy: {taxonomy.taxonomy_id}")

# Create or update a single tag
tag = client.tags.upsert(
    tag_name="document_type",
    tag_description="Classify document type",
    output_type="word",
    available_values=["Contract", "Invoice", "Report", "Letter"],
)
print(f"Created tag: {tag.tag_id}")

# Get AI-suggested taxonomy for your domain
suggestions = client.suggest.schema(
    description="Medical patient records with diagnoses and prescriptions",
)

print("Suggested tags:")
for tag in suggestions.tags:
    print(f"  - {tag.name}: {tag.description}")

# Get AI-suggested regex patterns
pattern = client.suggest.patterns(
    description="US Social Security Number (XXX-XX-XXXX)",
)
print(f"Suggested pattern: {pattern.regex}")

# List all taxonomies
taxonomies = client.taxonomy.list()
for t in taxonomies.taxonomies:
    print(f"{t.taxonomy_name}: {t.tag_count} tags")

# List all tags
tags = client.tags.list()
for tag in tags.tags:
    print(f"{tag.tag_name} ({tag.output_type})")

# Delete a taxonomy
client.taxonomy.delete(taxonomy_name="old-taxonomy")

# Delete a tag
client.tags.delete(tag_name="unused-tag")

API Reference

Upsert Tag

Create or update a tag

List Tags

List all your tags

Delete Tag

Remove a tag

Suggest Patterns

AI-powered pattern suggestions

Upsert Taxonomy

Create or update a taxonomy

List Taxonomies

List all your taxonomies

Delete Taxonomy

Remove a taxonomy

Suggest Taxonomy

AI-powered taxonomy suggestions
