Tags are the metadata attributes you want to extract or classify from your documents. Taxonomies organize tags into hierarchical structures that define parent-child relationships.

What is a Tag?

A Tag defines a specific piece of information you want to capture from documents:
| Property | Description | Example |
| --- | --- | --- |
| Name | The tag identifier | Contract Type |
| Description | Instructions for the AI on what to extract | "Identify the type of legal agreement (NDA, MSA, SOW, etc.)" |
| Output Type | How values are returned | Word, Number, Date |
| Max Values | How many values are returned | From 1 up to as many relevant values as the AI finds |
| Available Values | Predefined options (for classification) | ["NDA", "MSA", "SOW", "Employment Agreement"] |
| Strategy | Extraction method | LLM (AI), Regex (Pattern), Rule-based |
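
As a rough illustration, the properties above can be pictured as a small record. The `Tag` class, its field names, and its defaults are assumptions made for this sketch; they are not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical in-memory model of a tag; fields mirror the property list above."""
    name: str
    description: str
    output_type: str = "word"
    max_values: Optional[int] = 1  # None stands for "as many as are found"
    available_values: List[str] = field(default_factory=list)
    strategy: str = "llm"

    @property
    def is_classification(self) -> bool:
        # In this sketch, a tag with predefined options is treated as a classification tag.
        return bool(self.available_values)


contract_type = Tag(
    name="Contract Type",
    description="Identify the type of legal agreement (NDA, MSA, SOW, etc.)",
    available_values=["NDA", "MSA", "SOW", "Employment Agreement"],
)
print(contract_type.is_classification)
```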

Tag Types

| Tag Type | How It Works | When to Use | Example |
| --- | --- | --- | --- |
| Classification Tags | AI chooses from a predefined list of values | When you have a known set of categories | Document Type: Contract, Invoice, Report |
| Extraction Tags | AI extracts open-ended values from text | When the value is unpredictable | Contract Value: $1,500,000 |
| Pattern Tags | Regex pattern matching or NLP detection | For compliance, keyword search, etc. | SSN: XXX-XX-XXXX, Email: user@example.com |
Classification is best when you have a known set of categories. Extraction is best for unpredictable values like names, dates, or amounts. Pattern is best for structured data like phone numbers or SSNs.
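
To make the Pattern case concrete, here is a minimal sketch using Python's standard `re` module. The two patterns and the `find_pattern_tags` helper are illustrative assumptions, not the product's built-in detectors, and real-world SSN and email validation is stricter than this.

```python
import re

# Illustrative patterns only; they match the general shape, not every valid value.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_pattern_tags(text: str) -> dict:
    """Return every SSN-like and email-like match found in the text."""
    return {
        "SSN": SSN_PATTERN.findall(text),
        "Email": EMAIL_PATTERN.findall(text),
    }


sample = "Contact jane.doe@example.com regarding SSN 123-45-6789."
print(find_pattern_tags(sample))
```

Because the format is fixed, a pattern like this needs no AI call, which is why Pattern tags suit compliance scans over large document sets.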

Taxonomy Structure

A Taxonomy enables hierarchical organization and conditional extraction. Child tags are extracted only when the parent tag's condition is met:
Document Type (Classification: Contract | Invoice | Report)
├── Contract
│   ├── Contract Value (Extraction)
│   ├── Parties Involved (Extraction)
│   ├── Effective Date (Extraction)
│   └── Termination Date (Extraction)
├── Invoice
│   ├── Invoice Amount (Extraction)
│   ├── Due Date (Extraction)
│   └── Vendor Name (Extraction)
└── Report
    ├── Report Category (Classification: Financial | Operational | Compliance)
    └── Report Period (Extraction)
In this taxonomy, the AI first classifies the document type, then only extracts the relevant child tags. A Contract won’t have “Invoice Amount” extracted — saving time and cost.
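
The conditional step can be sketched in a few lines. The nested-dict encoding, the `tags_to_extract` helper, and the keyword-based `toy_classifier` are all hypothetical stand-ins for this example; the real service performs classification with an AI model.

```python
from typing import Callable, List

# Hypothetical encoding of the taxonomy above:
# each parent value maps to the child tags that apply only to it.
TAXONOMY = {
    "tag": "Document Type",
    "children": {
        "Contract": ["Contract Value", "Parties Involved", "Effective Date", "Termination Date"],
        "Invoice": ["Invoice Amount", "Due Date", "Vendor Name"],
        "Report": ["Report Category", "Report Period"],
    },
}


def tags_to_extract(taxonomy: dict, classify: Callable[[str], str], document: str) -> List[str]:
    """Classify the parent tag first, then return only the matching branch's child tags."""
    parent_value = classify(document)
    return taxonomy["children"].get(parent_value, [])


def toy_classifier(document: str) -> str:
    # Stand-in for the AI classification step; a keyword check for illustration only.
    text = document.lower()
    if "invoice" in text:
        return "Invoice"
    if "agreement" in text:
        return "Contract"
    return "Report"


print(tags_to_extract(TAXONOMY, toy_classifier, "Master Services Agreement between A and B"))
```

Only one branch is ever returned, so tags from the other branches are never requested for that document.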

Creating Effective Tags

1. Define Clear Descriptions
   Give the AI specific instructions about what to extract. The better your description, the more accurate the extraction.

2. Choose the Right Type
   Use Classification for known categories, Extraction for open-ended values, and Pattern for structured formats.

3. Set Available Values
   For Classification tags, provide a complete list of possible values to improve accuracy.

4. Organize into Taxonomies
   Group related tags hierarchically to enable conditional extraction and reduce unnecessary processing.
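
These guidelines can be checked before a taxonomy is uploaded. The sketch below is one possible pre-flight check; the `lint_tag` helper, the dict keys `type`, `available_values`, and `pattern`, and the four-word threshold are assumptions chosen for illustration, not SDK behavior.

```python
from typing import Dict, List


def lint_tag(tag: Dict) -> List[str]:
    """Return human-readable warnings for a tag definition (hypothetical dict keys)."""
    warnings = []
    description = tag.get("description", "").strip()
    # Arbitrary threshold: very short descriptions rarely tell the AI what to extract.
    if len(description.split()) < 4:
        warnings.append("description is too short to guide extraction")
    if tag.get("type") == "classification" and not tag.get("available_values"):
        warnings.append("classification tag has no available values")
    if tag.get("type") == "pattern" and not tag.get("pattern"):
        warnings.append("pattern tag has no regex pattern")
    return warnings


vague = {"name": "doc_type", "description": "Type", "type": "classification"}
clear = {
    "name": "doc_type",
    "description": "Classify the document as a contract, invoice, or report",
    "type": "classification",
    "available_values": ["Contract", "Invoice", "Report"],
}
print(lint_tag(vague))
print(lint_tag(clear))
```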

Python SDK

from unstructured import UnstructuredClient

client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Create a taxonomy with tags
taxonomy = client.taxonomy.upsert(
    taxonomy_name="contract-analysis",
    taxonomy_description="Extract key data from legal contracts",
    tags=[
        {
            "name": "contract_type",
            "description": "Type of contract (NDA, MSA, SLA, etc.)",
            "output_type": "word",
        },
        {
            "name": "parties",
            "description": "Names of all parties involved",
            "output_type": "list[string]",
        },
        {
            "name": "effective_date",
            "description": "When the contract becomes effective",
            "output_type": "date",
        },
        {
            "name": "total_value",
            "description": "Total monetary value in USD",
            "output_type": "float",
        },
    ],
)
print(f"Created taxonomy: {taxonomy.taxonomy_id}")
