This cookbook shows you how to use the platform’s AI to automatically generate custom taxonomies. Instead of manually defining tags one by one, you describe your use case in natural language, and the AI drafts a taxonomy that you can review, edit, and save.

AI-Generated Taxonomies

The suggest feature (client.taxonomy.suggest) drafts a taxonomy from a natural-language description of your use case, optionally grounded in documents from a data connector:
from unstructured import UnstructuredClient

client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Describe what you want to extract
suggestions = client.taxonomy.suggest(
    user_context="""
    I need to analyze legal contracts. Extract key terms, financial obligations, 
    risk indicators, important dates, and anything relevant for compliance.
    Focus on NDAs, MSAs, and employment agreements.
    """,
    data_connector_name="my-documents",  # Optional: ground suggestions in your own documents
)

# Review the suggestions
print("🤖 AI-Suggested Taxonomy:")
print(suggestions.suggestion)
Example output:
🤖 AI-Suggested Taxonomy:
{
  "contract_type": {
    "description": "Primary contract classification: NDA, MSA, SLA, SOW, Employment, Lease",
    "type": "word"
  },
  "parties": {
    "description": "All parties to the contract with their legal names",
    "type": "list[string]"
  },
  "effective_date": {
    "description": "Date when the contract becomes legally binding",
    "type": "date"
  },
  "total_value": {
    "description": "Total monetary value of the contract in USD",
    "type": "float"
  }
}
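
Before saving a suggestion, it can help to check that every entry carries the two fields the later steps rely on. The sketch below is a minimal illustration that assumes the suggestion is a plain dict shaped like the example output above; find_malformed_tags is a hypothetical helper, not part of the client library, and the sample data is trimmed so that two entries are deliberately incomplete.

```python
from typing import Any


def find_malformed_tags(suggestion: dict[str, Any]) -> dict[str, str]:
    """Return {tag_name: problem} for entries lacking a usable description or type.

    Hypothetical helper: assumes each entry should be a dict with non-empty
    string values under "description" and "type".
    """
    problems: dict[str, str] = {}
    for name, details in suggestion.items():
        if not isinstance(details, dict):
            problems[name] = "entry is not an object"
            continue
        for key in ("description", "type"):
            value = details.get(key)
            if not isinstance(value, str) or not value.strip():
                # Report only the first problem found for this tag.
                problems[name] = f"missing or empty '{key}'"
                break
    return problems


# Sample data shaped like the example output; the last two entries are
# intentionally broken to show what the check reports.
sample_suggestion = {
    "contract_type": {"description": "Primary contract classification", "type": "word"},
    "parties": {"description": "All parties to the contract", "type": "list[string]"},
    "effective_date": {"description": "", "type": "date"},
    "total_value": {"type": "float"},
}

problems = find_malformed_tags(sample_suggestion)
for tag_name, problem in problems.items():
    print(f"{tag_name}: {problem}")
```

An empty result means every suggested tag has both fields and can be passed on to the conversion step.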

Create the Taxonomy

Once you have the suggestions, convert them to a list of tags and pass that list to client.taxonomy.upsert:
# Convert suggestions to tags list
tags = []
for name, details in suggestions.suggestion.items():
    tags.append({
        "name": name,
        "description": details["description"],
        "output_type": details["type"]
    })

# Create taxonomy from suggestions
taxonomy = client.taxonomy.upsert(
    taxonomy_name="legal-contracts",
    taxonomy_description="AI-generated taxonomy for contract analysis",
    tags=tags,
)

print(f"✓ Created taxonomy with {len(tags)} tags")
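
The conversion loop above appears more than once in this cookbook, so it may be worth keeping in a single function. The following is a small sketch under the assumption that each suggestion entry has "description" and "type" keys, as in the example output; suggestion_to_tags is a hypothetical name and not a function provided by the client.

```python
def suggestion_to_tags(suggestion: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Map {name: {"description", "type"}} entries to the tag dicts used with upsert.

    Hypothetical helper; raises KeyError if an entry lacks either field.
    """
    return [
        {
            "name": name,
            "description": details["description"],
            "output_type": details["type"],
        }
        for name, details in suggestion.items()
    ]


# Sample data copied from the example output shown earlier.
sample_suggestion = {
    "effective_date": {
        "description": "Date when the contract becomes legally binding",
        "type": "date",
    },
    "total_value": {
        "description": "Total monetary value of the contract in USD",
        "type": "float",
    },
}

tags = suggestion_to_tags(sample_suggestion)
for tag in tags:
    print(tag["name"], "->", tag["output_type"])
```

With a real response, the call would presumably be suggestion_to_tags(suggestions.suggestion), and the resulting list can be passed as the tags argument.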

Refine with Sample Documents

For higher accuracy, point to specific files in your data connector. The AI will analyze these documents to suggest relevant tags:
# The AI analyzes your files to suggest the most relevant tags
suggestions = client.taxonomy.suggest(
    user_context="Extract key data from these vendor invoices",
    data_connector_name="my-s3-bucket",
    file_names=[
        "invoices/sample-invoice-1.pdf",
        "invoices/sample-invoice-2.pdf"
    ]
)

Customize the Results

You can modify the AI suggestions before creating the taxonomy:
# Get suggestions
response = client.taxonomy.suggest(
    user_context="Legal contract analysis",
    data_connector_name="my-docs"
)

# Convert to list for editing
tags = []
for name, details in response.suggestion.items():
    tags.append({
        "name": name,
        "description": details["description"],
        "output_type": details["type"]
    })

# Add a custom tag the AI might have missed
tags.append({
    "name": "reviewed_by_legal",
    "description": "Whether this contract has been reviewed by the legal team",
    "output_type": "boolean",
})

# Create the customized taxonomy
client.taxonomy.upsert(
    taxonomy_name="legal-contracts-custom",
    tags=tags,
)
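
Customization is not limited to appending tags: you may also want to drop suggestions or reword a description before saving. The sketch below shows one way to do that on the plain list of tag dicts; customize_tags and its drop, overrides, and extra parameters are hypothetical names introduced for illustration, and the sample tags reuse values from the example output.

```python
from typing import Iterable, Optional

Tag = dict[str, str]


def customize_tags(
    tags: Iterable[Tag],
    drop: Iterable[str] = (),
    overrides: Optional[dict[str, Tag]] = None,
    extra: Iterable[Tag] = (),
) -> list[Tag]:
    """Return a new tag list with removals, field overrides, and additions applied."""
    drop_names = set(drop)
    overrides = overrides or {}
    result: list[Tag] = []
    for tag in tags:
        if tag["name"] in drop_names:
            continue
        # Merge override fields onto a copy so the input list is left untouched.
        result.append({**tag, **overrides.get(tag["name"], {})})
    seen = {tag["name"] for tag in result}
    for tag in extra:
        if tag["name"] in seen:
            raise ValueError(f"duplicate tag name: {tag['name']}")
        result.append(dict(tag))
        seen.add(tag["name"])
    return result


suggested = [
    {"name": "contract_type", "description": "Primary contract classification", "output_type": "word"},
    {"name": "total_value", "description": "Total monetary value of the contract in USD", "output_type": "float"},
    {"name": "effective_date", "description": "Date when the contract becomes legally binding", "output_type": "date"},
]

custom = customize_tags(
    suggested,
    drop=["total_value"],
    overrides={"contract_type": {"description": "Contract classification: NDA, MSA, or Employment"}},
    extra=[{"name": "reviewed_by_legal", "description": "Whether this contract has been reviewed", "output_type": "boolean"}],
)
print([tag["name"] for tag in custom])
```

Rejecting duplicate names up front is a design choice of this sketch; the cookbook does not state how the service treats two tags with the same name.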

Common Use Cases

Financial report analysis:
suggestions = client.taxonomy.suggest(
    user_context="""
    Analyze quarterly financial reports. Extract revenue, earnings, 
    growth metrics, risk factors, and forward guidance.
    """,
    data_connector_name="finance-docs"
)
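
The triple-quoted user_context strings in these examples carry the leading newlines and indentation of the source file. Whether the service normalizes that whitespace is not stated here, so the sketch below simply collapses it before sending. The USE_CASE_CONTEXTS dictionary, its keys, and clean_context are hypothetical names; the prompt text is taken from the examples in this cookbook.

```python
# Hypothetical registry of prompts, keyed by a label of your choosing.
USE_CASE_CONTEXTS = {
    "legal-contracts": """
        I need to analyze legal contracts. Extract key terms, financial obligations,
        risk indicators, important dates, and anything relevant for compliance.
        Focus on NDAs, MSAs, and employment agreements.
    """,
    "financial-reports": """
        Analyze quarterly financial reports. Extract revenue, earnings,
        growth metrics, risk factors, and forward guidance.
    """,
}


def clean_context(raw: str) -> str:
    """Collapse newlines and runs of spaces into single spaces and trim the ends."""
    return " ".join(raw.split())


for label, raw in USE_CASE_CONTEXTS.items():
    print(f"{label}: {clean_context(raw)}")
```

The cleaned string can then be passed as user_context in the suggest calls shown above.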

Next Steps