Metadata is the set of values extracted by applying Tags to your documents. Tags define what to extract; Metadata is the extracted data itself.

Metadata Properties

| Property | Description | Example |
| --- | --- | --- |
| Values | The extracted or classified value(s) | ["NDA", "Non-Disclosure Agreement"] |
| Evidence | Text snippet supporting the extraction | “This Non-Disclosure Agreement is entered into…” |
| Confidence | AI confidence score (0-1) | 0.95 |

The Evidence field shows exactly where the AI found the information, making it easy to verify extractions and understand the source.
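
As a rough illustration, a single metadata record with these three properties could be modeled as shown here. The `MetadataRecord` class and its `is_reliable` helper are hypothetical names for this sketch, not part of the platform's SDK, and the 0.8 threshold is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Hypothetical container mirroring the Values, Evidence, and Confidence properties."""

    values: list[str]
    evidence: str
    confidence: float  # expected to lie in the range 0-1

    def is_reliable(self, threshold: float = 0.8) -> bool:
        # Treat the record as reliable when its confidence meets the threshold.
        return self.confidence >= threshold


record = MetadataRecord(
    values=["NDA", "Non-Disclosure Agreement"],
    evidence="This Non-Disclosure Agreement is entered into…",
    confidence=0.95,
)
print(record.is_reliable())
```

Records that fall below the chosen threshold are natural candidates for manual review, using the Evidence snippet to check the extraction against the source text.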

Metadata Levels

| Level | Description | Use Case |
| --- | --- | --- |
| File-Level | Aggregated metadata for the entire document | Document classification, search filters |
| Chunk-Level | Granular metadata per text segment | Precise evidence location, RAG retrieval |

Metadata Standardization

The platform includes AI-powered standardization to clean and normalize extracted values:

| Feature | Description |
| --- | --- |
| Deduplication | Merge similar values (e.g., “Inc.” and “Incorporated”) |
| Normalization | Standardize formats (dates, currencies, names) |
| Bulk Standardization | Apply standardization across multiple tags |

Standardization helps ensure consistency across your metadata, making it easier to search, filter, and analyze your documents.
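
To make the intended effect concrete, here is a minimal rule-based sketch of deduplication and date normalization. The platform describes this step as AI-powered; the fixed alias table and date formats below are simplifying assumptions standing in for that logic, and every function name is hypothetical.

```python
from datetime import datetime

# Hypothetical alias table mapping company-suffix variants to one canonical form.
SUFFIX_ALIASES = {"inc": "Inc.", "inc.": "Inc.", "incorporated": "Inc."}


def normalize_company(name: str) -> str:
    """Rewrite a trailing company suffix to its canonical spelling."""
    parts = name.strip().split()
    if parts and parts[-1].lower() in SUFFIX_ALIASES:
        parts[-1] = SUFFIX_ALIASES[parts[-1].lower()]
    return " ".join(parts)


def normalize_date(text: str) -> str:
    """Convert a few assumed date formats to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%B %d, %Y", "%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return text  # leave unrecognized formats untouched


def deduplicate(values: list[str]) -> list[str]:
    """Merge values that normalize to the same name, ignoring case."""
    seen: dict[str, str] = {}
    for value in values:
        canonical = normalize_company(value)
        seen.setdefault(canonical.lower(), canonical)
    return list(seen.values())


companies = deduplicate(["Acme Inc.", "Acme Incorporated", "Beta LLC"])
iso_date = normalize_date("January 1, 2024")
```

Here the two Acme variants collapse to one entry, and the date is rewritten to ISO 8601. The `%m/%d/%Y` format assumes US-style dates.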

How Metadata Generation Works

1. Document Processing: Documents are chunked and prepared for analysis.
2. Tag Application: The AI applies your Tags to extract or classify information from each chunk.
3. Evidence Capture: The system captures the text snippet that supports each extraction.
4. Aggregation: Chunk-level metadata is aggregated to create file-level metadata.
5. Standardization: Optional normalization and deduplication clean the results.
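
The aggregation in step 4 can be sketched as follows. The platform's actual aggregation rule is not documented here, so this sketch assumes one plausible rule: take the union of values per tag in first-seen order, and keep the evidence from the most confident chunk. The `aggregate_chunks` function and the record shape are hypothetical.

```python
from collections import defaultdict


def aggregate_chunks(chunk_records: list[dict]) -> dict:
    """Merge chunk-level records into one file-level record per tag.

    Assumed rule (not confirmed by the platform): union the values in
    first-seen order and keep the evidence of the most confident chunk.
    """
    by_tag = defaultdict(list)
    for rec in chunk_records:
        by_tag[rec["tag"]].append(rec)

    file_level = {}
    for tag, recs in by_tag.items():
        values = []
        for rec in recs:
            for value in rec["values"]:
                if value not in values:
                    values.append(value)
        best = max(recs, key=lambda r: r["confidence"])
        file_level[tag] = {
            "values": values,
            "evidence": best["evidence"],
            "confidence": best["confidence"],
        }
    return file_level


chunks = [
    {"tag": "Contract Type", "values": ["NDA"],
     "evidence": "This Non-Disclosure Agreement is entered into...",
     "confidence": 0.97},
    {"tag": "Contract Type", "values": ["NDA"],
     "evidence": "...the NDA shall remain in effect...",
     "confidence": 0.81},
]
file_metadata = aggregate_chunks(chunks)
```

Keeping the chunk-level records alongside the file-level result preserves the precise evidence location for each extraction.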

Example Metadata Output

For a contract document with a “Contract Type” classification tag:
{
  "tag": "Contract Type",
  "values": ["NDA"],
  "evidence": "This Non-Disclosure Agreement ('Agreement') is entered into as of January 1, 2024...",
  "confidence": 0.97
}
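
Before relying on output like this downstream, it can help to check its shape. The sketch below parses the example with the standard `json` module; the `validate_record` helper and its checks are assumptions made for illustration, not a schema published by the platform.

```python
import json

raw = """
{
  "tag": "Contract Type",
  "values": ["NDA"],
  "evidence": "This Non-Disclosure Agreement ('Agreement') is entered into as of January 1, 2024...",
  "confidence": 0.97
}
"""


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    for key in ("tag", "values", "evidence", "confidence"):
        if key not in record:
            problems.append(f"missing field: {key}")
    confidence = record.get("confidence")
    if isinstance(confidence, (int, float)) and not 0 <= confidence <= 1:
        problems.append("confidence outside [0, 1]")
    if not record.get("values"):
        problems.append("no values extracted")
    return problems


record = json.loads(raw)
problems = validate_record(record)
print(problems or "record looks well-formed")
```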

Python SDK

from unstructured import UnstructuredClient

# Replace the placeholder credentials with your own
client = UnstructuredClient(
    username="your-username",
    password="your-password",
)

# Generate metadata for a single document
result = client.classify.generate(
    file_path="s3://my-bucket/contract.pdf",
    taxonomy_name="contract-analysis",
)

# Print the file name, then each tag with its extracted value
print(f"Document: {result.file_name}")
for tag, value in result.tags.items():
    print(f"  {tag}: {value}")

API Reference