Metadata represents the actual extracted values that result from applying Tags to your documents. While Tags define what to extract, Metadata is the extracted data itself.
| Property | Description | Example |
|---|---|---|
| Values | The extracted or classified value(s) | ["NDA", "Non-Disclosure Agreement"] |
| Evidence | Text snippet supporting the extraction | “This Non-Disclosure Agreement is entered into…” |
| Confidence | AI confidence score (0-1) | 0.95 |
The Evidence field shows exactly where the AI found the information, making it easy to verify extractions and understand the source.
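The three properties above can be pictured as a simple record. The following sketch is illustrative only; the class name and field names mirror the table, not the platform's actual SDK types:

```python
from dataclasses import dataclass

@dataclass
class Metadata:
    """One extracted metadata record (field names mirror the table above;
    this class is illustrative, not a real SDK type)."""
    values: list[str]   # extracted or classified value(s)
    evidence: str       # text snippet supporting the extraction
    confidence: float   # AI confidence score, 0-1

record = Metadata(
    values=["NDA", "Non-Disclosure Agreement"],
    evidence="This Non-Disclosure Agreement is entered into…",
    confidence=0.95,
)
```

Keeping evidence alongside each value is what makes extractions auditable: a reviewer can compare `record.evidence` against the source document directly.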
| Level | Description | Use Case |
|---|---|---|
| File-Level | Aggregated metadata for the entire document | Document classification, search filters |
| Chunk-Level | Granular metadata per text segment | Precise evidence location, RAG retrieval |
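One way to relate the two levels: file-level metadata is a roll-up of the chunk-level records. The sketch below assumes a plausible aggregation policy (union the values per tag, keep the highest confidence seen); the real platform's policy may differ:

```python
def aggregate(chunks):
    """Roll chunk-level metadata up to file level: union the values per tag
    and keep the highest confidence seen (one plausible policy, assumed here)."""
    file_level = {}
    for c in chunks:
        entry = file_level.setdefault(c["tag"], {"values": set(), "confidence": 0.0})
        entry["values"].update(c["values"])
        entry["confidence"] = max(entry["confidence"], c["confidence"])
    return file_level

# Hypothetical chunk-level records for one document
chunk_metadata = [
    {"tag": "Contract Type", "values": ["NDA"], "confidence": 0.97},
    {"tag": "Contract Type", "values": ["NDA"], "confidence": 0.93},
    {"tag": "Party", "values": ["Acme Inc."], "confidence": 0.88},
]

file_meta = aggregate(chunk_metadata)
```

The chunk-level records keep their individual evidence and positions for precise retrieval, while the file-level view is what you would index for document classification and search filters.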
The platform includes AI-powered standardization to clean and normalize extracted values:
| Feature | Description |
|---|---|
| Deduplication | Merge similar values (e.g., “Inc.” and “Incorporated”) |
| Normalization | Standardize formats (dates, currencies, names) |
| Bulk Standardization | Apply standardization across multiple tags |
Standardization helps ensure consistency across your metadata, making it easier to search, filter, and analyze your documents.
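To make the deduplication and normalization steps concrete, here is a minimal sketch. The synonym table and date format are assumptions for illustration; the platform's AI-powered standardization is not limited to lookup tables like this:

```python
from datetime import datetime

# Hypothetical synonym table; a real system would learn or configure these.
SYNONYMS = {"Incorporated": "Inc.", "Inc": "Inc."}

def deduplicate(values):
    """Merge values that map to the same canonical form, keeping first-seen order."""
    seen, merged = set(), []
    for v in values:
        canonical = SYNONYMS.get(v.rstrip("."), v)
        if canonical not in seen:
            seen.add(canonical)
            merged.append(canonical)
    return merged

def normalize_date(raw):
    """Normalize a 'January 1, 2024'-style date to ISO 8601."""
    return datetime.strptime(raw, "%B %d, %Y").date().isoformat()
```

Bulk standardization would simply apply rules like these across every tag's values at once instead of one tag at a time.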
1. **Document Processing**: Documents are chunked and prepared for analysis.
2. **Tag Application**: The AI applies your Tags to extract or classify information from each chunk.
3. **Evidence Capture**: The system captures the text snippet that supports each extraction.
4. **Aggregation**: Chunk-level metadata is aggregated to create file-level metadata.
5. **Standardization**: Optional normalization and deduplication clean the results.
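The steps above can be sketched end to end. In this toy version a keyword match stands in for the AI model, and all names (`process_document`, the tag fields) are invented for illustration:

```python
def process_document(text, tags, chunk_size=200):
    """Sketch of the pipeline above: chunk, apply tags, capture evidence,
    aggregate. The keyword match stands in for the AI extraction step."""
    # 1. Document Processing: split the text into fixed-size chunks
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunk_meta = []
    for chunk in chunks:
        for tag in tags:
            # 2-3. Tag Application + Evidence Capture
            if tag["keyword"] in chunk:
                chunk_meta.append({
                    "tag": tag["name"],
                    "values": [tag["value"]],
                    "evidence": chunk.strip(),
                })
    # 4. Aggregation: chunk-level records -> file-level metadata
    file_meta = {}
    for m in chunk_meta:
        file_meta.setdefault(m["tag"], set()).update(m["values"])
    return chunk_meta, file_meta

text = "This Non-Disclosure Agreement is entered into as of January 1, 2024."
tags = [{"name": "Contract Type", "keyword": "Non-Disclosure", "value": "NDA"}]
chunk_meta, file_meta = process_document(text, tags)
```

Note that evidence is captured per chunk (step 3) before aggregation (step 4), which is why chunk-level metadata can always point back to the exact supporting text.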
For a contract document with a “Contract Type” classification tag:
```json
{
  "tag": "Contract Type",
  "values": ["NDA"],
  "evidence": "This Non-Disclosure Agreement ('Agreement') is entered into as of January 1, 2024...",
  "confidence": 0.97
}
```
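A client consuming this record might gate downstream use on the confidence score. The threshold of 0.9 below is an arbitrary example, not a platform default:

```python
import json

# The example record above, parsed as a client might receive it
raw = """{
  "tag": "Contract Type",
  "values": ["NDA"],
  "evidence": "This Non-Disclosure Agreement ('Agreement') is entered into as of January 1, 2024...",
  "confidence": 0.97
}"""

record = json.loads(raw)

# Accept the classification only above a chosen confidence threshold
THRESHOLD = 0.9  # arbitrary example value
accepted_values = record["values"] if record["confidence"] >= THRESHOLD else []
```

Low-confidence records need not be discarded; they can instead be routed to human review, using the evidence snippet as the reviewer's starting point.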