This cookbook shows you how to configure sensitive data detection (PII) in your document processing pipeline. You’ll learn to automatically identify and flag personally identifiable information for compliance and data governance.
What You’ll Build
A pipeline that automatically detects and flags sensitive information in documents:
Quick Start
Using Regex Patterns for PII
For precise PII detection, use regex patterns in your tags:
Complete Compliance Pipeline
Here’s a production-ready pipeline for handling sensitive documents:
PII Categories Reference
| Category | Examples | Typical Risk |
|---|---|---|
| Identity | SSN, Passport, Driver’s License | Critical |
| Financial | Credit Card, Bank Account, Tax ID | Critical |
| Health | Medical Records, Insurance ID, Diagnoses | High |
| Contact | Email, Phone, Address | Medium |
| Biometric | Fingerprints, Face Data, Voice | Critical |
| Demographic | Age, Gender, Ethnicity | Low-Medium |
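The table above can be turned into a small lookup so that a document's overall risk follows from the categories detected in it. This is a minimal sketch in plain Python, not part of any SDK: the names `CATEGORY_RISK`, `RISK_ORDER`, and `highest_risk` are hypothetical, and treating "Low-Medium" as its own level between Low and Medium is an assumption.

```python
# Risk levels copied from the PII Categories Reference table.
# Ordering "Low-Medium" between "Low" and "Medium" is an assumption.
RISK_ORDER = ["Low", "Low-Medium", "Medium", "High", "Critical"]

CATEGORY_RISK = {
    "Identity": "Critical",
    "Financial": "Critical",
    "Health": "High",
    "Contact": "Medium",
    "Biometric": "Critical",
    "Demographic": "Low-Medium",
}


def highest_risk(categories):
    """Return the most severe risk level among the detected categories.

    Unknown categories are ignored; returns None if nothing is recognized.
    """
    levels = [CATEGORY_RISK[c] for c in categories if c in CATEGORY_RISK]
    if not levels:
        return None
    return max(levels, key=RISK_ORDER.index)
```

For example, a document tagged with both Contact and Health data would be treated as High risk under this mapping.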
Best Practices
Combine AI + Regex
Use AI-based detection for context-aware classification, and regex patterns for precise matching:
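One way to picture this combination is a function that runs fixed regex patterns and a context-aware classifier side by side and flags the document if either fires. The sketch below is plain Python and does not use the Deasy SDK: the patterns are illustrative and would need tuning for real data, and `classify_context` is a hypothetical callable standing in for whatever AI-based detector you use (here replaced by a trivial keyword stub).

```python
import re

# Illustrative patterns only; real formats vary and need tuning.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def regex_detect(text):
    """Return {pattern_name: [matches]} for every pattern that matched."""
    hits = {name: p.findall(text) for name, p in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}


def detect_pii(text, classify_context):
    """Combine precise regex matches with context-aware labels.

    classify_context is any callable returning an iterable of labels;
    it is a placeholder for an AI-based classifier (assumption).
    """
    regex_hits = regex_detect(text)
    context_labels = sorted(set(classify_context(text)))
    return {
        "regex_hits": regex_hits,
        "context_labels": context_labels,
        "flagged": bool(regex_hits) or bool(context_labels),
    }


def keyword_stub(text):
    """Stand-in for an AI classifier: a trivial keyword check."""
    return {"health"} if "diagnosis" in text.lower() else set()
```

Keeping the two detectors separate in the result makes it clear which flags come from exact pattern matches and which come from contextual classification, which can matter when reviewing false positives.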
Audit Trail
Keep metadata for compliance audits:
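As a rough illustration of what such metadata could look like, the sketch below appends one JSON record per scanned document to a log file. It is plain Python rather than a Deasy API, and the field names, the `detector` label, and the choice to store a content hash instead of the matched values are all assumptions made for the example.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(doc_id, content, detected_tags, detector="regex-v1"):
    """Describe one scan without storing the sensitive values themselves.

    content is the document as bytes; only its SHA-256 digest is recorded.
    The detector label is a hypothetical version string.
    """
    return {
        "doc_id": doc_id,
        "sha256": hashlib.sha256(content).hexdigest(),
        "detected_tags": sorted(detected_tags),
        "detector": detector,
        "scanned_at": datetime.now(timezone.utc).isoformat(),
    }


def append_audit_record(path, record):
    """Append the record as one line of JSON (JSON Lines format)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

An append-only log of this kind records what was flagged and when, while the hash lets an auditor confirm which version of a document was scanned without the log itself containing PII.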
Incremental Scanning
Process only new documents to save time:
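A simple way to approximate this outside any particular SDK is to remember a content hash for each document already scanned and skip anything unchanged. The sketch below is plain Python with hypothetical names (`select_unscanned`, `mark_scanned`, and the `seen` dictionary); it also rescans documents whose content has changed, which goes slightly beyond "new documents only" and is an assumption of this example.

```python
import hashlib


def _digest(content):
    """SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def select_unscanned(documents, seen):
    """Return ids of documents that are new or changed since the last scan.

    documents maps doc_id -> bytes; seen maps doc_id -> digest recorded
    by mark_scanned on a previous run.
    """
    return [
        doc_id
        for doc_id, content in documents.items()
        if seen.get(doc_id) != _digest(content)
    ]


def mark_scanned(documents, doc_ids, seen):
    """Record the current digest for each document that was just scanned."""
    for doc_id in doc_ids:
        seen[doc_id] = _digest(documents[doc_id])
```

In practice the `seen` mapping would be persisted between runs, for example as a JSON file or a database table, so that each scan starts from the state left by the previous one.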
Next Steps
S3 to SharePoint
Export enriched metadata to SharePoint.
S3 to Qdrant
Build a complete RAG pipeline.
Custom Taxonomies
Use AI to generate custom taxonomies.
Data Slices
Learn advanced filtering techniques.

