This cookbook walks you through building a production-ready RAG (Retrieval-Augmented Generation) pipeline. You’ll ingest documents from Amazon S3, extract structured metadata, and load everything into Qdrant for semantic search.

## Documentation Index
Fetch the complete documentation index at: https://docs.deasylabs.com/llms.txt
Use this file to discover all available pages before exploring further.
## What You’ll Build

An end-to-end RAG pipeline: PDF documents ingested from an S3 bucket, enriched with structured metadata, and loaded into Qdrant for semantic search.

## Prerequisites
- An S3 bucket with PDF documents
- A Qdrant instance (cloud or self-hosted)
- Python 3.9+
## Complete Pipeline

## Query Your RAG Pipeline
Once your data is in Qdrant, you can query it with any Qdrant client.

## Understanding Export Options
| Option | Description | When to Use |
|---|---|---|
| `export_level="file"` | One record per document | Document-level retrieval |
| `export_level="chunk"` | One record per chunk | RAG / semantic search |
| `export_level="both"` | Both file and chunk records | Hybrid use cases |
| `export_nodes=True` | Include vector embeddings | Required for semantic search |
| `metadata_format="json_store"` | Metadata as JSON column | Flexible filtering |
| `metadata_format="column_store"` | Metadata as separate columns | SQL-style queries |
## Production Tips

### Handle Large Batches

For large document sets (100+ files), the export runs asynchronously, so poll the export tracker until it completes.
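The tracker object is specific to the SDK; as a generic sketch of the polling pattern, here is a helper where the `get_status` callable and its status strings are assumptions you would adapt to whatever status field your export tracker exposes:

```python
import time

def wait_for_export(get_status, poll_interval=5.0, timeout=600.0):
    """Poll an async export until it finishes or times out.

    `get_status` is any zero-arg callable returning one of
    "pending", "running", "completed", or "failed" (illustrative
    values -- map them to your tracker's real status field).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "completed":
            return True
        if status == "failed":
            raise RuntimeError("export failed")
        time.sleep(poll_interval)
    raise TimeoutError("export did not finish within the timeout")
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted mid-run.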
### Incremental Updates

Use Data Slices to process only the documents added since your last run.
### Error Handling

Wrap pipeline operations in try-except blocks so transient failures don't crash the whole run.
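A sketch of that pattern with retries and logging added; the `operation` callable stands in for any ingest or export step in your pipeline:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def with_retries(operation, attempts=3, backoff=2.0):
    """Run `operation` (any zero-arg callable, e.g. an ingest or
    export step), retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(backoff ** attempt)
```

Re-raising on the final attempt keeps failures visible to your orchestrator instead of silently swallowing them.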
## Next Steps

- **S3 to SharePoint**: Export metadata to SharePoint columns.
- **PII Detection**: Add sensitive-data detection to your pipeline.
- **Custom Taxonomies**: Use AI to generate custom taxonomies.
- **Destinations**: Explore all supported export targets.

