Process documents with OCR and ingest them into Unstructured.
This endpoint runs optical character recognition (OCR) on documents and stores the extracted data. The request body accepts the following fields:

| Field | Type | Description |
|---|---|---|
| `data_connector_name` | `str` | Name of the data connector to use. |
| `file_names` | `List[str]` | Specific files to process. If omitted, all files are processed. |
| `job_id` | `str` | Custom job ID for tracking. Auto-generated if not provided. |
| `clean_up_out_of_sync` | `bool` | Remove files from the VDB that are not in the source. Default: `true`. |
| `file_count_to_run` | `int` | Maximum number of files to process. |
| `use_llm` | `bool` | Use an LLM for enhanced extraction. Default: `false`. |

Example request body:

    {
      "data_connector_name": "my-documents",
      "use_llm": true,
      "clean_up_out_of_sync": true,
      "file_count_to_run": 100
    }
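
A request body like the one above could be assembled programmatically. The following is a minimal Python sketch, not the service's own client: the `OcrIngestRequest` class and its `to_payload` method are hypothetical names introduced for illustration, and it assumes that `data_connector_name` is the only required field and that unset optional fields can simply be left out of the JSON. The endpoint URL and authentication are not given in this section, so the sketch stops at producing the JSON string.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class OcrIngestRequest:
    """Hypothetical container mirroring the fields in the table above."""

    data_connector_name: str                # assumed to be required
    file_names: Optional[List[str]] = None  # None means "process all files"
    job_id: Optional[str] = None            # None means "let the server generate one"
    clean_up_out_of_sync: bool = True       # documented default
    file_count_to_run: Optional[int] = None
    use_llm: bool = False                   # documented default

    def to_payload(self) -> dict:
        # Leave out optional fields that were never set, so the server
        # falls back to its own behavior for them.
        return {k: v for k, v in asdict(self).items() if v is not None}


request = OcrIngestRequest(
    data_connector_name="my-documents",
    use_llm=True,
    file_count_to_run=100,
)

# Serialize to the JSON text that would be sent as the request body.
body = json.dumps(request.to_payload(), indent=2)
print(body)
```

Keeping the documented defaults in one place makes it harder to send a mistyped field name, and dropping `None` values keeps the payload limited to what the caller actually specified.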