Ingest PDF documents into Quark’s vector store via the CLI or the REST API. Each document is chunked, embedded, and tagged with optional metadata.
Ingestion runs a document through the full pipeline: Unstructured.io partitions the text, pdfplumber extracts images, VoyageAI embeds each chunk, and the resulting vectors are upserted into Qdrant.
Ingestion requires a valid Unstructured.io API key and API URL, both configured in your .env file. The pipeline exits with an error if either one is missing. See Configuration for details.
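A quick pre-flight check can surface a missing setting before a long ingestion run starts. The sketch below is illustrative only: the variable names UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL are assumptions, not names confirmed by this page, so substitute the ones listed under Configuration. It also assumes the .env file has already been loaded into the process environment.

```python
import os

# Hypothetical variable names; replace with the names listed under Configuration.
REQUIRED_VARS = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")


def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the required settings that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings: " + ", ".join(missing))
    print("Unstructured.io settings found.")
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise in tests.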
The API ingestion flow is two steps: first obtain a presigned S3 upload URL, then trigger processing.
1. Get a presigned upload URL
POST /api/v1/ingest/upload/url
Quark returns { signedUrl, key }. Upload the file directly to signedUrl using a PUT request with the raw file bytes in the request body. Allowed types: application/pdf, image/jpeg, image/jpg. Maximum size: 50 MB.
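The upload step might look like the following in Python, using only the standard library. This is a sketch under stated assumptions rather than a reference client: the base URL, the JSON body fields (fileName, contentType), and the absence of authentication headers are all hypothetical, since this page does not specify them. Only the endpoint path, the { signedUrl, key } response shape, the PUT upload, and the type and size limits come from the text above.

```python
import json
import os
import urllib.request

# Limits stated in the documentation above.
ALLOWED_TYPES = {"application/pdf", "image/jpeg", "image/jpg"}
MAX_BYTES = 50 * 1024 * 1024  # assumes "50 MB" means 50 MiB


def check_upload(content_type, size_bytes):
    """Raise ValueError if the file falls outside the documented limits."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")


def build_url_request(base_url, filename, content_type):
    """Build the POST that asks Quark for a presigned upload URL.

    The JSON body fields are hypothetical; check the API reference.
    """
    body = json.dumps({"fileName": filename, "contentType": content_type}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/ingest/upload/url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_put_request(signed_url, data, content_type):
    """Build the PUT that sends the raw file bytes to the presigned URL."""
    return urllib.request.Request(
        signed_url,
        data=data,
        headers={"Content-Type": content_type},
        method="PUT",
    )


def upload(base_url, path, content_type="application/pdf"):
    """Upload one file and return the key needed for the processing step."""
    with open(path, "rb") as f:
        data = f.read()
    check_upload(content_type, len(data))
    request = build_url_request(base_url, os.path.basename(path), content_type)
    with urllib.request.urlopen(request) as resp:
        info = json.load(resp)  # expected shape: {"signedUrl": ..., "key": ...}
    with urllib.request.urlopen(build_put_request(info["signedUrl"], data, content_type)):
        pass  # urlopen raises HTTPError on a non-2xx response
    return info["key"]
```

Checking the type and size locally avoids requesting a presigned URL for a file that would be rejected anyway. Keep the returned key, since the processing step needs it.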
2. Trigger processing
Once the file is uploaded, send the file key and optional tags to the processing endpoint: