Upload a large file using multipart upload
Initiate a multipart upload for large files (typically >100MB). This will return presigned URLs for each part.
Each presigned URL is valid for 900 seconds (15 minutes) and can be used multiple times.
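Because a presigned URL can be reused within its 15-minute window, a failed part upload can be retried against the same URL. The following is a minimal sketch of that idea, not part of the API: `put_part` is a hypothetical callable that performs the PUT and returns the ETag, and the retry count and the injectable clock are illustrative assumptions.

```python
import time
from typing import Callable, Optional

PRESIGNED_URL_TTL_SECONDS = 900  # validity window stated in the text (15 minutes)


def put_with_retry(
    put_part: Callable[[], str],
    issued_at: float,
    max_attempts: int = 3,
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Retry one part upload against the same presigned URL until it expires.

    put_part is a hypothetical callable: it performs the PUT and returns the
    part's ETag, raising OSError on a transient failure.
    issued_at must come from the same clock as `now`.
    """
    deadline = issued_at + PRESIGNED_URL_TTL_SECONDS
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        # Stop before sending if the URL can no longer be valid.
        if now() >= deadline:
            raise TimeoutError("presigned URL expired; initiate the upload again")
        try:
            return put_part()
        except OSError as exc:
            last_error = exc
    raise RuntimeError("part upload failed after retries") from last_error
```

Record `issued_at` when the initiate call returns, so the deadline reflects when the URLs were actually generated.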
The workflow is:
- Call this endpoint to get presigned URLs for each part
- Upload each part to its respective presigned URL using PUT requests
- Call the complete multipart upload endpoint with all part ETags
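The second step of the workflow above can be sketched roughly as follows. This is an illustration under assumptions, not the official client: the helper names, the `partNumber`/`eTag` keys in the returned list, and the idea that the ETag comes back in the `ETag` response header are all hypothetical, since this page does not specify the request body of the complete endpoint.

```python
import urllib.request
from typing import Callable, Dict, Iterator, List, Sequence, Union


def iter_parts(path: str, part_size_bytes: int) -> Iterator[bytes]:
    """Yield consecutive chunks of the file, each at most part_size_bytes long."""
    with open(path, "rb") as fh:
        while chunk := fh.read(part_size_bytes):
            yield chunk


def http_put(url: str, data: bytes) -> str:
    """PUT one part to its presigned URL and return the ETag response header."""
    request = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.headers["ETag"]


def upload_parts(
    path: str,
    part_urls: Sequence[str],
    part_size_bytes: int,
    put: Callable[[str, bytes], str] = http_put,
) -> List[Dict[str, Union[int, str]]]:
    """Upload each chunk to its presigned URL, in order, and collect the ETags.

    The returned key names are assumptions; adapt them to whatever the
    complete multipart upload endpoint actually expects.
    """
    parts: List[Dict[str, Union[int, str]]] = []
    for index, chunk in enumerate(iter_parts(path, part_size_bytes)):
        if index >= len(part_urls):
            raise ValueError("file has more parts than presigned URLs")
        parts.append({"partNumber": index + 1, "eTag": put(part_urls[index], chunk)})
    if len(parts) != len(part_urls):
        raise ValueError("file has fewer parts than presigned URLs")
    return parts
```

The `put` argument is injectable so the chunking logic can be exercised without network access. The collected list is what you would pass on to the complete multipart upload call.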
ZIP archives are supported through this endpoint too — large ZIPs are a common use case for multipart upload. When fileType is application/zip, the server will extract the archive after the upload completes and treat each contained file as an independent document (OCR, classification, and webhook callbacks are emitted per extracted file). All extracted files share the same apiRequestId so you can correlate their per-file callbacks back to the original ZIP upload. Poll GET /v1/uploads/{uploadId} to see per-file status (HTTP 207 is returned when some files in the ZIP succeeded and others failed).
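When polling a ZIP upload, the HTTP status code alone already distinguishes a partial result. The sketch below is illustrative only: `base_url` is a placeholder, and treating every other 2xx code as full success is an assumption, since only the 207 case is described here.

```python
from urllib.parse import quote


def upload_status_url(base_url: str, upload_id: str) -> str:
    """Build the polling URL for GET /v1/uploads/{uploadId}."""
    return f"{base_url.rstrip('/')}/v1/uploads/{quote(upload_id, safe='')}"


def classify_poll_response(http_status: int) -> str:
    """Map the polling status code to a coarse outcome.

    207 is documented: some files in the ZIP succeeded and others failed.
    Treating other 2xx codes as "ok" is an assumption of this sketch.
    """
    if http_status == 207:
        return "partial"
    if 200 <= http_status < 300:
        return "ok"
    return "error"
```

On a "partial" outcome, inspect the per-file status in the response body to find which extracted files need attention.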
The input should contain:
- fileName: the name of the file to be uploaded (include the .zip extension when uploading a ZIP archive)
- fileType: the MIME type of the file. Use application/zip for ZIP archives.
- fileSize: the size of the file in MB
- partSizeLimit: (optional) the size limit for each part in MB
- isSplit: whether the file should be split after upload (optional, default: false)
- isSplitExcel: whether to split Excel files by worksheets (optional, default: false)
- callbackURL: the URL that will be called after processing (optional)
- ocrModel: the OCR model to use (optional)
- schemaLocking: whether the schema should be locked (optional)
- directoryId: the directory id where the file should be uploaded (optional)
- isEphemeral: whether the file and all related data should be deleted after the file is processed, must be one of true or false (optional, default: false)
- pageCount: page count of the file, used for early validation against page limits (optional). Not applicable to ZIP archives.
- apiRequestId: optional client-supplied correlation ID that groups files uploaded together into a logical batch. Persisted on every resulting document and forwarded as api_request_id on every subsequent webhook callback, so you can reconcile processing events back to the originating request. For ZIP uploads, every extracted file inherits the same apiRequestId. If omitted, the server auto-generates one (shared across all files extracted from the same ZIP).
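As a rough illustration, the request body can be assembled and sanity-checked on the client before calling the endpoint. The field names come from the list above; the function names, the client-side checks, and the part-count estimate are assumptions of this sketch, and the server may calculate parts differently.

```python
import math
from typing import Any, Dict

OPTIONAL_FIELDS = {
    "partSizeLimit", "isSplit", "isSplitExcel", "callbackURL", "ocrModel",
    "schemaLocking", "directoryId", "isEphemeral", "pageCount", "apiRequestId",
}


def build_initiate_body(
    file_name: str, file_type: str, file_size_mb: float, **optional: Any
) -> Dict[str, Any]:
    """Assemble the JSON body for the initiate call, with basic client-side checks."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if file_size_mb <= 0:
        raise ValueError("fileSize must be positive")
    if file_type == "application/zip":
        if not file_name.lower().endswith(".zip"):
            raise ValueError("ZIP uploads should include the .zip extension")
        if "pageCount" in optional:
            # Rejecting here is a choice of this sketch; the text only says
            # pageCount is not applicable to ZIP archives.
            raise ValueError("pageCount is not applicable to ZIP archives")
    body: Dict[str, Any] = {
        "fileName": file_name,
        "fileType": file_type,
        "fileSize": file_size_mb,
    }
    body.update(optional)
    return body


def estimated_part_count(file_size_mb: float, part_size_limit_mb: float) -> int:
    """Estimate the number of parts, assuming every part except the last is full."""
    return math.ceil(file_size_mb / part_size_limit_mb)
```

For example, a 150.5 MB file with a 10 MB part size limit would need an estimated 16 parts under that assumption.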
Authorizations
API key for authentication
Body
File name
"large-file.pdf"
File type
"application/pdf"
File size in MB
150.5
Part size limit in MB (optional, default will be calculated)
10
Is split
false
Is split excel - whether to split Excel files by worksheets
false
Callback URL
"https://example.com/callback"
OCR model
Available options: Beethoven_ENG_O5.6, Beethoven_ENG_G5.5, Beethoven_ENG_GP25, Beethoven_ENG_GP25.1, Beethoven_ENG_GP25.2, Beethoven_CUS_O5.1, Beethoven_CUS_O5.2, Beethoven_CUS_GP25.1, Unified (google-document-ai-ocr-gemini-v10), Beethoven_ZH_O5.9, Beethoven_JP_O5.3, Beethoven_JP_G5.4, Beethoven_TH_O5.1, Beethoven_TH_G5.1
"Beethoven_ENG_O5.6"
Schema locking
false
Directory Id
"649e2d2d2d2d2d2d2d2d2d2d"
Is ephemeral
false
Page count of the PDF file. Used for early validation against page limits.
50
Optional client-supplied correlation ID that ties files uploaded in the same logical batch together.
Behaviour:
- Persisted on the resulting document and forwarded as api_request_id on all subsequent webhook callbacks for this upload, so you can correlate processing events back to the originating request on your side.
- When the uploaded file is a ZIP archive (large zips are a common use case for multipart upload), every file extracted from it inherits this same value, letting you link all per-file callbacks back to the one ZIP upload.
- If omitted, the server auto-generates an opaque ID. For ZIP uploads specifically, the generated ID is shared across all extracted files.
Supply your own value when you already have an idempotency key or job ID on the client side (e.g. from your own queue) that you want to reconcile against inbound webhook events.
"my-batch-request-123"
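On the receiving side, correlating callbacks back to one upload amounts to grouping them by api_request_id. The sketch below assumes each webhook payload has already been parsed into a dict; apart from api_request_id, the payload fields shown in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_callbacks_by_request(
    callbacks: Iterable[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group parsed webhook payloads by their api_request_id value.

    Payloads without an api_request_id are collected under the empty string
    so they are not silently dropped.
    """
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for payload in callbacks:
        grouped[payload.get("api_request_id", "")].append(payload)
    return dict(grouped)
```

For a ZIP upload sent with apiRequestId "my-batch-request-123", every per-file callback would land in the same group, for instance two payloads carrying hypothetical `fileName` values "a.pdf" and "b.pdf".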
Response
Multipart upload initiated successfully. Use the presigned URLs to upload each part.