# Upload a file

> Register a file upload with the server. The response contains a presigned upload URL; send the binary file to that URL with a `PUT` request.

The presignedUploadURL is valid for 300 seconds (5 minutes) and can be used multiple times.

**ZIP archives are supported.** When `fileType` is `application/zip`, the server will extract the archive after upload and treat each contained file as an independent document (OCR, classification, and webhook callbacks are emitted per extracted file). All extracted files share the same `apiRequestId` so you can correlate their per-file callbacks back to the original ZIP upload. Poll `GET /v1/uploads/{uploadId}` to see per-file status (HTTP 207 is returned when some files in the ZIP succeeded and others failed).
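Because every file extracted from a ZIP carries the same `apiRequestId`, per-file callbacks can be grouped on the receiving side. The following is a minimal sketch, assuming each webhook payload arrives as a JSON object containing the `api_request_id` field named on this page. The `file_name` field used in the usage note is a hypothetical placeholder, since the callback schema is not documented here.

```python
from collections import defaultdict
from typing import Any


def group_callbacks(payloads: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group decoded webhook payloads by their api_request_id."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for payload in payloads:
        # Payloads without the field are kept under an empty key rather than dropped.
        grouped[payload.get("api_request_id", "")].append(payload)
    return dict(grouped)
```

For example, passing two payloads with `api_request_id` set to `"zip-1"` and one set to `"other"` yields a dictionary with two keys, where `"zip-1"` maps to both of its payloads.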

The input should contain the following information:

- `fileName`: the name of the file to be uploaded (include the `.zip` extension when uploading a ZIP archive)

- `fileType`: the MIME type of the file to be uploaded. Use `application/zip` for ZIP archives.

- `isSplit`: whether the file should be split and processed as separate pages/sections (optional, default: false)

- `isSplitExcel`: whether to split Excel files by worksheets (optional, default: false)

- `callbackURL`: (optional) the URL that will be called after file processing. Must be a valid HTTPS URL.
  - If provided, this URL takes precedence over the API key's default callback URL
  - If not provided, the API key's default callback URL will be used (if configured)

- `ocrModel`: the OCR model to be used for file processing (optional). Available models:
  - **English Models:**
    - `Beethoven_ENG_O5.6` - OpenAI v6
    - `Beethoven_ENG_G5.5` - Gemini v5
    - `Beethoven_ENG_GP25` - Gemini Pro 2.5
    - `Beethoven_ENG_GP25.1` - Gemini Pro 2.5 v1
    - `Beethoven_ENG_GP25.2` - Gemini Pro 2.5 PDF
    - `Beethoven_CUS_O5.1` - Custom OpenAI v8
    - `Beethoven_CUS_O5.2` - Custom Gemini v13
    - `Beethoven_CUS_GP25.1` - listed in the `ocrModel` enum of the OpenAPI schema
    - `Unified (google-document-ai-ocr-gemini-v10)` - Unified model
  - **Chinese Models:**
    - `Beethoven_ZH_O5.9` - Chinese OpenAI v9
  - **Japanese Models:**
    - `Beethoven_JP_O5.3` - Japanese OpenAI v3
    - `Beethoven_JP_G5.4` - Japanese Gemini fine-tuned
  - **Thai Models:**
    - `Beethoven_TH_O5.1` - Thai OpenAI v1
    - `Beethoven_TH_G5.1` - Thai Gemini v1

- `schemaLocking`: whether the schema should be locked after the file is uploaded. Must be `true` or `false` (optional)

- `directoryId`: the directory id where the file should be uploaded (optional)

- `isEphemeral`: whether the file and all related data should be deleted after the file is processed. Must be `true` or `false` (optional, default: false)

- `pageCount`: page count of the file, used for early validation against page limits (optional). Not applicable to ZIP archives.

- `apiRequestId`: optional client-supplied correlation ID that groups files uploaded together into a logical batch. Persisted on every resulting document and forwarded as `api_request_id` on every subsequent webhook callback, so you can reconcile processing events back to the originating request. For ZIP uploads, every extracted file inherits the same `apiRequestId`. If omitted, the server auto-generates one (shared across all files extracted from the same ZIP).
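The fields above can be assembled into a request body as follows. This is a sketch under stated assumptions: the helper name `build_upload_body` and its client-side checks are illustrative only, and the server remains the authority on validation. The checks mirror the rules stated in the list (required `fileName` and `fileType`, HTTPS-only `callbackURL`, `pageCount` not applicable to ZIP archives).

```python
from typing import Any, Optional
from urllib.parse import urlparse


def build_upload_body(
    file_name: str,
    file_type: str,
    *,
    is_split: bool = False,
    callback_url: Optional[str] = None,
    ocr_model: Optional[str] = None,
    api_request_id: Optional[str] = None,
    page_count: Optional[int] = None,
) -> dict[str, Any]:
    """Build the JSON body for the upload request (hypothetical helper)."""
    if not file_name or not file_type:
        raise ValueError("fileName and fileType are required")

    is_zip = file_type == "application/zip"
    if is_zip and not file_name.lower().endswith(".zip"):
        raise ValueError("ZIP uploads should use a fileName ending in .zip")

    if callback_url is not None and urlparse(callback_url).scheme != "https":
        raise ValueError("callbackURL must be an HTTPS URL")

    body: dict[str, Any] = {
        "fileName": file_name,
        "fileType": file_type,
        "isSplit": is_split,
    }
    optional = {
        "callbackURL": callback_url,
        "ocrModel": ocr_model,
        "apiRequestId": api_request_id,
    }
    if not is_zip:
        # pageCount is documented as not applicable to ZIP archives.
        optional["pageCount"] = page_count
    # Omit unset fields so server-side defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

Leaving unset optional fields out of the body, rather than sending `null`, is a design choice made here so that documented defaults (such as the API key's default callback URL) take effect.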

## How It Works

1. **Request Upload URL**: Submit file metadata to this endpoint
2. **Receive Presigned URL**: Get a secure upload URL valid for 300 seconds (5 minutes)
3. **Upload File**: Send your file directly to storage with a `PUT` request to the presigned URL
4. **Processing**: The file is automatically processed with the specified OCR model and schema
5. **Callback** (optional): Receive notification when processing is complete
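The steps above can be sketched with the Python standard library. The server URL, path, `x-api-key` header, and `PUT` method come from the OpenAPI definition on this page; the function names are hypothetical, and sending a `Content-Type` header on the `PUT` is an assumption that may need adjusting to match how the URL was signed.

```python
import json
import urllib.request

API_BASE = "https://api.orion.file.ai"  # server URL from the OpenAPI definition


def make_presign_request(api_key: str, body: dict) -> urllib.request.Request:
    """Step 1: build the POST that requests a presigned upload URL."""
    return urllib.request.Request(
        f"{API_BASE}/prod/v1/files/upload",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def make_put_request(presigned_url: str, file_bytes: bytes, file_type: str) -> urllib.request.Request:
    """Step 3: build the PUT that sends the binary file to storage."""
    return urllib.request.Request(
        presigned_url,
        data=file_bytes,
        headers={"Content-Type": file_type},  # assumption: header matches the signed request
        method="PUT",
    )


def upload_file(api_key: str, path: str, file_name: str, file_type: str) -> str:
    """Run steps 1-3 and return the uploadId from the response."""
    body = {"fileName": file_name, "fileType": file_type}
    with urllib.request.urlopen(make_presign_request(api_key, body)) as resp:
        ticket = json.load(resp)
    with open(path, "rb") as fh:
        file_bytes = fh.read()
    with urllib.request.urlopen(make_put_request(ticket["presignedUploadURL"], file_bytes, file_type)):
        pass  # urlopen raises HTTPError on a non-2xx status
    return ticket["uploadId"]
```

Splitting request construction from sending keeps the two HTTP calls easy to inspect and test without network access; `upload_file` is the only function that performs I/O.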

## Request Parameters

### Request Body

The request body must contain a JSON object with the following properties:

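| Property      | Type    | Required | Description                                                                                                   |
| ------------- | ------- | -------- | ------------------------------------------------------------------------------------------------------------- |
| fileName      | string  | Yes      | Original name of the file including extension (e.g., "document.pdf")                                          |
| fileType      | string  | Yes      | MIME type of the file (e.g., "application/pdf", "image/jpeg", "application/zip")                              |
| isSplit       | boolean | No       | Whether the file should be processed as separate pages/sections (default: false)                              |
| isSplitExcel  | boolean | No       | Whether to split Excel files by worksheets (default: false)                                                   |
| callbackURL   | string  | No       | HTTPS URL to receive processing completion notifications                                                      |
| ocrModel      | string  | No       | OCR engine to use for text extraction. Available models vary by plan                                          |
| schemaLocking | boolean | No       | Whether to lock the schema after processing. Must be true or false                                            |
| directoryId   | string  | No       | ID of the directory where the file should be uploaded                                                         |
| isEphemeral   | boolean | No       | Whether to automatically delete a file 24 hours after upload (default: false)                                 |
| pageCount     | number  | No       | Page count of the file, used for early validation against page limits. Not applicable to ZIP archives         |
| apiRequestId  | string  | No       | Client-supplied correlation ID, forwarded as `api_request_id` on webhook callbacks. Auto-generated if omitted |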

### Responses

```json theme={null}
{
  "s3Path": "s3://bucket/upload/file.txt",
  "presignedUploadURL": "https://s3.amazonaws.com/bucket/upload/file.txt?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Signature=1%2F6%2BN7Z6h%2F7oV7Z6i%2F9oV7Z4%3D&Expires=3600",
  "uploadId": "f2538513-f0b9-4aa8-9c57-bc0a85c77de6",
  "callbackURL": "https://example.com/callback",
  "ocrModel": "Beethoven_ENG_O5.6",
  "schemaLocking": true,
  "isEphemeral": false
}
```
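A client may want to track when the presigned URL stops being usable. The sketch below parses a response like the one above and derives a deadline, assuming the 300-second validity stated at the top of this page and assuming the window starts when the response is received (the server's clock starts slightly earlier, so treat the deadline as approximate). The `UploadTicket` type is hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta

PRESIGNED_URL_TTL = timedelta(seconds=300)  # validity window stated on this page


@dataclass(frozen=True)
class UploadTicket:
    upload_id: str
    presigned_upload_url: str
    s3_path: str
    expires_at: datetime

    def is_expired(self, now: datetime) -> bool:
        """True once the approximate validity window has elapsed."""
        return now >= self.expires_at


def parse_upload_response(raw: str, received_at: datetime) -> UploadTicket:
    """Extract the fields needed for the PUT step from the JSON response."""
    data = json.loads(raw)
    return UploadTicket(
        upload_id=data["uploadId"],
        presigned_upload_url=data["presignedUploadURL"],
        s3_path=data["s3Path"],
        expires_at=received_at + PRESIGNED_URL_TTL,
    )
```

If `is_expired` returns true before the `PUT` has been sent, requesting a fresh presigned URL is the simplest recovery.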


## OpenAPI

````yaml post /prod/v1/files/upload
openapi: 3.0.0
info:
  title: Public API
  description: >-

    ### Welcome to fileAI’s Public API Documentation.

    This API allows users to check the health of the system, upload and manage
    files, and manage AI Schemas.

    Should you have any questions, please reach out to fileAI via the “Contact a
    Developer” link below.


    [Contact a Developer](mailto:support@file.ai)


    ### Prerequisites


    Before using our API, please ensure you complete the following prerequisites:

    - You must have a fileAI account. Sign up or log in
    [here](https://orion.file.ai/en/sign-up)

    - You must have an API Key. After creating your fileAI account, you can
    generate your API Key. Refer to the Authentication section below for more
    details.


    ### Authentication

    All API requests require an API key for authentication.

    - To obtain your API key, please log in to your fileAI account and navigate
    to Project Settings in your dashboard

    - Keep your API key secure and do not share it publicly.


    ![Authentication](https://fileai-static-assets.s3.us-west-2.amazonaws.com/public-api-service/authentication.png)


    ### How to Use Your API Key

    Once you have your API key:

    - Click the Authorize button on the top-right of this page

    - Enter your API Key under Value

    - Click Authorize to start making authenticated requests directly from the
    documentation


    ![How to Use Your API
    Key](https://fileai-static-assets.s3.us-west-2.amazonaws.com/public-api-service/how-to-use-api-keys.png)
        
  version: '1.0'
  contact: {}
servers:
  - url: https://api.orion.file.ai
security: []
tags:
  - name: Public API V1
paths:
  /prod/v1/files/upload:
    post:
      tags:
        - Public API V1
      summary: Upload a file
      description: >-
        Register a file upload with the server. The response contains a
        presigned upload URL; send the binary file to that URL with a PUT
        request.


        The presignedUploadURL is valid for 300 seconds (5 minutes) and can be
        used multiple times.


        **ZIP archives are supported.** When `fileType` is `application/zip`,
        the server will extract the archive after upload and treat each
        contained file as an independent document (OCR, classification, and
        webhook callbacks are emitted per extracted file). All extracted files
        share the same `apiRequestId` so you can correlate their per-file
        callbacks back to the original ZIP upload. Poll `GET
        /v1/uploads/{uploadId}` to see per-file status (HTTP 207 is returned
        when some files in the ZIP succeeded and others failed).


        The input should contain the following information:


        - `fileName`: the name of the file to be uploaded (include the `.zip`
        extension when uploading a ZIP archive)


        - `fileType`: the MIME type of the file to be uploaded. Use
        `application/zip` for ZIP archives.


        - `isSplit`: whether the file should be split and processed as separate
        pages/sections (optional, default: false)


        - `isSplitExcel`: whether to split Excel files by worksheets (optional,
        default: false)


        - `callbackURL`: (optional) the URL that will be called after file
        processing. Must be a valid HTTPS URL.
          - If provided, this URL takes precedence over the API key's default callback URL
          - If not provided, the API key's default callback URL will be used (if configured)

        - `ocrModel`: the OCR model to be used for file processing (optional).
        Available models:
          - **English Models:**
            - `Beethoven_ENG_O5.6` - OpenAI v6
            - `Beethoven_ENG_G5.5` - Gemini v5
            - `Beethoven_ENG_GP25` - Gemini Pro 2.5
            - `Beethoven_ENG_GP25.1` - Gemini Pro 2.5 v1
            - `Beethoven_ENG_GP25.2` - Gemini Pro 2.5 PDF
            - `Beethoven_CUS_O5.1` - Custom OpenAI v8
            - `Beethoven_CUS_O5.2` - Custom Gemini v13
            - `Unified (google-document-ai-ocr-gemini-v10)` - Unified model
          - **Chinese Models:**
            - `Beethoven_ZH_O5.9` - Chinese OpenAI v9
          - **Japanese Models:**
            - `Beethoven_JP_O5.3` - Japanese OpenAI v3
            - `Beethoven_JP_G5.4` - Japanese Gemini fine-tuned
          - **Thai Models:**
            - `Beethoven_TH_O5.1` - Thai OpenAI v1
            - `Beethoven_TH_G5.1` - Thai Gemini v1

        - `schemaLocking`: whether the schema should be locked after the file is
        uploaded, must be one of true or false (optional)


        - `directoryId`: the directory id where the file should be uploaded
        (optional)


        - `isEphemeral`: whether the file and all related data should be deleted
        after the file is processed, must be one of true or false (optional,
        default: false)


        - `pageCount`: page count of the file, used for early validation against
        page limits (optional). Not applicable to ZIP archives.


        - `apiRequestId`: optional client-supplied correlation ID that groups
        files uploaded together into a logical batch. Persisted on every
        resulting document and forwarded as `api_request_id` on every subsequent
        webhook callback, so you can reconcile processing events back to the
        originating request. For ZIP uploads, every extracted file inherits the
        same `apiRequestId`. If omitted, the server auto-generates one (shared
        across all files extracted from the same ZIP).
      operationId: PublicAPIController_uploadFileRequest
      parameters: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/UploadFileForPAInput'
      responses:
        '201':
          description: >-
            Returns a presigned upload URL. Send the binary file to the
            presignedUploadURL with a PUT request.

            The presignedUploadURL is valid for 300 seconds (5 minutes) and can
            be used multiple times.
          content:
            application/json:
              example:
                s3Path: s3://bucket/upload/file.txt
                presignedUploadURL: >-
                  https://s3.amazonaws.com/bucket/upload/file.txt?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Signature=1%2F6%2BN7Z6h%2F7oV7Z6i%2F9oV7Z4%3D&Expires=3600
                uploadId: f2538513-f0b9-4aa8-9c57-bc0a85c77de6
                callbackURL: https://example.com/callback
                ocrModel: Beethoven_ENG_O5.6
                schemaLocking: true
                isSplit: false
                isSplitExcel: false
                directoryId: 649e2d2d2d2d2d2d2d2d2d2d
              schema:
                type: object
                properties:
                  s3Path:
                    type: string
                  presignedUploadURL:
                    type: string
                  uploadId:
                    type: string
                  callbackURL:
                    type: string
                    description: The callback URL that will be used for notifications
                  ocrModel:
                    type: string
                  schemaLocking:
                    type: boolean
                  isSplit:
                    type: boolean
                  isSplitExcel:
                    type: boolean
                  directoryId:
                    type: string
        '401':
          description: Invalid API key | API key is missing
          content:
            application/json:
              example:
                message: Invalid API key | API key is missing
                error: Unauthorized
                statusCode: 401
              schema:
                type: object
                properties:
                  message:
                    type: string
                  error:
                    type: string
                  statusCode:
                    type: number
        '403':
          description: Access denied. You are in readonly mode.
          content:
            application/json:
              example:
                message: Access denied. You are in readonly mode.
                error: Forbidden
                statusCode: 403
              schema:
                type: object
                properties:
                  message:
                    type: string
                  error:
                    type: string
                  statusCode:
                    type: number
        '422':
          description: Invalid input
          content:
            application/json:
              example:
                message: >-
                  Invalid isSplit. It must be one of true or false.| Invalid
                  isSplitExcel. It must be one of true or false.| Invalid
                  fileType.| Invalid fileName.| Invalid ocrModel.| Invalid
                  schemaLocking.
                error: Unprocessable Entity
                statusCode: 422
              schema:
                type: object
                properties:
                  message:
                    type: string
                  error:
                    type: string
                  statusCode:
                    type: number
        '429':
          description: Too Many Requests
          content:
            application/json:
              example:
                statusCode: 429
                message: Too Many Requests
              schema:
                type: object
                properties:
                  statusCode:
                    type: number
                  message:
                    type: string
      security:
        - x-api-key: []
components:
  schemas:
    UploadFileForPAInput:
      type: object
      properties:
        fileName:
          type: string
          description: File name
          example: file.pdf
        fileType:
          type: string
          description: File type
          example: application/pdf
        isSplit:
          type: boolean
          description: Is split
          default: false
          example: false
        isSplitExcel:
          type: boolean
          description: Is split excel - whether to split Excel files by worksheets
          example: false
        callbackURL:
          type: string
          description: Callback URL
          example: https://example.com/callback
        ocrModel:
          type: string
          description: OCR model
          enum:
            - Beethoven_ENG_O5.6
            - Beethoven_ENG_G5.5
            - Beethoven_ENG_GP25
            - Beethoven_ENG_GP25.1
            - Beethoven_ENG_GP25.2
            - Beethoven_CUS_O5.1
            - Beethoven_CUS_O5.2
            - Beethoven_CUS_GP25.1
            - Unified (google-document-ai-ocr-gemini-v10)
            - Beethoven_ZH_O5.9
            - Beethoven_JP_O5.3
            - Beethoven_JP_G5.4
            - Beethoven_TH_O5.1
            - Beethoven_TH_G5.1
          example: Beethoven_ENG_O5.6
        schemaLocking:
          type: boolean
          description: Schema locking
          example: false
        directoryId:
          type: string
          description: Directory Id
          example: 649e2d2d2d2d2d2d2d2d2d2d
        isEphemeral:
          type: boolean
          description: Is ephemeral
          example: false
        pageCount:
          type: number
          description: >-
            Page count of the PDF file. Used for early validation against page
            limits.
          example: 50
        apiRequestId:
          type: string
          description: >-
            Optional client-supplied correlation ID that ties files uploaded in
            the same logical batch together.


            Behaviour:

            - Persisted on every resulting document and forwarded as
            `api_request_id` on all subsequent webhook callbacks for this
            upload, so you can correlate processing events back to the
            originating request on your side.

            - When the uploaded file is a **ZIP archive**, every file extracted
            from it inherits this same value, letting you link all per-file
            callbacks back to the one ZIP upload.

            - If omitted, the server auto-generates an opaque ID. For ZIP
            uploads specifically, the generated ID is shared across all
            extracted files.


            Supply your own value when you already have an idempotency key or
            job ID on the client side (e.g. from your own queue) that you want
            to reconcile against inbound webhook events.
          example: my-batch-request-123
      required:
        - fileName
        - fileType
  securitySchemes:
    x-api-key:
      type: apiKey
      in: header
      name: x-api-key
      description: API key for authentication

````
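The error examples in the OpenAPI definition above share a JSON shape with a `message` field, and the 422 example joins several validation messages with `|`. The following sketch splits such a body into individual messages. It assumes real responses use the same separator as the documented examples; the function names are hypothetical, and treating only 429 as retryable is a client-side policy choice, not something this page specifies.

```python
import json


def parse_error_messages(raw_body: str) -> list[str]:
    """Split the `message` field of an error body into individual messages."""
    data = json.loads(raw_body)
    message = data.get("message", "")
    # The documented examples separate messages with "|".
    return [part.strip() for part in message.split("|") if part.strip()]


def is_retryable(status_code: int) -> bool:
    """429 (Too Many Requests) may succeed later; 401, 403 and 422 need a changed request."""
    return status_code == 429
```

For a 422 body whose message is `"Invalid fileType.| Invalid fileName."`, `parse_error_messages` returns the two messages as separate list entries.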