Overview
The GET /files endpoint supports incremental data loading through timestamp-based filters. This enables efficient Change Data Capture (CDC) workflows for syncing to Snowflake, BigQuery, or other data warehouses.
Instead of fetching all files on every sync, use updatedAfter or createdAfter to retrieve only what’s changed since your last sync.
Timestamp Filters
updatedAfter
Filters files updated on or after the specified timestamp.
GET /files?updatedAfter=2025-01-01T00:00:00.000Z
createdAfter
Filters files created on or after the specified timestamp.
GET /files?createdAfter=2025-01-01T00:00:00.000Z
Timestamp Contract
| Property | Value |
|---|---|
| Format | ISO 8601 |
| Timezone | UTC |
| Precision | Milliseconds |
| Example | 2025-01-01T00:00:00.000Z |
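As an illustration of the contract in the table above, here is a minimal Python sketch that formats and parses timestamps in this shape. The helper names `to_api_timestamp` and `from_api_timestamp` are hypothetical and not part of the API.

```python
from datetime import datetime, timezone


def to_api_timestamp(dt: datetime) -> str:
    """Format an aware datetime as ISO 8601, UTC, millisecond precision."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before formatting")
    utc = dt.astimezone(timezone.utc)
    # timespec="milliseconds" truncates the fraction to three digits.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def from_api_timestamp(value: str) -> datetime:
    """Parse a value such as 2025-01-01T00:00:00.000Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)
```

Keeping every value in this fixed-width UTC form also means the strings sort chronologically, which is convenient when comparing watermarks.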
All timestamps in the response (createdAt, updatedAt) are returned in UTC with millisecond precision.
Watermark Semantics
Inclusive Comparison (>=)
The updatedAfter and createdAfter filters use inclusive comparison. This means:
- A file with updatedAt = 2025-01-01T12:00:00.000Z will be returned when querying with updatedAfter=2025-01-01T12:00:00.000Z
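Deduplication Requirement
Because the comparison is inclusive, a file whose updatedAt equals your stored watermark is returned again on the next sync. Deduplicate by fileId + updatedAt before loading.
Example deduplication in Snowflake:
Ordering & Pagination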
Default Sort Order
| Parameter | Default Value |
|---|---|
| sortBy | createdAt |
| sortOrder | ASC |
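Recommended Sort for CDC
For CDC workflows, sort by updatedAt in ascending order (sortBy=updatedAt, sortOrder=ASC) instead of relying on the default sort.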
Pagination Parameters
The API uses offset-based pagination with page and limit parameters.
| Parameter | Default | Maximum |
|---|---|---|
| page | 1 | — |
| limit | 100 | 100 |
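Pagination Stability Warning
With offset-based pagination, files that are created or updated while you are paging can shift between pages, so a sync may skip or repeat records.
Mitigation strategies:
- Use smaller time windows for updatedAfter
- Always deduplicate by fileId + updatedAt
- Re-sync periodically with a larger time window to catch missed records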
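One way to apply the last strategy is to re-query from slightly before the stored watermark. The sketch below subtracts a lookback interval from a watermark string; the helper name `with_lookback` and the one-hour value in the usage line are illustrative assumptions, not API features. The overlap it creates returns some records twice, so it relies on deduplication by fileId + updatedAt.

```python
from datetime import datetime, timedelta, timezone

# Matches the documented shape, e.g. 2025-01-01T00:00:00.000Z
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def with_lookback(watermark: str, lookback: timedelta) -> str:
    """Return an updatedAfter value that starts `lookback` before the watermark."""
    parsed = datetime.strptime(watermark, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    widened = parsed - lookback
    return widened.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# Usage: widen the window by one hour for a periodic catch-up sync.
updated_after = with_lookback("2025-01-01T12:00:00.000Z", timedelta(hours=1))
```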
What Triggers updatedAt?
The updatedAt timestamp changes when any of these events occur:
| Event | Updates updatedAt |
|---|---|
| File metadata changes | ✅ Yes |
| Schema/form field updates | ✅ Yes |
| Document status changes | ✅ Yes |
| Form filling/reprocessing | ✅ Yes |
| Tag modifications | ✅ Yes |
| Contact/vendor updates | ✅ Yes |
| Workflow step changes | ✅ Yes |
| Classification changes | ✅ Yes |
| OCR reprocessing | ✅ Yes |
Delete Handling
Deleted files are not visible through this endpoint; it surfaces inserts and updates only.
Tracking Deletions
If you need to detect deleted files:
Option A: Periodic Full Sync
Do a full sync periodically and compare with your existing data to detect
missing records.
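The comparison described above can be sketched as a set difference. This assumes you have already collected every fileId from a complete full sync and can list the fileId values stored in your warehouse; `find_deleted_ids` is a hypothetical helper name.

```python
from typing import Iterable, Set


def find_deleted_ids(warehouse_ids: Iterable[str], api_ids: Iterable[str]) -> Set[str]:
    """Return IDs that exist in the warehouse but were absent from the full sync."""
    return set(warehouse_ids) - set(api_ids)


# Usage: "b" is in the warehouse but no longer returned by the API.
deleted = find_deleted_ids(["a", "b", "c"], ["a", "c", "d"])
```

Only run this against a full sync that fetched every page; a partial result would flag live files as deleted.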
Best Practices
1. Store Your Watermark
After each successful sync, store the maximum updatedAt value from the batch.
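A minimal sketch of computing the next watermark, assuming each record is a dict with an `updatedAt` string in the documented format. `next_watermark` is a hypothetical helper; where you persist the value (a table, a file, a state store) is up to your pipeline.

```python
from typing import Dict, Iterable, Optional


def next_watermark(previous: Optional[str], batch: Iterable[Dict[str, str]]) -> Optional[str]:
    """Return the largest updatedAt seen so far, or the previous value if the batch is empty.

    The timestamps are fixed-width UTC strings, so plain string comparison
    orders them chronologically.
    """
    candidates = [record["updatedAt"] for record in batch]
    if previous is not None:
        candidates.append(previous)
    return max(candidates, default=None)
```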
2. Use Recommended Query Parameters
Always include sorting parameters for reliable CDC: sortBy=updatedAt and sortOrder=ASC.
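As a sketch, the request path for an incremental sync could be assembled like this. The parameter names and values come from this page; the helper name `build_files_query` is hypothetical, and any base URL or authentication is left out because it is not covered here.

```python
from urllib.parse import urlencode


def build_files_query(updated_after: str, page: int = 1, limit: int = 100) -> str:
    """Build a GET /files path sorted by updatedAt ascending."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {
        "updatedAfter": updated_after,
        "sortBy": "updatedAt",
        "sortOrder": "ASC",
        "page": page,
        "limit": limit,
    }
    # safe=":" keeps the colons in the timestamp unescaped, as in the examples above.
    return "/files?" + urlencode(params, safe=":")
```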
3. Handle Pagination Completely
Paginate through all pages before updating your watermark. Advancing it after a partial run can skip records that were never fetched.
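The ordering matters more than the HTTP details, so the sketch below takes the page fetcher as a parameter. `fetch_page`, `load_records`, and `save_watermark` are hypothetical callables you would supply. It also assumes that a page holding fewer than `limit` records is the last one, since the response schema is not shown here.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def fetch_all_pages(fetch_page: Callable[[int, int], List[Record]], limit: int = 100) -> List[Record]:
    """Call fetch_page(page, limit) from page 1 until a short page comes back."""
    records: List[Record] = []
    page = 1
    while True:
        batch = fetch_page(page, limit)
        records.extend(batch)
        # Assumption: fewer than `limit` records means there are no more pages.
        if len(batch) < limit:
            return records
        page += 1


def sync_once(
    fetch_page: Callable[[int, int], List[Record]],
    load_records: Callable[[List[Record]], None],
    save_watermark: Callable[[str], None],
    limit: int = 100,
) -> int:
    """Fetch every page, load the records, and only then advance the watermark."""
    records = fetch_all_pages(fetch_page, limit)  # an error here leaves the watermark untouched
    load_records(records)
    if records:
        save_watermark(max(record["updatedAt"] for record in records))
    return len(records)
```

If any page request raises, `sync_once` exits before `save_watermark` is called, so the next run starts from the old watermark.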
4. Deduplicate Before Loading
Always deduplicate records by fileId + updatedAt before inserting into your data warehouse.
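A minimal in-memory sketch of that step, assuming each record is a dict carrying `fileId` and `updatedAt`. The function name `deduplicate` is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def deduplicate(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record for each (fileId, updatedAt) pair, preserving order."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[Dict[str, str]] = []
    for record in records:
        key = (record["fileId"], record["updatedAt"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

This keeps distinct versions of the same file (same fileId, different updatedAt); collapsing to the latest version per file would be a separate step in the warehouse.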
5. Schedule Periodic Full Syncs
Run full syncs (e.g., weekly) to:
- Catch any records missed due to pagination issues
- Detect deleted records by comparing with existing data
Snowflake Integration Example
- Initial Load
- Incremental Load
Response Schema
Quick Reference
| Feature | Behavior |
|---|---|
| Filter semantics | >= (inclusive) |
| Timestamp format | ISO 8601, UTC, milliseconds |
| Pagination | Offset-based (page, limit) |
| Default sort | createdAt ASC |
| Recommended sort for CDC | updatedAt ASC |
| Delete visibility | ❌ Not visible (insert/update only) |
| Deduplication required | ✅ Yes, by fileId + updatedAt |
Need Help?
Contact Support
Reach out to the fileAI engineering team for questions about CDC
implementation or to request new features.