Concurrency and Isolation
Iceberg is designed for concurrent readers and writers. Readers use committed snapshots, while writers create new metadata and commit changes atomically through a catalog.
Snapshot Isolation
Readers select a snapshot and see a consistent table state. They do not see partial writes because data files are only visible after the snapshot commit succeeds.
This means:
- Readers do not need to lock the table.
- A failed write does not expose partial output.
- A long-running query can continue using the snapshot it planned against.
Optimistic Concurrency
Iceberg writers use optimistic concurrency control. A writer prepares new metadata based on the table state it read, then attempts to commit it.
If another writer commits first, Iceberg validates whether the new commit still applies safely. If not, the writer must retry or fail.
Conflict Examples
Conflicts are more likely when two writes touch the same logical data:
- Two jobs overwrite overlapping partitions.
- A compaction job rewrites files while a merge job deletes rows from the same files.
- A backfill rewrites data while a streaming job appends to the same table.
Appends are usually easier to commit concurrently than overwrites or row-level updates.
Serializable Behavior
Iceberg's goal is serializable isolation for table operations: successful commits should behave as if they happened one at a time in a valid order.
The exact behavior depends on the engine, catalog, operation type, and isolation configuration.
Operational Practices
- Prefer append-only writes for streaming ingestion.
- Avoid running compaction and heavy merge jobs on the same partitions at the same time.
- Partition maintenance jobs by time range or data domain.
- Make writers retryable and idempotent.
- Monitor commit retries and validation failures.
- Use branches for risky backfills or experiments.
Key Takeaway
Iceberg concurrency relies on immutable files, snapshot metadata, and optimistic commits. It works well when write jobs are scoped clearly and maintenance workflows avoid overlapping hot data.