Version: Next

Apache Iceberg Features

Apache Iceberg is a table format for large analytic datasets. Its main features come from keeping table state in metadata instead of relying on directory layout or engine-specific conventions.

Feature Map

Schema evolution: change columns without rewriting existing data files.
Hidden partitioning: let Iceberg derive partition values instead of exposing partition columns to users.
Partition evolution: change partition strategy as the table grows.
Sort order evolution: change write ordering independently from the logical table schema.
Time travel: query historical snapshots by timestamp, snapshot ID, branch, or tag.
Branching and tagging: create named snapshot references for audit, release, and experiment workflows.
Row-level deletes: remove rows from tables whose data files are immutable by writing delete files that mark those rows as deleted.
Concurrency and isolation: support concurrent readers and writers with snapshot isolation and optimistic commits.
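Hidden partitioning and partition pruning can be illustrated with a small Python sketch. This is a toy model, not Iceberg's API: `DataFile`, `write_file`, and `prune` are hypothetical names, and only a day-granularity transform is modeled. It assumes that transform returns days since 1970-01-01 in UTC, as Iceberg's `day` transform does. The query filters only on the source timestamp, and the partition value is derived from it on both the write side and the read side.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01 (UTC), modeled on Iceberg's `day` partition transform."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


@dataclass(frozen=True)
class DataFile:
    path: str
    partition_day: int  # derived from `ts` by the table, never supplied by the user


def write_file(path: str, ts: datetime) -> DataFile:
    # Hypothetical writer: it computes the partition value from the source column.
    return DataFile(path, day_transform(ts))


def prune(files: list, start: datetime, end: datetime) -> list:
    """Keep only files whose day partition may hold rows with start <= ts < end."""
    lo = day_transform(start)
    hi = day_transform(end - timedelta(microseconds=1))  # `end` is exclusive
    return [f for f in files if lo <= f.partition_day <= hi]


utc = timezone.utc
files = [
    write_file("a.parquet", datetime(2024, 1, 1, 9, tzinfo=utc)),
    write_file("b.parquet", datetime(2024, 1, 2, 17, tzinfo=utc)),
    write_file("c.parquet", datetime(2024, 1, 3, 8, tzinfo=utc)),
]
# The filter mentions only `ts`; no partition column appears in the query.
selected = prune(files, datetime(2024, 1, 2, tzinfo=utc), datetime(2024, 1, 3, tzinfo=utc))
print([f.path for f in selected])  # ['b.parquet']
```

Because users never write the partition value themselves, they cannot store a wrong one, and a query written against `ts` keeps working if the partition strategy later changes.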

Mental Model

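Iceberg tables are built from immutable data files, metadata files, manifests, and snapshots. Each committed write produces a new snapshot. A reader works against one committed snapshot for the whole query, while a writer prepares new metadata and then attempts to commit it atomically; if another writer committed first, the attempt fails and can be retried.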
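A minimal Python sketch of snapshot-based optimistic commits, assuming a toy in-memory catalog. `Catalog`, `Snapshot`, `compare_and_swap`, and `append_files` are hypothetical names, not Iceberg's Java or Python APIs. A real catalog swaps a pointer to a metadata file rather than holding a snapshot object, and the lock here only stands in for whatever atomic operation the catalog provides.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    data_files: frozenset  # immutable: a snapshot never changes after commit


class Catalog:
    """Toy catalog holding one pointer to the table's current snapshot."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current: Optional[Snapshot] = None

    def compare_and_swap(self, expected: Optional[Snapshot], new: Snapshot) -> bool:
        # Atomic commit: succeed only if nobody committed since `expected` was read.
        with self._lock:
            if self.current is not expected:
                return False
            self.current = new
            return True


class CommitFailedError(Exception):
    pass


def append_files(catalog: Catalog, new_files: set, snapshot_id: int, max_retries: int = 3) -> Snapshot:
    """Optimistically append files, re-reading the base snapshot after each conflict."""
    for _ in range(max_retries):
        base = catalog.current
        existing = base.data_files if base is not None else frozenset()
        candidate = Snapshot(
            snapshot_id=snapshot_id,
            parent_id=base.snapshot_id if base is not None else None,
            data_files=existing | frozenset(new_files),
        )
        if catalog.compare_and_swap(base, candidate):
            return candidate
        # Another writer won the race: loop to rebuild on top of its snapshot.
    raise CommitFailedError(f"gave up after {max_retries} attempts")


catalog = Catalog()
reader_view = append_files(catalog, {"a.parquet"}, snapshot_id=1)  # a reader pins this snapshot
stale_base = catalog.current                                       # a writer reads the same base...
append_files(catalog, {"b.parquet"}, snapshot_id=2)                # ...but another writer commits first
conflicting = Snapshot(3, stale_base.snapshot_id, stale_base.data_files | {"c.parquet"})
print(catalog.compare_and_swap(stale_base, conflicting))  # False: the stale commit is rejected
print(sorted(reader_view.data_files))                     # ['a.parquet']
print(sorted(catalog.current.data_files))                 # ['a.parquet', 'b.parquet']
```

The reader holding the first snapshot is unaffected by later commits because nothing it references is ever modified; only the catalog pointer moves.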

This design gives Iceberg three important properties:

Readers do not need to scan directory listings to understand a table.
Table changes can be tracked, audited, and rolled back through metadata.
Multiple engines can work with the same table if they respect the Iceberg catalog and commit protocol.
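Tracking, time travel, and rollback all reduce to metadata lookups. The following Python toy model is a sketch under stated assumptions: `TableMetadata`, `LogEntry`, `snapshot_as_of`, and `rollback_to` are hypothetical names, timestamps are assumed to arrive in increasing order, and a rollback is assumed to be recorded in the log like any other change of the current snapshot.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class LogEntry:
    timestamp_ms: int
    snapshot_id: int


class TableMetadata:
    """Toy metadata: known snapshots plus a log of which snapshot was current when."""

    def __init__(self) -> None:
        self.snapshot_ids: Set[int] = set()
        self.log: List[LogEntry] = []  # appended in order, so timestamps ascend
        self.current_snapshot_id: Optional[int] = None

    def _set_current(self, snapshot_id: int, timestamp_ms: int) -> None:
        self.current_snapshot_id = snapshot_id
        self.log.append(LogEntry(timestamp_ms, snapshot_id))

    def commit(self, snapshot_id: int, timestamp_ms: int) -> None:
        self.snapshot_ids.add(snapshot_id)
        self._set_current(snapshot_id, timestamp_ms)

    def snapshot_as_of(self, timestamp_ms: int) -> int:
        """Time travel: the snapshot that was current at `timestamp_ms`."""
        times = [e.timestamp_ms for e in self.log]
        i = bisect_right(times, timestamp_ms)
        if i == 0:
            raise ValueError("no snapshot was current at that time")
        return self.log[i - 1].snapshot_id

    def rollback_to(self, snapshot_id: int, timestamp_ms: int) -> None:
        """Roll back by moving the pointer; no data files are rewritten."""
        if snapshot_id not in self.snapshot_ids:
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self._set_current(snapshot_id, timestamp_ms)


meta = TableMetadata()
meta.commit(101, timestamp_ms=1_000)
meta.commit(102, timestamp_ms=2_000)
meta.commit(103, timestamp_ms=3_000)  # suppose this write turned out to be bad
print(meta.snapshot_as_of(2_500))     # 102
meta.rollback_to(102, timestamp_ms=4_000)
print(meta.current_snapshot_id)       # 102
print(meta.snapshot_as_of(3_500))     # 103: the log still shows the bad snapshot was live
```

Keeping the log append-only means a rollback changes what readers see next without erasing the audit trail of what they saw before.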
