Apache Iceberg Features
Apache Iceberg is an open table format for large analytic datasets. Most of its features follow from tracking table state in versioned metadata rather than inferring it from directory layout or engine-specific conventions.
Feature Map
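- Schema evolution: add, drop, rename, or reorder columns without rewriting existing data files.
- Hidden partitioning: Iceberg derives partition values from column values through transforms such as day or bucket, so queries filter on the source column rather than on a separate partition column.
- Partition evolution: change the partition spec as the table grows; existing data files keep the layout they were written with.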
- Sort order evolution: change write ordering independently from the logical table schema.
- Time travel: query historical snapshots by timestamp, snapshot ID, branch, or tag.
- Branching and tagging: create named snapshot references for audit, release, and experiment workflows.
- Row-level deletes: mark rows in immutable data files as deleted by writing separate delete files instead of rewriting the data files.
- Concurrency and isolation: support concurrent readers and writers with snapshot isolation and optimistic commits.
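Hidden partitioning from the list above can be illustrated with a small sketch. It is a toy model, not Iceberg's API: `day_transform`, `plan_scan`, and the `FILES` index are hypothetical names, and the day value is assumed to be whole days since 1970-01-01 in UTC. The point it shows is that the writer derives the partition value from a timestamp, and a filter on the timestamp alone is enough to prune files.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Return whole days since 1970-01-01 in UTC for a timezone-aware timestamp."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def utc(year, month, day, hour=0):
    """Small helper (hypothetical) to build UTC timestamps."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


# Hypothetical file index: each entry records the partition value derived at write time.
FILES = [
    {"path": "f1.parquet", "event_day": day_transform(utc(2024, 1, 1, 8))},
    {"path": "f2.parquet", "event_day": day_transform(utc(2024, 1, 2, 13))},
    {"path": "f3.parquet", "event_day": day_transform(utc(2024, 1, 3, 21))},
    {"path": "f4.parquet", "event_day": day_transform(utc(2024, 1, 5, 2))},
]


def plan_scan(files, start: datetime, end: datetime):
    """Select files that may hold rows with start <= ts < end.

    The bounds are inclusive on the derived day, which is conservative:
    a file can be kept even if row-level filtering later drops all its rows.
    """
    lo, hi = day_transform(start), day_transform(end)
    return [f["path"] for f in files if lo <= f["event_day"] <= hi]


# The caller filters on the timestamp only; no partition column appears in the query.
selected = plan_scan(FILES, utc(2024, 1, 2), utc(2024, 1, 4))
print(selected)
```

Because the user never names the derived value, the transform could later change (for example from day to hour) without changing the query, which is what makes partition evolution practical.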
Mental Model
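Iceberg tables are built from immutable data files, manifests that list those files, snapshots that reference manifests, and table metadata files that record the snapshots. Each committed write produces a new snapshot. Readers work from a single committed snapshot for the duration of a scan, while writers prepare new metadata and attempt to commit it atomically.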
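The commit step can be sketched as a compare-and-swap on a single pointer. This is a minimal in-memory model under stated assumptions, not Iceberg's implementation: `ToyCatalog`, `Snapshot`, `CommitConflict`, and `append` are hypothetical names, and snapshot IDs are sequential here only for readability.

```python
import threading
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    data_files: Tuple[str, ...]  # immutable list of files visible in this snapshot


class CommitConflict(Exception):
    """Raised when the table changed after the writer read its base snapshot."""


class ToyCatalog:
    """Holds one pointer to the current snapshot and swaps it atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current: Optional[Snapshot] = None

    def current(self) -> Optional[Snapshot]:
        return self._current

    def commit(self, base: Optional[Snapshot], new: Snapshot) -> None:
        # Compare-and-swap: succeed only if nobody committed since `base` was read.
        with self._lock:
            if self._current is not base:
                raise CommitConflict("table changed since base snapshot was read")
            self._current = new


def append(catalog: ToyCatalog, new_files, max_retries: int = 3) -> Snapshot:
    """Optimistic append: build a new snapshot from the base, retry on conflict."""
    for _ in range(max_retries):
        base = catalog.current()
        base_files = base.data_files if base else ()
        new = Snapshot(
            snapshot_id=(base.snapshot_id + 1) if base else 1,
            parent_id=base.snapshot_id if base else None,
            data_files=base_files + tuple(new_files),
        )
        try:
            catalog.commit(base, new)
            return new
        except CommitConflict:
            continue  # re-read the current snapshot and rebuild on top of it
    raise CommitConflict("gave up after retries")


catalog = ToyCatalog()
first = append(catalog, ["a.parquet"])
reader_view = catalog.current()        # a reader pins this snapshot
second = append(catalog, ["b.parquet"])
print(reader_view.data_files)          # the reader's view is unchanged by the later write
print(catalog.current().data_files)
```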
Keeping table state in committed metadata gives Iceberg three important properties:
- Readers find a table's files through its metadata, so they do not need to scan directory listings to understand a table.
- Table changes can be tracked, audited, and rolled back through metadata.
- Multiple engines can work with the same table if they respect the Iceberg catalog and commit protocol.
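The second property, tracking and rolling back changes through metadata, can be sketched with an append-only log of snapshot changes. This is a simplified illustration: `ToyTableMetadata`, `LogEntry`, and their methods are hypothetical names, and the snapshot IDs and millisecond timestamps are made up for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp_ms: int
    snapshot_id: int


class ToyTableMetadata:
    """Tracks which snapshot was current at each point in time."""

    def __init__(self):
        self.snapshot_log = []           # append-only, ordered by timestamp
        self.refs = {}                   # name -> snapshot_id (toy tags)
        self.current_snapshot_id = None

    def set_current(self, timestamp_ms: int, snapshot_id: int) -> None:
        # Both new commits and rollbacks are recorded as log entries.
        self.snapshot_log.append(LogEntry(timestamp_ms, snapshot_id))
        self.current_snapshot_id = snapshot_id

    def snapshot_as_of(self, timestamp_ms: int) -> int:
        """Time travel: return the snapshot that was current at the given time."""
        times = [entry.timestamp_ms for entry in self.snapshot_log]
        index = bisect_right(times, timestamp_ms)
        if index == 0:
            raise LookupError("no snapshot existed at that time")
        return self.snapshot_log[index - 1].snapshot_id

    def tag(self, name: str, snapshot_id: int) -> None:
        self.refs[name] = snapshot_id

    def rollback_to(self, timestamp_ms: int, snapshot_id: int) -> None:
        """Rollback: make an earlier, known snapshot current again."""
        known = {entry.snapshot_id for entry in self.snapshot_log}
        if snapshot_id not in known:
            raise LookupError("unknown snapshot")
        self.set_current(timestamp_ms, snapshot_id)


meta = ToyTableMetadata()
meta.set_current(1000, 11)
meta.set_current(2000, 22)
meta.set_current(3000, 33)
meta.tag("release", 22)

as_of = meta.snapshot_as_of(2500)      # snapshot that was current at t=2500
meta.rollback_to(4000, meta.refs["release"])
print(as_of, meta.current_snapshot_id)
```

Neither operation touches a data file in this model: time travel is a lookup in the log, and rollback is one more metadata change that points the table at an older snapshot.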