📄️ Apache Iceberg Features
Apache Iceberg is a table format for large analytic datasets. Its main features come from keeping table state in metadata instead of relying on directory layout or engine-specific conventions.
📄️ Schema Evolution
Schema evolution lets a table change over time without rewriting all existing data files. Iceberg tracks columns with stable field IDs, so it can distinguish a renamed column from a deleted-and-recreated column.
📄️ Hidden Partitioning
Hidden partitioning separates the logical table schema from the physical partition layout. Users filter on ordinary columns, while Iceberg derives partition values from those columns through transforms such as day or bucket.
📄️ Partition Evolution
Partition evolution lets you change a table's partition strategy without rewriting existing data files. Old files keep their original partition spec, and new files use the new spec.
📄️ Sort Order Evolution
Sort order evolution lets you change how rows are ordered within newly written data files without rewriting existing files. Sort order is separate from partitioning and from the logical order of columns in the schema.
📄️ Time Travel
Time travel lets you query an Iceberg table as it existed at a previous snapshot. This is useful for audits, reproducible reports, backfills, debugging, and incident recovery.
📄️ Branching and Tagging
Branches and tags are named references to snapshots. They make snapshot history easier to manage for audits, experiments, validation, and release workflows.
📄️ Row-Level Deletes
Row-level deletes let Iceberg remove rows from immutable data files without immediately rewriting the whole file. Delete information is stored separately and applied at read time.
📄️ Concurrency and Isolation
Iceberg is designed for concurrent readers and writers. Readers use committed snapshots, while writers create new metadata and commit changes atomically through a catalog.
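
The commit model summarized under Concurrency and Isolation can be illustrated with a toy in-memory catalog. This is a minimal sketch, not Iceberg's API: `ToyCatalog`, `Snapshot`, `CommitConflict`, and `append_with_retry` are hypothetical names invented for the example. It assumes a catalog that holds a single pointer to the current snapshot and swaps it only when the writer's base snapshot is still current.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    rows: tuple  # immutable view of the table contents at this snapshot


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class ToyCatalog:
    """Hypothetical stand-in for a catalog: one pointer to the current snapshot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshots = {0: Snapshot(0, ())}
        self._current_id = 0

    def current(self):
        with self._lock:
            return self._snapshots[self._current_id]

    def snapshot(self, snapshot_id):
        # Older snapshots stay readable, which is what makes time travel possible.
        with self._lock:
            return self._snapshots[snapshot_id]

    def commit(self, base_id, new_rows):
        """Atomically swap the pointer only if base_id is still current."""
        with self._lock:
            if base_id != self._current_id:
                raise CommitConflict(
                    f"base {base_id} is stale, current is {self._current_id}"
                )
            new_id = self._current_id + 1
            self._snapshots[new_id] = Snapshot(new_id, tuple(new_rows))
            self._current_id = new_id
            return new_id


def append_with_retry(catalog, row, max_attempts=3):
    """Optimistic writer: rebuild on the latest snapshot and retry on conflict."""
    for _ in range(max_attempts):
        base = catalog.current()
        try:
            return catalog.commit(base.snapshot_id, base.rows + (row,))
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")


catalog = ToyCatalog()
reader_view = catalog.current()   # a reader pins snapshot 0
stale_base = catalog.current()    # a writer also starts from snapshot 0
append_with_retry(catalog, "a")   # a different writer commits first

try:
    # The stale writer's commit is rejected instead of overwriting "a".
    catalog.commit(stale_base.snapshot_id, stale_base.rows + ("b",))
    conflict_detected = False
except CommitConflict:
    conflict_detected = True

append_with_retry(catalog, "b")   # retrying on the latest snapshot succeeds
```

In this sketch the reader's pinned snapshot never changes while writers commit, and the rejected writer loses no data because it retries on top of the latest state. A real catalog would perform the pointer swap with whatever atomic operation its backing store provides, rather than an in-process lock.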