Schema Evolution
Schema evolution lets a table change over time without rewriting all existing data files. Iceberg tracks columns with stable field IDs, so it can distinguish a renamed column from a deleted-and-recreated column.
Treat schema changes like product releases. They can be safe at the table-format level and still break downstream jobs, dashboards, APIs, or contracts.
Supported Changes
Common schema changes include:
- Add a column.
- Rename a column.
- Drop a column.
- Reorder columns.
- Update comments.
- Promote compatible types.
- Add nested fields inside structs, lists, or map values.
Iceberg also supports schema evolution in nested structures. Map keys are more restricted because changing keys can alter lookup semantics.
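Nested additions use a dotted path to the target struct. The sketch below assumes three hypothetical columns: shipping_address (a struct), line_items (a list of structs), and attributes (a map whose values are structs). Iceberg's Spark DDL addresses list elements and map values through the element and value path segments.

```sql
-- Hypothetical schema:
--   shipping_address STRUCT<...>
--   line_items       ARRAY<STRUCT<...>>
--   attributes       MAP<STRING, STRUCT<...>>

-- Add a field inside a struct column
ALTER TABLE prod.db.orders
ADD COLUMN shipping_address.postal_code STRING;

-- Add a field to the struct stored in each list element
ALTER TABLE prod.db.orders
ADD COLUMN line_items.element.discount_pct DOUBLE;

-- Add a field to the struct stored in each map value
ALTER TABLE prod.db.orders
ADD COLUMN attributes.value.source STRING;
```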
Add Columns
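Adding a column is the safest schema change. Existing files do not contain the new field, so rows written before the change read the new column as NULL unless the table defines a default for it. Column defaults depend on the table format version (they were introduced in format version 3) and on engine support.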
ALTER TABLE prod.db.orders
ADD COLUMNS (
customer_tier STRING COMMENT 'Customer loyalty tier'
);
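A quick way to confirm the NULL behavior after the ALTER is to count how many rows carry a value for the new column. This is a sketch against the same prod.db.orders table; only rows written after the change, or backfilled, are expected to have a customer_tier.

```sql
-- Rows written before the ALTER have no customer_tier value,
-- so they are read back as NULL and are not counted by count(customer_tier).
SELECT
  count(*)                         AS total_rows,
  count(customer_tier)             AS rows_with_tier,
  count(*) - count(customer_tier)  AS rows_reading_null
FROM prod.db.orders;
```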
Best Practices
- Use clear, descriptive names. The column name should explain its purpose and, where relevant, its unit, such as order_status_code instead of status.
- Define sensible defaults. Use a default value when the engine and table version support it, especially when downstream consumers do not expect NULL.
- Audit downstream dependencies. Identify ETL jobs, views, BI reports, API consumers, schema checks, and data contracts before rollout.
- Communicate schema changes. Announce additions in the data catalog, schema registry, release notes, or team channels.
- Keep changes additive. Do not repurpose or drop existing columns in the same change. Additive evolution is the least disruptive path.
Examples
Add a column with a comment:
ALTER TABLE prod.db.orders
ADD COLUMN customer_tier STRING
COMMENT 'Customer loyalty tier';
Define a default value when supported by the engine:
ALTER TABLE prod.db.orders
ALTER COLUMN customer_tier
SET DEFAULT 'standard';
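If the engine or table version cannot store a default, one option is a read-side fallback in a view, so consumers that do not expect NULL still see a value. This is a sketch, assuming a catalog that supports views; the view name orders_with_tier and the order_id column are hypothetical. It does not change the stored data.

```sql
-- Read-side fallback: existing rows still store NULL,
-- but consumers of this view see 'standard' instead.
CREATE OR REPLACE VIEW prod.db.orders_with_tier AS
SELECT
  order_id,                                          -- hypothetical key column
  customer_id,
  COALESCE(customer_tier, 'standard') AS customer_tier
FROM prod.db.orders;
```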
Rename Columns
Renaming is a metadata-only change. Iceberg matches data to columns by field ID, not by name, so no data files are rewritten.
ALTER TABLE prod.db.orders
RENAME COLUMN order_status TO status;
Even though the table can handle the rename safely, consumers that reference the old name will fail. Coordinate the change or keep a compatibility view during migration.
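A compatibility view can expose both names while readers migrate. The sketch below assumes the rename above has already run and that the catalog supports views; the view name orders_compat is hypothetical.

```sql
-- After the rename, alias the new column back to its old name
-- so unmigrated readers keep working during the transition.
CREATE OR REPLACE VIEW prod.db.orders_compat AS
SELECT
  *,
  status AS order_status
FROM prod.db.orders;

-- Once every consumer reads status directly, retire the view:
-- DROP VIEW prod.db.orders_compat;
```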
Drop Columns
Dropping a column removes it from the current schema, but older snapshots can still expose the old schema during time travel.
ALTER TABLE prod.db.orders DROP COLUMN legacy_status;
Use this rollout pattern:
- Mark the column deprecated.
- Stop producing new values.
- Update readers.
- Validate no active dependency remains.
- Drop the column.
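The deprecation, validation, and drop steps of the rollout pattern can be sketched in Spark SQL as follows. The order_ts column and the cutover timestamp are hypothetical placeholders; substitute the real write-time column and the moment producers stopped writing legacy_status.

```sql
-- Deprecate: record the intent in the column comment.
ALTER TABLE prod.db.orders
ALTER COLUMN legacy_status COMMENT 'DEPRECATED: use status; scheduled for removal';

-- Validate: rows written since the producer cutover should carry no value.
-- order_ts and the timestamp literal are placeholders.
SELECT count(*) AS recent_non_null
FROM prod.db.orders
WHERE legacy_status IS NOT NULL
  AND order_ts >= TIMESTAMP '2024-06-01 00:00:00';

-- Drop: only after the check returns 0 and readers are updated.
ALTER TABLE prod.db.orders DROP COLUMN legacy_status;
```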
Best Practices
- Renaming is safe at the metadata level, but consumers must be updated. Queries, BI reports, exports, and code that reference the old name will break unless they are migrated or protected by a compatibility view.
- Dropping columns is operationally risky. Older snapshots can still expose the field, but current readers no longer see it. Archive or export important data before dropping the column.
- Use schema history for rollback planning. Iceberg tracks schema changes in metadata, but rollback is only useful when snapshots and files are still retained.
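To check what an older snapshot still exposes, list the table's snapshots and time travel to one committed before the drop. This sketch uses Spark 3.3+ VERSION AS OF syntax; the snapshot ID is a placeholder, and whether the query resolves the snapshot's own schema depends on the engine and Iceberg version. Re-adding a column with the same name assigns a new field ID, so the dropped values do not reappear in the current schema.

```sql
-- Find a snapshot committed before the DROP COLUMN.
SELECT committed_at, snapshot_id, operation
FROM prod.db.orders.snapshots
ORDER BY committed_at;

-- Read that snapshot; replace the placeholder with a real snapshot_id.
SELECT legacy_status
FROM prod.db.orders VERSION AS OF 1234567890123456789;
```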
Reorder Columns
Column reorder is also a metadata change. It affects the logical display order, not the physical file layout.
ALTER TABLE prod.db.orders
ALTER COLUMN customer_tier AFTER customer_id;
Use this for readability, not for performance tuning.
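Besides AFTER, a column can be moved to the front, or placed when it is added so no later reorder is needed. Both statements are sketches; order_id and sales_channel are hypothetical columns.

```sql
-- Move a column to the front of the logical schema.
ALTER TABLE prod.db.orders
ALTER COLUMN order_id FIRST;

-- Choose the position at creation time instead of reordering later.
ALTER TABLE prod.db.orders
ADD COLUMN sales_channel STRING AFTER customer_tier;
```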
Type Changes
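Type promotion is safe when every existing value is representable in the new type. Iceberg allows only a small set of widening promotions, including INT to BIGINT, FLOAT to DOUBLE, and DECIMAL(P, S) to DECIMAL(P', S) where P' is greater than P and the scale stays the same. Engine support can still vary.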
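Two widening promotions in Spark SQL, as a sketch assuming hypothetical columns quantity (currently INT) and unit_price (currently DECIMAL(10, 2)). Neither statement rewrites data files.

```sql
-- INT -> BIGINT: every existing INT value fits in BIGINT.
ALTER TABLE prod.db.orders
ALTER COLUMN quantity TYPE BIGINT;

-- DECIMAL(10, 2) -> DECIMAL(12, 2): precision grows, scale is unchanged.
ALTER TABLE prod.db.orders
ALTER COLUMN unit_price TYPE DECIMAL(12, 2);
```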
Iceberg rejects type changes outside that set, such as STRING to INT or BIGINT to INT. Migrate them through a new column instead:
- Add the new column with the desired type.
- Backfill from the old column.
- Move consumers.
- Drop the old column after validation.
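The add, backfill, and validate steps can be sketched as below, assuming a hypothetical order_total_text STRING column that holds free-form amounts. UPDATE on Iceberg tables requires the Iceberg Spark SQL extensions, and TRY_CAST requires Spark 3.2 or later.

```sql
-- Add the new column with the desired type.
ALTER TABLE prod.db.orders
ADD COLUMN order_total DECIMAL(12, 2);

-- Backfill. TRY_CAST returns NULL instead of failing on unparseable input.
UPDATE prod.db.orders
SET order_total = TRY_CAST(order_total_text AS DECIMAL(12, 2))
WHERE order_total IS NULL;

-- Validate: rows that had text but did not parse need review
-- before consumers move and the old column is dropped.
SELECT count(*) AS unparsed_rows
FROM prod.db.orders
WHERE order_total_text IS NOT NULL
  AND order_total IS NULL;
```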
Best Practices
- Compatible changes are safer. Moving from a narrower type to a wider type, such as INT to BIGINT, preserves existing values.
- Incompatible changes can cause data loss. Iceberg rejects narrowing type changes, but manual conversions that narrow a type, truncate strings, or parse free-form text into numbers may fail or silently change values.
- Always test in staging before production. Validate the change against realistic data and representative queries.
- Prefer adding a new column for risky migrations. Backfill the new field and let consumers switch gradually.
- Use data rewrites when needed. If values need custom transformation, run an explicit ETL rewrite instead of relying on a simple type alteration.
Partition Field Changes
Partition changes are related to schema evolution because partition specs are stored in Iceberg metadata.
- Adding a new partition field creates a new partition spec. Existing data keeps the old spec, while new data uses the new one.
- Changing existing partition behavior may require a rewrite if you need old data to be physically laid out in the new way.
- Always test the performance impact before rolling out a new partition strategy.
For more detail, see Partition Evolution.
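A sketch of adding a partition field and then optionally rewriting older files, assuming the Iceberg Spark SQL extensions are enabled and a hypothetical order_ts timestamp column. The rewrite_data_files procedure and its rewrite-all option exist in recent Iceberg releases; check the procedure documentation for your version before relying on the output spec behavior.

```sql
-- New writes use the new spec; existing files keep the spec they were written with.
ALTER TABLE prod.db.orders
ADD PARTITION FIELD days(order_ts);

-- Optional: rewrite existing files so they are laid out with the current spec.
CALL prod.system.rewrite_data_files(
  table   => 'db.orders',
  options => map('rewrite-all', 'true')
);
```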
Operational Checklist
- Confirm the change is supported by the engine you use.
- Check whether any reader that goes through the Hive metastore or another catalog integration resolves columns by position or name instead of by Iceberg field ID.
- Test time travel queries after the change.
- Validate downstream schema contracts.
- Prefer small, reversible changes over large schema migrations.
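A few of these checks can be run directly in Spark SQL. This is a sketch; whether format-version appears among the reported table properties depends on the Iceberg Spark version.

```sql
-- Confirm the current schema, including column order and comments.
DESCRIBE TABLE prod.db.orders;

-- Confirm the table format version before relying on
-- version-dependent features such as column defaults.
SHOW TBLPROPERTIES prod.db.orders ('format-version');
```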
Key Takeaway
Iceberg schema evolution is safe because column identity is tracked in metadata. The table format protects old data, but teams still need a compatibility plan for readers and downstream systems.