Row-Level Deletes
Row-level deletes let Iceberg remove rows from immutable data files without immediately rewriting the whole file. Delete information is stored separately and applied at read time.
Why Deletes Need a Table Format
Files in object storage are effectively immutable, and columnar formats such as Parquet do not support editing a single row in place: changing one row means rewriting the whole file. A table format solves this by tracking both data files and delete files in its metadata.
Delete File Types
Iceberg format version 2 defines two kinds of row-level delete files.
Position Deletes
A position delete identifies a row by:
- The data file path.
- The zero-based row position inside that file.
Position deletes are precise and cheap for readers to apply, but the writer must already know the file and row position of every deleted row, which usually means scanning the affected data first.
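As a minimal sketch of how a reader applies position deletes, the toy model below treats a data file as a Python list of rows and a position delete file as a list of (file path, position) entries. The class and function names and the file paths are hypothetical, and the sketch ignores sequence numbers and Parquet I/O.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PositionDelete:
    """One entry of a position delete file (toy model)."""
    file_path: str  # data file that contains the deleted row
    pos: int        # zero-based row position inside that file

def read_with_position_deletes(file_path, rows, deletes):
    """Return the rows of one data file that survive position deletes."""
    # Only entries that name this exact file apply to it.
    deleted = {d.pos for d in deletes if d.file_path == file_path}
    # A row is live if its ordinal position was not deleted; no column
    # values are compared, which keeps the read-time cost low.
    return [row for pos, row in enumerate(rows) if pos not in deleted]

rows = ["order-1", "order-2", "order-3", "order-4"]
deletes = [PositionDelete("data/f1.parquet", 1), PositionDelete("data/f2.parquet", 0)]
print(read_with_position_deletes("data/f1.parquet", rows, deletes))
# ['order-1', 'order-3', 'order-4']
```

The entry for data/f2.parquet has no effect on data/f1.parquet: a position delete is always scoped to a single file.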
Equality Deletes
An equality delete identifies rows by column values, such as customer_id = 10.
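Equality deletes are cheap to write because the writer does not need to locate the matching rows, which makes them useful for change data capture and merge workloads. They are more expensive to read: every scanned row must be compared against the values in each applicable equality delete.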
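The sketch below is a simplified toy model of applying equality deletes. It uses the Iceberg spec's rule that an equality delete applies only to data files with a strictly smaller data sequence number, so a row inserted in the same commit as the delete survives. It ignores partition matching, and the class, function, and column values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EqualityDelete:
    """Toy equality delete: rows whose columns equal these values are deleted."""
    values: dict  # hypothetical example: {"customer_id": 10}
    seq: int      # data sequence number of the commit that wrote the delete

def matches(row, delete):
    # A row matches when every equality column has the same value.
    return all(row.get(col) == val for col, val in delete.values.items())

def read_with_equality_deletes(rows, data_seq, deletes):
    """Apply equality deletes to the rows of one data file.

    Only deletes with a strictly greater sequence number apply, so a row
    re-inserted in the same commit as the delete is kept.
    """
    applicable = [d for d in deletes if d.seq > data_seq]
    # Every row is compared against every applicable delete: this per-row
    # value check is the read-time cost of equality deletes.
    return [row for row in rows if not any(matches(row, d) for d in applicable)]

old_file = [{"customer_id": 10, "name": "Ada"}, {"customer_id": 11, "name": "Lin"}]
new_file = [{"customer_id": 10, "name": "Ada (updated)"}]  # written with the delete
deletes = [EqualityDelete({"customer_id": 10}, seq=2)]

print(read_with_equality_deletes(old_file, 1, deletes))  # only customer 11 remains
print(read_with_equality_deletes(new_file, 2, deletes))  # the updated row survives
```

This is the pattern an upsert produces: one commit writes an equality delete for the old key and a data file with the new row, and the sequence-number rule keeps the two from cancelling out.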
SQL Deletes
DELETE FROM prod.db.orders
WHERE order_status = 'CANCELLED';
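Engine support varies. In Spark, row-level DELETE, UPDATE, and MERGE INTO require the Iceberg SQL extensions (org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions, configured through spark.sql.extensions).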
Merge and Update
MERGE INTO is commonly used for CDC and upsert workflows.
MERGE INTO prod.db.customers AS target
USING staging.customers_cdc AS source
ON target.customer_id = source.customer_id
WHEN MATCHED AND source.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED AND source.op <> 'D' THEN INSERT *;
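Depending on engine support and the table properties write.delete.mode, write.update.mode, and write.merge.mode, these operations use copy-on-write or merge-on-read. Copy-on-write (the default) rewrites every data file that contains an affected row; merge-on-read leaves data files in place and writes delete files that readers apply later.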
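A toy comparison of the two strategies for deleting two rows from a 10,000-row file is shown below. It is a sketch, not Iceberg's writer logic: the function names and the file path are hypothetical, and "writing" a file is modeled as returning a list.

```python
def copy_on_write(rows, deleted_positions):
    """Rewrite the data file without the deleted rows; no delete file is written."""
    rewritten = [row for pos, row in enumerate(rows) if pos not in deleted_positions]
    return rewritten, []

def merge_on_read(rows, file_path, deleted_positions):
    """Keep the data file as-is and record one position delete per removed row."""
    delete_entries = [(file_path, pos) for pos in sorted(deleted_positions)]
    return rows, delete_entries

rows = [f"row-{i}" for i in range(10_000)]
cow_rows, cow_deletes = copy_on_write(rows, {5, 7})
mor_rows, mor_deletes = merge_on_read(rows, "data/f1.parquet", {5, 7})

print(len(cow_rows), len(cow_deletes))  # 9998 0: almost the whole file is rewritten
print(len(mor_rows), mor_deletes)       # 10000 [('data/f1.parquet', 5), ('data/f1.parquet', 7)]
```

Copy-on-write pays at write time, in proportion to the size of each touched file, and leaves readers with nothing extra to do. Merge-on-read writes only two small entries but shifts the work of filtering those rows onto every later scan.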
Read Impact
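Delete files make writes cheaper than rewriting whole data files, but they shift work to readers: every applicable delete file must be read and applied during each scan, so reads slow down as delete files accumulate.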
Watch for:
- Many small delete files.
- Equality deletes applied to large partitions.
- Query latency increasing after frequent merges.
- Compaction jobs that rewrite data but do not clean related delete files.
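The first two warning signs above can be checked per partition. The sketch below assumes input rows shaped like (partition, content, file_size_in_bytes), for example taken from Iceberg's files metadata table, where content is 0 for data files, 1 for position delete files, and 2 for equality delete files. The function name and all thresholds are hypothetical illustrations, not Iceberg defaults.

```python
from collections import defaultdict

# Content codes used by Iceberg's files metadata table.
DATA, POSITION_DELETES, EQUALITY_DELETES = 0, 1, 2

def flag_partitions(files, max_small_deletes=10, small_bytes=1 << 20,
                    large_partition_bytes=1 << 30):
    """Flag partitions whose delete files are likely to slow down reads.

    files: iterable of (partition, content, file_size_in_bytes) tuples.
    The thresholds are illustrative, not Iceberg settings.
    """
    small_deletes = defaultdict(int)  # delete files smaller than small_bytes
    eq_deletes = defaultdict(int)     # equality delete files
    data_bytes = defaultdict(int)     # total data bytes per partition
    for partition, content, size in files:
        if content == DATA:
            data_bytes[partition] += size
            continue
        if size < small_bytes:
            small_deletes[partition] += 1
        if content == EQUALITY_DELETES:
            eq_deletes[partition] += 1

    flagged = {}
    for partition in set(data_bytes) | set(small_deletes) | set(eq_deletes):
        reasons = []
        if small_deletes[partition] > max_small_deletes:
            reasons.append("many small delete files")
        if eq_deletes[partition] and data_bytes[partition] >= large_partition_bytes:
            reasons.append("equality deletes on a large partition")
        if reasons:
            flagged[partition] = reasons
    return flagged
```

Tracking the result over time, for example after each scheduled merge, shows whether compaction is keeping up with the rate at which delete files are written.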
Maintenance
Common maintenance actions include:
- Compact small data files.
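- Rewrite data files so pending deletes are applied and the deleted rows are physically removed.
- Expire snapshots older than the retention window so data and delete files no longer referenced by any snapshot can be removed.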
- Monitor delete-file counts by table and partition.
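A toy version of the first two actions, compacting data files and materializing their position deletes in one pass, is sketched below. The function name and paths are hypothetical. In a real table, the replaced data files and obsolete delete files stay referenced by earlier snapshots, so their storage is reclaimed only after those snapshots expire.

```python
from collections import defaultdict

def compact(data_files, position_deletes):
    """Toy compaction: combine data files and materialize their position deletes.

    data_files: dict of data file path -> list of rows, all being rewritten.
    position_deletes: list of (file_path, pos) pairs.
    Returns (compacted_rows, remaining_deletes).
    """
    deleted = defaultdict(set)
    for path, pos in position_deletes:
        deleted[path].add(pos)

    # Copy only live rows into the new, larger file.
    compacted = []
    for path, rows in data_files.items():
        compacted.extend(row for pos, row in enumerate(rows) if pos not in deleted[path])

    # Deletes against rewritten files are now obsolete; keep only deletes for
    # files that were not part of this compaction.
    remaining = [(path, pos) for path, pos in position_deletes if path not in data_files]
    return compacted, remaining

data_files = {"data/a.parquet": ["a0", "a1", "a2"], "data/b.parquet": ["b0", "b1"]}
deletes = [("data/a.parquet", 1), ("data/b.parquet", 0), ("data/c.parquet", 3)]
rows, remaining = compact(data_files, deletes)
print(rows)       # ['a0', 'a2', 'b1']
print(remaining)  # [('data/c.parquet', 3)]
```

Dropping the obsolete delete entries is the step to verify in a real compaction job: if it is skipped, readers keep opening delete files that no longer match any live data file.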
Key Takeaway
Row-level deletes make updates and CDC practical on immutable files. They are not free: high-delete workloads need compaction and snapshot maintenance to keep reads fast.