Sort Order Evolution
Sort order evolution controls how new data is written inside files. It is separate from partitioning and from the logical order of columns in the schema.
Why Sort Order Matters
Sorting can improve:
- File-level pruning with min/max statistics.
- Compression when nearby rows contain similar values.
- Merge and update workloads that repeatedly match on the same keys.
- Query latency for common filters.
Sorting is not a guarantee that query results are returned in that order. Queries still need ORDER BY when result order matters.
Set a Write Order
ALTER TABLE prod.db.orders
WRITE ORDERED BY customer_id, order_ts;
This asks writers to globally order rows when writing, depending on engine support and execution cost.
Set Local Order
ALTER TABLE prod.db.orders
WRITE LOCALLY ORDERED BY customer_id, order_ts;
Local ordering is usually cheaper because it orders rows within a task or writer instead of enforcing a full global order.
Distribute by Partition
ALTER TABLE prod.db.orders
WRITE DISTRIBUTED BY PARTITION
LOCALLY ORDERED BY customer_id, order_ts;
This pattern keeps writers aligned with partitions and improves row clustering within each partition.
When to Use It
Use sort order evolution when:
- Queries repeatedly filter by the same columns.
- Data files are large enough for min/max pruning to matter.
- Merge or delete workloads frequently target the same keys.
- Compression and read efficiency are more important than write simplicity.
Avoid aggressive sorting when:
- The table is small.
- Writes are latency-sensitive.
- Query patterns are not stable.
- Sorting creates expensive shuffles for little read benefit.
Key Takeaway
Sort order evolution tunes how new files are written. It improves read planning and compression, but it should be driven by measured query patterns rather than aesthetics.