Skip to main content
Version: Next

Sort Order Evolution

Sort order evolution controls how new data is written inside files. It is separate from partitioning and from the logical order of columns in the schema.

Why Sort Order Matters

Sorting can improve:

  • File-level pruning with min/max statistics.
  • Compression when nearby rows contain similar values.
  • Merge and update workloads that repeatedly match on the same keys.
  • Query latency for common filters.

Sorting is not a guarantee that query results are returned in that order. Queries still need ORDER BY when result order matters.

Set a Write Order

ALTER TABLE prod.db.orders
WRITE ORDERED BY customer_id, order_ts;

This asks writers to globally order rows when writing, depending on engine support and execution cost.

Set Local Order

ALTER TABLE prod.db.orders
WRITE LOCALLY ORDERED BY customer_id, order_ts;

Local ordering is usually cheaper because it orders rows within a task or writer instead of enforcing a full global order.

Distribute by Partition

ALTER TABLE prod.db.orders
WRITE DISTRIBUTED BY PARTITION
LOCALLY ORDERED BY customer_id, order_ts;

This pattern keeps writers aligned with partitions and improves row clustering within each partition.

When to Use It

Use sort order evolution when:

  • Queries repeatedly filter by the same columns.
  • Data files are large enough for min/max pruning to matter.
  • Merge or delete workloads frequently target the same keys.
  • Compression and read efficiency are more important than write simplicity.

Avoid aggressive sorting when:

  • The table is small.
  • Writes are latency-sensitive.
  • Query patterns are not stable.
  • Sorting creates expensive shuffles for little read benefit.

Key Takeaway

Sort order evolution tunes how new files are written. It improves read planning and compression, but it should be driven by measured query patterns rather than aesthetics.