Hidden Partitioning

Hidden partitioning separates the logical table schema from the physical partition layout. Users query real columns, while Iceberg derives partition values internally.

Traditional Partition Problem

In Hive-style layouts, users often need to know partition columns and write filters that match the physical layout.

For example, a table may lay out files by a derived event_date column, while the real data column is event_time. A Hive-style engine has no record that event_date is derived from event_time, so a filter on event_time alone cannot prune partitions. The query prunes only if it also filters on event_date.

That creates three problems:

  • Queries become tied to physical layout.
  • Users must understand storage details.
  • Changing the partition scheme can break existing queries.
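
To make the problem concrete, here is a minimal Python sketch of Hive-style pruning. It is an illustration only, not any engine's real planner: the partitions dictionary, the file names, and the prune_hive_style helper are all hypothetical. The point it shows is that only a predicate on the declared partition column can prune, so a query that filters on event_time alone scans everything.

```python
# Hypothetical Hive-style layout: each directory is keyed by the derived
# event_date string, not by the real event_time column.
partitions = {
    "event_date=2026-04-30": ["a.parquet"],
    "event_date=2026-05-01": ["b.parquet", "c.parquet"],
    "event_date=2026-05-02": ["d.parquet"],
}


def prune_hive_style(partitions, event_date=None):
    """Return the files to scan. Only an explicit event_date filter prunes."""
    if event_date is None:
        # No predicate on the partition column: every partition is scanned.
        return sorted(f for files in partitions.values() for f in files)
    return sorted(partitions.get(f"event_date={event_date}", []))


# Query filters on event_time only: the planner cannot map it to event_date.
full_scan = prune_hive_style(partitions)

# The user has to add a redundant event_date filter to get pruning.
pruned = prune_hive_style(partitions, event_date="2026-05-01")
```

In this sketch full_scan contains all four files, while pruned contains only the two files for 2026-05-01.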

Iceberg Approach

Iceberg stores partition transforms in table metadata. For example, a table can be partitioned by the day of event_time, written days(event_time) in the DDL below, without exposing an event_date column.

CREATE TABLE prod.db.events (
    event_id   BIGINT,
    event_time TIMESTAMP,
    level      STRING,
    message    STRING
)
USING iceberg
PARTITIONED BY (days(event_time), level);

Users can write normal predicates:

SELECT *
FROM prod.db.events
WHERE event_time >= TIMESTAMP '2026-05-01 00:00:00'
  AND event_time <  TIMESTAMP '2026-05-02 00:00:00';

Iceberg projects the predicate onto the partition transform, here turning the event_time range into a range of day values, and skips files whose partition values cannot match.
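
The projection step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Iceberg's actual planner code: timestamps are naive UTC values, each file carries a single day partition value, and the helper names (day_ordinal, project_range_to_days, prune_files) and file names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def day_ordinal(ts: datetime) -> int:
    """Days since 1970-01-01 for a naive UTC timestamp (the day partition value)."""
    return (ts - EPOCH).days


def project_range_to_days(lower_inclusive: datetime, upper_exclusive: datetime):
    """Turn `lower <= ts < upper` into an inclusive range of day values."""
    # A strict upper bound at exactly midnight must not pull in the next day,
    # so step back by the smallest timestamp unit (one microsecond) first.
    last_instant = upper_exclusive - timedelta(microseconds=1)
    return day_ordinal(lower_inclusive), day_ordinal(last_instant)


def prune_files(files, day_range):
    """Keep only files whose day partition value falls inside the range."""
    lo, hi = day_range
    return [name for name, day in files if lo <= day <= hi]


# Hypothetical file listing: (file name, day partition value).
files = [
    ("f1.parquet", day_ordinal(datetime(2026, 4, 30))),
    ("f2.parquet", day_ordinal(datetime(2026, 5, 1))),
    ("f3.parquet", day_ordinal(datetime(2026, 5, 2))),
]

# Same predicate as the query above: 2026-05-01 <= event_time < 2026-05-02.
day_range = project_range_to_days(datetime(2026, 5, 1), datetime(2026, 5, 2))
selected = prune_files(files, day_range)
```

Here only f2.parquet survives pruning. A real engine would still apply the original event_time predicate to the rows it reads, because a partition range can be wider than the query range.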

Common Transforms

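  • identity(column): use the source value directly. In DDL this is written as just the column name, as with level above.
  • bucket(n, column): hash the value and take the result modulo n, spreading values evenly across n buckets.
  • truncate(width, column): for strings, keep the first width characters; for integers, round down to a multiple of width.
  • years(timestamp), months(timestamp), days(timestamp), hours(timestamp): partition timestamps by time grain. The years, months, and days transforms also accept date columns.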
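
As a rough guide to what these transforms compute, the following Python sketch mirrors their behavior for a few input types. It is a sketch under assumptions, not a reference implementation: timestamps are naive UTC values, and the function names are hypothetical. In particular, the Iceberg specification defines bucketing with a 32-bit Murmur3 hash; bucket_standin substitutes zlib.crc32 only to stay dependency-free, so its bucket numbers will not match Iceberg's.

```python
import zlib
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def truncate_int(width: int, value: int) -> int:
    """Round down to a multiple of width (Python's % is non-negative here)."""
    return value - (value % width)


def truncate_str(width: int, value: str) -> str:
    """Keep at most the first `width` characters."""
    return value[:width]


def years(ts: datetime) -> int:
    """Whole years since 1970."""
    return ts.year - 1970


def months(ts: datetime) -> int:
    """Whole months since 1970-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


def days(ts: datetime) -> int:
    """Whole days since 1970-01-01."""
    return (ts - EPOCH).days


def hours(ts: datetime) -> int:
    """Whole hours since 1970-01-01 00:00:00."""
    return (ts - EPOCH) // timedelta(hours=1)


def bucket_standin(n: int, value: str) -> int:
    """Hash then modulo n. CRC32 is a stand-in, NOT Iceberg's Murmur3 hash."""
    return zlib.crc32(value.encode("utf-8")) % n


ts = datetime(2026, 5, 1, 13, 45)
partition_values = {
    "years": years(ts),
    "months": months(ts),
    "days": days(ts),
    "hours": hours(ts),
    "truncate_int": truncate_int(10, 17),
    "truncate_str": truncate_str(3, "iceberg"),
    "bucket": bucket_standin(16, "user-42"),
}
```

Each time transform yields an integer offset from the epoch rather than a formatted string, which is why a range predicate on the source column can be compared against partition values directly.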

Benefits

  • Queries stay stable when the partition layout changes, because predicates reference source columns rather than partition columns.
  • Producers do not need to populate derived partition columns.
  • Partition values are generated consistently.
  • Users can focus on data semantics instead of storage paths.

Design Tips

  • Partition for pruning, not for human browsing in object storage.
  • Avoid very high-cardinality identity partitions.
  • Prefer time transforms for event and ingestion timestamps.
  • Use buckets for high-cardinality identifiers when equality lookups are common.
  • Revisit the partition strategy as data volume and query patterns evolve.
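
A quick back-of-the-envelope calculation can help when weighing these tips. The sketch below uses made-up numbers (one year of retention, four distinct level values, 730 GB of data per year) and a hypothetical estimated_partitions helper; it assumes data is spread evenly, which real tables rarely achieve.

```python
def estimated_partitions(retention_days: int, grains_per_day: int,
                         extra_cardinality: int = 1) -> int:
    """Upper bound on partition count: time grains times other partition values."""
    return retention_days * grains_per_day * extra_cardinality


def avg_partition_mb(total_gb: float, partitions: int) -> float:
    """Average data per partition in MB, assuming an even spread."""
    return total_gb * 1024 / partitions


LEVELS = 4            # assumed number of distinct level values
TOTAL_GB = 730        # assumed data volume for the retention window

daily = estimated_partitions(365, 1, LEVELS)     # days(event_time), level
hourly = estimated_partitions(365, 24, LEVELS)   # hours(event_time), level

daily_mb = avg_partition_mb(TOTAL_GB, daily)
hourly_mb = avg_partition_mb(TOTAL_GB, hourly)
```

With these assumed inputs, daily partitioning gives 1,460 partitions of about 512 MB each, while hourly partitioning gives 35,040 partitions of about 21 MB each. The finer grain only pays off if queries routinely filter on ranges narrower than a day.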

Key Takeaway

Hidden partitioning is one of Iceberg's most significant usability improvements over Hive-style partitioning. The table controls its physical layout through metadata, while SQL queries continue to reference only the business columns.