Apache Iceberg Dev Mailing List — Weekly Digest (Aug 17–23 2025)

This content originally appeared on DEV Community and was authored by Alex Merced

The Apache Iceberg community was abuzz in mid‑August with discussions about schema evolution, REST API behaviour and improvements to delete files and error handling. Below is a summary of the week’s most important conversations, with links to the relevant threads in the Apache mail archives.

Deprecating position‑delete files with row data

Péter Váry proposed removing support for a rarely used type of position‑delete file that stores deleted row data. Iceberg’s spec defines two forms of position‑delete files: the commonly used form, which records only the data file and the position of each deleted row, and a more complicated variant that also stores the deleted row data. Péter argued that the second type is optional in the v3 spec and has no writers in the Java library. Deprecating it in Iceberg 2.0 would simplify the FileFormat API, avoid unnecessary complexity, and encourage engines to adopt deletion vectors instead. Russell Spitzer and Fokko Driesprong supported the removal, noting that PyIceberg and the Java library never write row‑data deletes. Renjie Liu and Amogh Jahagirdar suggested adding implementation notes acknowledging that some libraries still support row‑data deletes, and agreed with removing them in v2.0. Péter referenced an open pull request (#13870) that removes the feature in the Java implementation.
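
To make the difference between the two forms concrete, here is a minimal sketch in Python. It is not code from any Iceberg library: the `PositionDelete` class and `live_rows` helper are hypothetical names, and the field names `file_path`, `pos` and `row` are assumed to mirror the columns of a position‑delete file. It illustrates that a reader needs only the file and position to apply a delete, so the optional row payload does not affect the result.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class PositionDelete:
    """Hypothetical model of one position-delete record."""
    file_path: str                # data file the delete applies to
    pos: int                      # 0-based row position inside that file
    row: Optional[dict] = None    # only set in the row-data variant


def live_rows(file_path: str, rows: List[Any], deletes: List[PositionDelete]) -> List[Any]:
    """Return the rows of one data file that survive the given deletes.

    Only file_path and pos are consulted; the optional row payload is ignored.
    """
    deleted_positions = {d.pos for d in deletes if d.file_path == file_path}
    return [r for i, r in enumerate(rows) if i not in deleted_positions]


rows = ["a", "b", "c"]
deletes = [
    PositionDelete("f1.parquet", 1),                 # common form
    PositionDelete("f1.parquet", 2, row={"id": 3}),  # row-data variant
    PositionDelete("f2.parquet", 0),                 # targets another file
]
print(live_rows("f1.parquet", rows, deletes))
```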

→ Thread: Deprecation of position deletes with row data

Making REST `updateTable` idempotent

Huaxin Gao proposed adding an optional Idempotency-Key header to REST catalog mutation endpoints, such as updateTable. Without idempotency, a client retrying a failed POST may hit a 409 conflict, leaving the table in a corrupted state. The proposed header ensures that only the first request with a given key is executed; subsequent requests with the same key and the same payload return the original result, while requests that reuse the key with a different payload return a 422 error. Capability discovery fields (idempotency‑tokens‑respected and idempotency‑token‑lifetime) would allow clients to detect support for the feature. The proposal aligns with the IETF HTTP API guidelines and is backwards compatible: servers that ignore the header fall back to current behaviour. Huaxin asked for feedback on the semantics and the token lifetime.
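
The key semantics described above can be sketched as a toy in‑memory handler. This is only an illustration under stated assumptions, not the proposed server implementation: `IdempotentEndpoint`, `handle` and the response dictionaries are hypothetical, payloads are compared by a hash of their canonical JSON, and token lifetime (expiry of stored keys) is left out entirely.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


class IdempotentEndpoint:
    """Toy model of a mutation endpoint that honours an idempotency key."""

    def __init__(self) -> None:
        # key -> (payload fingerprint, response returned for the first request)
        self._seen: Dict[str, Tuple[str, dict]] = {}
        self.commits_applied = 0

    @staticmethod
    def _fingerprint(payload: dict) -> str:
        canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def _apply(self, payload: dict) -> dict:
        # Stand-in for the real commit; here it only counts executions.
        self.commits_applied += 1
        return {"status": 200, "commit": self.commits_applied}

    def handle(self, payload: dict, idempotency_key: Optional[str] = None) -> dict:
        if idempotency_key is None:
            # No header: behave exactly as today.
            return self._apply(payload)
        fingerprint = self._fingerprint(payload)
        if idempotency_key in self._seen:
            stored_fingerprint, stored_response = self._seen[idempotency_key]
            if stored_fingerprint != fingerprint:
                return {"status": 422, "error": "key reused with a different payload"}
            return stored_response  # replay the original result, do not re-execute
        response = self._apply(payload)
        self._seen[idempotency_key] = (fingerprint, response)
        return response


endpoint = IdempotentEndpoint()
first = endpoint.handle({"op": "set-property", "value": 1}, "key-1")
retry = endpoint.handle({"op": "set-property", "value": 1}, "key-1")
print(first == retry, endpoint.commits_applied)
```

In this sketch a retry with the same key and payload replays the stored response, so the commit is applied only once.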

→ Thread: Iceberg REST catalog idempotency

Treating HTTP 503 as non‑retryable for `updateTable`

Prashant Singh started a vote to update the REST spec so that HTTP 503 responses during updateTable operations are treated as non‑retryable errors (similar to 500, 502, and 504). He noted that infrastructure such as Istio or Envoy proxies may return 503 after a commit succeeds but before the client receives a response. Retrying in this scenario can corrupt table metadata. The proposal modifies the spec to indicate that clients should not retry on 503, leaving table recovery to the user. Community members responded overwhelmingly with +1 votes; eight binding votes and four non‑binding votes were recorded. Prashant declared the vote passed and planned to merge the spec change.
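
On the client side, the rule might look like the following sketch. It is a simplified assumption of how a client could behave, not code from an Iceberg client: `commit_with_guarded_retries` and `CommitStateUnknownError` are hypothetical names, and treating 429 as safe to retry is an assumption made only for this example.

```python
from typing import Callable

# Statuses after which the commit may already have been applied,
# so retrying blindly is unsafe (the list discussed in the vote).
COMMIT_OUTCOME_UNKNOWN = frozenset({500, 502, 503, 504})

# Hypothetical: statuses this sketch treats as safe to retry.
SAFE_TO_RETRY = frozenset({429})


class CommitStateUnknownError(Exception):
    """Raised when the client cannot tell whether the commit succeeded."""


def commit_with_guarded_retries(send: Callable[[], int], max_attempts: int = 3) -> int:
    """Call send() (which returns an HTTP status) and retry only when safe."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for _ in range(max_attempts):
        status = send()
        if status in COMMIT_OUTCOME_UNKNOWN:
            # Do not retry: surface the ambiguity and leave recovery to the user.
            raise CommitStateUnknownError(f"commit outcome unknown after HTTP {status}")
        if status not in SAFE_TO_RETRY:
            return status
    return status


attempts = []

def proxy_returns_503() -> int:
    attempts.append(1)
    return 503

try:
    commit_with_guarded_retries(proxy_returns_503)
except CommitStateUnknownError as err:
    print(err, "| attempts:", len(attempts))
```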

→ Thread: Vote: mark 503 as non‑retryable error code

Understanding type promotion rules

Micah Kornfield raised questions about how type promotion works when writing Parquet files using Iceberg. He asked whether writers could produce new files with promoted types (e.g., int → long or float → double) while existing files use the original type, and how readers handle such mixed schemas. Russell Spitzer replied that while type promotion is allowed by the spec, writers should write files consistent with the current table schema and include all columns (even if they are null) to avoid missing data. Daniel Weeks added that Iceberg’s schema evolution rules deliberately allow old writers to continue writing with a stale schema; readers must handle both old and promoted types gracefully. Steven Wu pointed out that streaming jobs may run for hours or days with a stale schema, so mixed type files are inevitable.
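
A reader that copes with mixed files could be sketched roughly as follows. This is a toy model, not how any Iceberg reader is implemented: the type names are plain strings, only the two promotions mentioned in the thread are listed, and `resolve_read_type` and `read_column` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

# Only the promotions mentioned above; the table type is always the wider one.
ALLOWED_PROMOTIONS = {("int", "long"), ("float", "double")}

# Python stand-ins for the column types used in this sketch.
CASTERS = {"int": int, "long": int, "float": float, "double": float}


def resolve_read_type(file_type: str, table_type: str) -> str:
    """Return the type to produce for a column, or fail if the file is unreadable."""
    if file_type == table_type or (file_type, table_type) in ALLOWED_PROMOTIONS:
        return table_type
    raise TypeError(f"cannot read {file_type} as {table_type}")


def read_column(files: List[Tuple[str, list]], table_type: str) -> List[Optional[float]]:
    """Read one column from files written with either the old or the promoted type."""
    out = []
    for file_type, values in files:
        cast = CASTERS[resolve_read_type(file_type, table_type)]
        out.extend(None if v is None else cast(v) for v in values)
    return out


# One file written by a stale writer (float), one by an up-to-date writer (double).
files = [("float", [1.5, None]), ("double", [2.25])]
print(read_column(files, "double"))
```

The check runs per file rather than per table, which is what lets files from a long‑running writer with a stale schema sit alongside files written with the promoted type.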

→ Thread: What type promotion actually means

Adding a `loaded‑via` field to `loadTable` requests

Prashant Singh proposed adding a loaded‑via parameter to the loadTable request in the Iceberg REST catalog. The parameter would indicate the name of a definer view whose permissions should be used when loading a table. Today, loadTable checks the requesting user’s permissions, which prevents engines from resolving definer views that execute under the view owner’s context. The new field would be optional and backward compatible. Ryan Blue cautioned that authorisation should remain the responsibility of the catalog and that the spec should avoid mixing table ownership semantics with view resolution. Claude Warren warned that adding context fields could open new attack vectors; he advocated for designing tests that demonstrate the security model before finalising the feature. Prashant clarified that the field is only a context hint for audit logging and that authorisation remains unchanged.

→ Thread: Add loaded‑via to support definer views

Standardising error messages across languages

André Luis Anastácio initiated a discussion about how the Reference Catalog Kit (RCK) validates error messages. RCK currently checks both the exception type and the exact error message, which creates friction for non‑Java implementations (e.g., Rust, Python, Go) whose messages differ slightly. Daniel Weeks agreed that some standardisation is helpful, but emphasised that messages need to be clear and consistent rather than identical. Steve Loughran suggested introducing numeric error codes (similar to Windows error codes) to decouple tests from human‑readable strings. André pointed to a proposal document suggesting that each error consist of a code and a default message; tests could then assert only on the code. Ryan Blue cautioned that users rely on human‑readable errors and that error codes alone are insufficient; instead, the spec should require both the error type and a default message while leaving room for implementations to localise or augment the message.
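
The code‑plus‑message idea might look like the sketch below. Everything in it is hypothetical: the `CatalogError` class, the numeric code and both message strings are invented for illustration and are not taken from RCK, the proposal document or any Iceberg client.

```python
from typing import Callable, Optional


class CatalogError(Exception):
    """Hypothetical error carrying a stable code and a human-readable message."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


# Hypothetical code; a real scheme would be defined by the spec.
NAMESPACE_NOT_FOUND = 40401


def lookup_in_client_a(namespace: str) -> None:
    raise CatalogError(NAMESPACE_NOT_FOUND, f"Namespace does not exist: {namespace}")


def lookup_in_client_b(namespace: str) -> None:
    # Different wording, same meaning and same code.
    raise CatalogError(NAMESPACE_NOT_FOUND, f"namespace '{namespace}' was not found")


def error_code_of(fn: Callable[[str], None], namespace: str) -> Optional[int]:
    """Run fn and return the code of the CatalogError it raises, if any."""
    try:
        fn(namespace)
    except CatalogError as err:
        return err.code
    return None


# A compatibility test asserts on the code and ignores the wording.
for client in (lookup_in_client_a, lookup_in_client_b):
    assert error_code_of(client, "db") == NAMESPACE_NOT_FOUND
print("both clients pass the code-only check")
```

Users still see a readable message in each client, which addresses Ryan Blue's concern, while the test suite stays independent of the exact wording.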

→ Thread: RCK and Iceberg clients – should we standardise error messages?

Discussion on analytics accelerator as default S3 input stream

Michael Stubbs proposed making the Analytics Accelerator Library (AAL) the default input stream for Amazon S3 in S3FileIO. Preliminary testing showed significant performance improvements when reading Parquet files. Kevin Liu encouraged raising the idea at a community sync. Fuat Basik and other contributors expressed interest, and Michael scheduled a meeting for Aug 27 to discuss adoption details and gather feedback.

→ Thread: Analytics accelerator library as default S3 input stream

Upcoming features and proposals

Versioned SQL catalog routines: Yufei Gu reported outcomes from a community sync on versioned SQL user‑defined functions. The plan is to treat all overloads of a UDF as a single entity, assign a global version number to track changes, and return all overloads when loading a UDF. Filtering by overload is considered an advanced feature for later versions.
Improved column statistics for V4: Eduard Tudenhöfner scheduled a sync to discuss the proposal for improved column statistics. Contributors debated reserving field IDs for statistics, bounding stats values, and requiring writers to share a single stats namespace.
GitHub Action for markdown linting: Manu Zhang proposed adding a markdownlint GitHub Action to Iceberg’s Java repository. Eduard Tudenhöfner suggested using Spotless to lint markdown, while Fokko Driesprong supported adding automated checks to catch broken links and rendering issues.
Merge queue in Iceberg repositories: Renjie Liu suggested enabling GitHub’s merge queue to ensure that branches meet the latest commit state and all checks before merging. Jean‑Baptiste Onofré and Eduard Tudenhöfner supported the idea, noting it would add safety but could increase wait times for busy repositories.

Closing thoughts

The week of August 17–23 2025 showcased the Iceberg community’s commitment to robust APIs and ease of use. Discussions ranged from subtle schema‑evolution details (type promotion) and removal of unused features (row‑data position deletes) to practical improvements like idempotent REST calls and clearer error messages. The continued push towards V4 features—improved column statistics, simpler file‑format APIs and adoption of analytics accelerators—demonstrates that Iceberg is evolving to meet the needs of diverse runtimes and storage systems. If you’d like to participate, consider subscribing to the dev list and joining the upcoming syncs.