IngestThis

Author: Alex Merced

2026-04-29 • Alex Merced

What Are Table Formats and Why Were They Needed?

Table formats like Apache Iceberg solved the ACID, schema, and performance problems that turned data lakes into data swa...

2026-04-29 • Alex Merced

The Metadata Structure of Modern Table Formats

Iceberg uses a metadata tree, Delta Lake uses a transaction log, Hudi uses a timeline. Here is exactly how each format o...

2026-04-29 • Alex Merced

Performance and Apache Iceberg's Metadata

Iceberg's three-layer metadata tree eliminates directory listing and enables multi-level data skipping. Here is how scan...

2026-04-29 • Alex Merced

Partition Evolution: Change Your Partitioning Without Rewriting Data

Iceberg lets you change partition schemes without rewriting data. Here is how partition evolution works internally and w...

2026-04-29 • Alex Merced

Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans

Iceberg's hidden partitioning separates physical layout from user queries using transform functions. Here is how it work...

2026-04-29 • Alex Merced

Writing to an Apache Iceberg Table: How Commits and ACID Actually Work

Here is exactly how an engine writes to an Iceberg table, step by step, from data files through the atomic commit that m...

2026-04-29 • Alex Merced

What Are Lakehouse Catalogs? The Role of Catalogs in Apache Iceberg

Lakehouse catalogs store metadata pointers, manage namespaces, and enforce access control. Here is the complete catalog ...

2026-04-29 • Alex Merced

When Catalogs Are Embedded in Storage

S3 Tables and MinIO AIStor embed the Iceberg catalog directly in the storage layer. Here is when embedded catalogs make...

2026-04-29 • Alex Merced

How Data Lake Table Storage Degrades Over Time

Iceberg tables degrade through small files, orphan files, metadata bloat, sort order decay, and partition skew. Here is ...

2026-04-29 • Alex Merced

Maintaining Apache Iceberg Tables: Compaction, Expiry, and Cleanup

Keep Iceberg tables fast with compaction, snapshot expiry, orphan cleanup, and manifest rewriting. Here is when and how ...

2026-04-29 • Alex Merced

Apache Iceberg Metadata Tables: Querying the Internals

Iceberg metadata tables let you query snapshots, files, manifests, and partitions using SQL. Here is every metadata tabl...

2026-04-29 • Alex Merced

Using Apache Iceberg with Python and MPP Query Engines

Access Iceberg tables from Python with PyIceberg, DuckDB, and Polars, or through MPP engines like Dremio, Spark, and Tri...

2026-04-29 • Alex Merced

Approaches to Streaming Data into Apache Iceberg Tables

Stream data into Iceberg with Spark Structured Streaming, Flink, or Kafka Connect. Here is how each works and the trade-...

2026-04-29 • Alex Merced

Hands-On with Apache Iceberg Using Dremio Cloud

A practical walkthrough of creating, querying, and optimizing Iceberg tables on Dremio Cloud, from account setup to AI-p...

2026-04-29 • Alex Merced

Migrating to Apache Iceberg: Strategies for Every Source System

Migrate to Iceberg from Hive, data warehouses, or raw files using in-place migration, full rewrite, or the zero-downtime...

2026-04-29 • Alex Merced

How Query Engines Think: The Tradeoffs Behind Every Data System

Every database is a collection of engineering tradeoffs. Learn the 9 design decisions that shape how query engines store...

2026-04-29 • Alex Merced

Row vs. Column: How Storage Layout Shapes Everything

Row stores keep records together for fast transactions. Column stores keep field values together for fast analytics. Her...

2026-04-29 • Alex Merced

How Databases Organize Data on Disk: Pages, Blocks, and File Formats

Databases structure data on disk as heap files, sorted files, or LSM trees, then wrap it in formats like Parquet with me...

2026-04-29 • Alex Merced

B-Trees, LSM Trees, and the Indexing Tradeoff Spectrum

B-trees balance reads and writes for OLTP. LSM trees maximize write throughput. Bitmap indexes accelerate OLAP filtering...

2026-04-29 • Alex Merced

Inside the Query Optimizer: How Engines Pick a Plan

Query optimizers transform SQL into execution plans using rule-based rewrites, cost-based search, and adaptive runtime a...

Categories

Data Engineering
OLTP
Database
Data
Frontend
Data Lakehouse
JavaScript
Data Architecture
Data Analytics
DevOps
Data Modeling
Python
SQL
Rust
AI
Apache Iceberg
Software Development
Semantic Layer

© 2026 Alex Merced — alexmercedcoder.dev