<?xml version="1.0" ?>
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>IngestThis</title>
      <link>https://ingestthis.com</link>
      <description>Articles, tutorials, and resources for Data Engineers, Scientists, Analysts, and Architects.</description>
      <language>en</language>
      <lastBuildDate>Tue, 19 May 2026 17:10:01 GMT</lastBuildDate>
      <atom:link href="https://ingestthis.com/feed.xml" rel="self" type="application/rss+xml" />
      
    <item>
      <title>What Are Table Formats and Why Were They Needed?</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-01-table-formats</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-01-table-formats</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Table formats like Apache Iceberg solved the ACID, schema, and performance problems that turned data lakes into data swamps. Here is how each one works.]]></description>
    </item>
    <item>
      <title>The Metadata Structure of Modern Table Formats</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-02-metadata-structures</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-02-metadata-structures</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Iceberg uses a metadata tree, Delta Lake uses a transaction log, Hudi uses a timeline. Here is exactly how each format organizes metadata and why it matters.]]></description>
    </item>
    <item>
      <title>Performance and Apache Iceberg's Metadata</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-03-iceberg-metadata-performance</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-03-iceberg-metadata-performance</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Iceberg's three-layer metadata tree eliminates directory listing and enables multi-level data skipping. Here is how scan planning actually works.]]></description>
    </item>
    <item>
      <title>Partition Evolution: Change Your Partitioning Without Rewriting Data</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-04-partition-evolution</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-04-partition-evolution</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Iceberg lets you change partition schemes without rewriting data. Here is how partition evolution works internally and why Hive-style partitioning could not do this.]]></description>
    </item>
    <item>
      <title>Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-05-hidden-partitioning</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-05-hidden-partitioning</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Iceberg's hidden partitioning separates physical layout from user queries using transform functions. Here is how it works and why it eliminates accidental full scans.]]></description>
    </item>
    <item>
      <title>Writing to an Apache Iceberg Table: How Commits and ACID Actually Work</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-06-writing-to-iceberg</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-06-writing-to-iceberg</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Here is exactly how an engine writes to an Iceberg table, step by step, from data files through the atomic commit that makes ACID guarantees possible.]]></description>
    </item>
    <item>
      <title>What Are Lakehouse Catalogs? The Role of Catalogs in Apache Iceberg</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-07-lakehouse-catalogs</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-07-lakehouse-catalogs</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Lakehouse catalogs store metadata pointers, manage namespaces, and enforce access control. Here is the complete catalog landscape from Polaris to Glue.]]></description>
    </item>
    <item>
      <title>When Catalogs Are Embedded in Storage</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-08-embedded-catalogs</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-08-embedded-catalogs</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[S3 Tables and MinIO AI Stor embed the Iceberg catalog directly in the storage layer. Here is when embedded catalogs make sense and when they do not.]]></description>
    </item>
    <item>
      <title>How Data Lake Table Storage Degrades Over Time</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-09-storage-degradation</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-09-storage-degradation</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Iceberg tables degrade through small files, orphan files, metadata bloat, sort order decay, and partition skew. Here is how to diagnose each problem.]]></description>
    </item>
    <item>
      <title>Maintaining Apache Iceberg Tables: Compaction, Expiry, and Cleanup</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-10-maintaining-iceberg</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-10-maintaining-iceberg</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Keep Iceberg tables fast with compaction, snapshot expiry, orphan cleanup, and manifest rewriting. Here is when and how to run each operation.]]></description>
    </item>
    <item>
      <title>Apache Iceberg Metadata Tables: Querying the Internals</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-11-metadata-tables</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-11-metadata-tables</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Iceberg metadata tables let you query snapshots, files, manifests, and partitions using SQL. Here is every metadata table and how to use each one.]]></description>
    </item>
    <item>
      <title>Using Apache Iceberg with Python and MPP Query Engines</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-12-python-and-mpp</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-12-python-and-mpp</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Access Iceberg tables from Python with PyIceberg, DuckDB, and Polars, or through MPP engines like Dremio, Spark, and Trino. Here is how each approach works.]]></description>
    </item>
    <item>
      <title>Approaches to Streaming Data into Apache Iceberg Tables</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-13-streaming-to-iceberg</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-13-streaming-to-iceberg</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Stream data into Iceberg with Spark Structured Streaming, Flink, or Kafka Connect. Here is how each works and the trade-offs between latency and maintenance.]]></description>
    </item>
    <item>
      <title>Hands-On with Apache Iceberg Using Dremio Cloud</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-14-hands-on-dremio-cloud</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-14-hands-on-dremio-cloud</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[A practical walkthrough of creating, querying, and optimizing Iceberg tables on Dremio Cloud, from account setup to AI-powered analytics.]]></description>
    </item>
    <item>
      <title>Migrating to Apache Iceberg: Strategies for Every Source System</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-15-migrating-to-iceberg</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-apache-iceberg-masterclass-15-migrating-to-iceberg</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Migrate to Iceberg from Hive, data warehouses, or raw files using in-place migration, full rewrite, or the zero-downtime view swap pattern.]]></description>
    </item>
    <item>
      <title>How Query Engines Think: The Tradeoffs Behind Every Data System</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-01-overview</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-01-overview</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Every database is a collection of engineering tradeoffs. Learn the 9 design decisions that shape how query engines store, index, and process your data.]]></description>
    </item>
    <item>
      <title>Row vs. Column: How Storage Layout Shapes Everything</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-02-row-vs-column-storage</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-02-row-vs-column-storage</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Row stores keep records together for fast transactions. Column stores keep field values together for fast analytics. Here is how each layout works and when to use it.]]></description>
    </item>
    <item>
      <title>How Databases Organize Data on Disk: Pages, Blocks, and File Formats</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-03-data-organization-on-disk</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-03-data-organization-on-disk</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Databases structure data on disk as heap files, sorted files, or LSM trees, then wrap it in formats like Parquet with metadata that lets engines skip irrelevant blocks.]]></description>
    </item>
    <item>
      <title>B-Trees, LSM Trees, and the Indexing Tradeoff Spectrum</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-04-indexing-strategies</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-04-indexing-strategies</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[B-trees balance reads and writes for OLTP. LSM trees maximize write throughput. Bitmap indexes accelerate OLAP filtering. Here is when to use each.]]></description>
    </item>
    <item>
      <title>Inside the Query Optimizer: How Engines Pick a Plan</title>
      <link>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-05-query-optimizer</link>
      <guid>https://ingestthis.com/posts/2026/2026-04-29-query-engine-optimization-05-query-optimizer</guid>
      <pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate>
      <description><![CDATA[Query optimizers transform SQL into execution plans using rule-based rewrites, cost-based search, and adaptive runtime adjustments. Here is how each approach works.]]></description>
    </item>
    </channel>
  </rss>