Alex Merced || 2024-10-05

Data Lakehouse || data lakehouse - data engineering - apache iceberg

This article is a comprehensive directory of Apache Iceberg resources, including educational materials, tutorials, and hands-on exercises. Whether you’re a beginner or an experienced data engineer, this guide will help you navigate the world of Apache Iceberg and its applications.

Apache Iceberg?

What is Apache Iceberg?

Apache Iceberg is an open-source data lakehouse table format. That means it is a standard for how the metadata that defines a group of files as a table is stored. Any tool that supports the standard can use this metadata to read and write those files the same way it would a table in a data warehouse, with the same features and ACID guarantees.
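To make the idea of "metadata defining a group of files as a table" concrete, here is a minimal toy sketch in Python. It is not Apache Iceberg's actual format or API: real Iceberg tables use metadata files, manifest lists, manifests, and a catalog. The file names (`metadata.json`, `part-1.csv`) and the functions `write_data_file`, `commit`, and `read_table` are hypothetical names invented for this illustration. It only shows the general pattern: data files are immutable, and a small metadata file records which files make up each snapshot of the table.

```python
import csv
import json
import os
import tempfile


def write_data_file(table_dir, name, rows):
    """Write rows to a new data file and return its name (hypothetical helper)."""
    path = os.path.join(table_dir, name)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return name


def commit(table_dir, files):
    """Publish a new snapshot listing exactly the files that make up the table."""
    meta_path = os.path.join(table_dir, "metadata.json")
    snapshots = []
    if os.path.exists(meta_path):
        with open(meta_path) as f:
            snapshots = json.load(f)["snapshots"]
    snapshots.append({"id": len(snapshots) + 1, "files": files})
    tmp_path = meta_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"current": snapshots[-1]["id"], "snapshots": snapshots}, f)
    # Swap the metadata file in one step so readers see the old or new snapshot.
    os.replace(tmp_path, meta_path)


def read_table(table_dir, snapshot_id=None):
    """Read the rows of the current snapshot, or of an older one by id."""
    with open(os.path.join(table_dir, "metadata.json")) as f:
        meta = json.load(f)
    wanted = snapshot_id or meta["current"]
    snapshot = next(s for s in meta["snapshots"] if s["id"] == wanted)
    rows = []
    for name in snapshot["files"]:
        with open(os.path.join(table_dir, name), newline="") as f:
            rows.extend(csv.reader(f))
    return rows


# Toy usage: two commits produce two snapshots of the same "table".
table_dir = tempfile.mkdtemp()
f1 = write_data_file(table_dir, "part-1.csv", [["1", "alice"]])
commit(table_dir, [f1])
f2 = write_data_file(table_dir, "part-2.csv", [["2", "bob"]])
commit(table_dir, [f1, f2])
```

Calling `read_table(table_dir)` returns the rows of the latest snapshot, while `read_table(table_dir, 1)` returns the table as it looked after the first commit. Because every reader goes through the same metadata, any tool that understands the layout sees the same table, which is the property the table format standardizes.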

Why Does it Matter?

  • By operating off tables in a separate storage layer, you can use all your favorite analytical tools on a single copy of your data.

  • Reducing the number of copies needed can reduce the compute, storage, and network costs of your overall data platform.

  • Storing your data in a standard format reduces future migration costs when you change tooling or adopt new tools.

Who does Apache Iceberg benefit?

  • Data Engineers, since less data movement means fewer data pipelines to manage.

  • Data Analysts, since fewer data movements are needed before data becomes available, giving them more immediate access. This is especially true when Iceberg is paired with the data virtualization available in tools like Dremio, which allows for Lakehouse Querying and Federated Querying (Virtualization) on one platform.

  • Data Scientists, because they can also have more immediate data access when training their AI/ML models.

  • Data Leaders, since they can reduce their overall platform costs, making it easier to fund other data initiatives.

Apache Iceberg Directory

Apache Iceberg Education

Here is a list of resources to help you learn Apache Iceberg:

Apache Iceberg Hands-on Tutorials

Here is a list of hands-on tutorials that will help you get started with Apache Iceberg:

Apache Iceberg’s Architecture

Here is a list of resources to help you learn Apache Iceberg’s architecture and internals:

Getting Data into Apache Iceberg

Here is a list of resources to help you get data into Apache Iceberg:

Apache Iceberg Migration

Here is a list of resources to help you migrate your data to Apache Iceberg:

Streaming with Apache Iceberg

Here is a list of resources to help you stream data into Apache Iceberg:

Partitioning with Apache Iceberg

Here is a list of resources to help you learn how to partition your data with Apache Iceberg:

Maintaining and Auditing Apache Iceberg Tables

Apache Iceberg Catalogs

Here is a list of resources to help you learn about Apache Iceberg Catalogs:

Querying Apache Iceberg Tables

Here is a list of resources to help you query your Apache Iceberg tables:

Hybrid Apache Iceberg Lakehouses

Here is a list of resources about implementing hybrid on-premises and cloud Apache Iceberg lakehouses:

Apache Iceberg and Other Formats

Here is a list of resources about Apache Iceberg and other formats (Apache Hudi, Apache Paimon, Delta Lake):

Python and Apache Iceberg

Here is a list of resources about Apache Iceberg and Python:

Governing Apache Iceberg Tables

Miscellaneous Apache Iceberg Resources

Here is a list of miscellaneous resources to help you learn Apache Iceberg: