Alex Merced || 2024-01-19
Data Lakehouse || Data Lakehouse - Data Lake - Apache Iceberg
The concept of the Open Lakehouse has emerged as a beacon of flexibility and innovation. An Open Lakehouse is a specialized form of data lakehouse (an architecture that brings data-warehouse-like functionality and performance to data on a data lake), characterized by its commitment to open standards and technologies. At the core of this paradigm are tools like Apache Iceberg, Nessie, and Apache Arrow, which collectively empower organizations to build highly efficient, scalable, and interoperable data ecosystems.
Conventional data lakehouses may tightly couple storage formats, governance, optimization, and more to a single vendor, leaving few alternatives. An Open Lakehouse instead prioritizes avoiding vendor lock-in, so that organizations maintain full control over their data infrastructure. This approach not only fosters a more adaptable and resilient data environment but also encourages a collaborative, community-driven development ethos that is instrumental in driving the field forward.
A key platform enabling open lakehouses is Dremio, a lakehouse platform that epitomizes the Open Lakehouse philosophy. Dremio integrates various data sources, leveraging open-source technologies to unify data management and analytics. This integration gives organizations the flexibility and efficiency they need to harness the full potential of their data. Dremio supports decentralization at several levels, with features for data virtualization (decentralized data), data lakehouse (decentralized access to a single copy of a dataset), and data mesh (decentralized data curation).
This directory serves as a comprehensive resource for anyone looking to dive into the world of Open Lakehouse Engineering. Whether you’re a seasoned data professional or just starting out, the following resources will guide you through the intricacies of building and managing an Open Lakehouse, ensuring you’re well-equipped to leverage these exciting technologies to their fullest extent.
If you are new to the data space, I recommend starting with this playlist, which covers lakehouse engineering, modeling, big data concepts, and more.
Getting Started with Open Lakehouses
- No Code Setup of a Data Lakehouse on your Laptop with Dremio & Minio using Docker Desktop
- Video Playlist: Apache Iceberg Lakehouse Engineering
- Blog: Creating an Iceberg Lakehouse on your Laptop with Dremio/Minio/Nessie
- Blog: Apache Iceberg 101 - Comprehensive List of Resources
- Blog: BI Dashboard Acceleration: Cubes, Extracts, and Dremio’s Reflections
- Blog: 5 Use Cases for the Dremio Lakehouse
Hands-on Articles
- Blog: Creating an Iceberg Lakehouse with Spark, Minio, Dremio, Nessie
- Blog: Using dbt to Manage Your Dremio Semantic Layer
- Blog: Connecting to Dremio Using Apache Arrow Flight in Python
- Blog: Exploring the Architecture of Apache Iceberg, Delta Lake, and Apache Hudi
- Blog: How to Create a Lakehouse with Airbyte, S3, Apache Iceberg, and Dremio
- Blog: Using Flink with Apache Iceberg and Nessie
- Blog: 3 Ways to Use Python with Apache Iceberg
- Blog: Using DuckDB with Your Dremio Data Lakehouse
- Blog: 3 Ways to Convert a Delta Lake Table Into an Apache Iceberg Table
- Blog: Getting Started with Project Nessie, Apache Iceberg, and Apache Spark Using Docker
- Video: Apache Superset & Dremio: How to Run Superset from Docker and Connect to Dremio Cloud
Conceptual Content
- Blog: Virtual Data Marts 101 - The Benefits and How-To
- Docs: Data Lakehouse Terms and Concepts
- Blog: The Who, What, and Why of Data Products
- Blog: Why Use Dremio to Implement a Data Mesh?
- Blog: Overcoming Data Silos - How Dremio Unifies Disparate Data Sources for Seamless Analytics
- Video: Where Data Lakehouse and DataOps/Data-as-Code Converge (Project Nessie & Dremio Arctic)
- Video: From Data Lake to Data Lakehouse (What, Why and How of Apache Iceberg/Dremio/Nessie Lakehouses)