lakehouse
ClickHouse® is a real-time analytics database management system
The official home of the Presto distributed SQL query engine for big data
Apache Doris is an easy-to-use, high performance and unified analytics database.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
ByConity is an open source cloud data warehouse
YTsaurus is a scalable and fault-tolerant open-source big data platform.
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Postgres Data Warehouse, built on Iceberg
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB and MySQL.
DuckDB-powered data lake analytics from Postgres
A curated list of open source tools used in analytics platforms and the data engineering ecosystem
Use SQL to build ELT pipelines on a data lakehouse.
Examples of using Terraform to deploy Databricks resources
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
The open-source, AI-native data stack