lakehouse
ClickHouse® is a real-time analytics database management system
The official home of the Presto distributed SQL query engine for big data
Apache Doris is an easy-to-use, high performance and unified analytics database.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
AI-Native Data Warehouse. Open-source Snowflake alternative. Proven at petabyte scale with enterprise performance. Built for multimodal analytics. https://databend.com
LakeSoul is an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
ByConity is an open source cloud data warehouse
YTsaurus is a scalable and fault-tolerant open-source big data platform.
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Real-time analytics on Postgres tables
Apache Fluss is a streaming storage built for real-time analytics.
Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, MySQL, and Oracle.
ClickBench: a Benchmark For Analytical Databases
DuckDB-powered data lake analytics from Postgres
A curated list of open-source tools used in analytics platforms and the data engineering ecosystem
The Control Plane for Apache Iceberg.
GigAPI is a time-series lakehouse for real-time data and sub-second queries, powered by a DuckDB OLAP + Parquet query engine and a compactor with cloud-native storage. Drop-in FDAP alternative ⭐
Use SQL to build ELT pipelines on a data lakehouse.
Examples of using Terraform to deploy Databricks resources