lakehouse

ClickHouse® is a real-time analytics database management system

dbms olap analytics SQL big-data mpp clickhouse Hacktoberfest C++ Rust AI cloud-native database distributed embedded lakehouse self-hosted

C++

43170

7681

34 minutes ago

prestodb / presto

The official home of the Presto distributed SQL query engine for big data

Java presto hive hadoop big-data SQL data lakehouse query

Java

16522

5498

21 hours ago

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

olap database hudi iceberg real-time SQL BigQuery dbt delta-lake elt lakehouse query-engine redshift snowflake Apache Spark agent AI paimon

Java

14371

3569

5 hours ago

StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java

10727

2151

18 hours ago

databendlabs / databend

AI-Native Data Warehouse. Open-source Snowflake alternative. Proven at petabyte scale with enterprise performance. Built for multimodal analytics. https://databend.com

Rust database Serverless bigdata snowflake AI lakehouse olap SQL vector-database

Rust

8900

824

16 hours ago

lakesoul-io / LakeSoul

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

lakesoul datalake lakehouse Apache Spark flink streaming big-data PostgreSQL Rust SQL huggingface Python PyTorch arrow datafusion vectorized velox

Java

3016

411

5 days ago

ByConity / ByConity

ByConity is an open source cloud data warehouse

clickhouse cloud kubernets lakehouse olap s3 snowflake SQL clickhouse-database TikTok bytedance

C++

2215

318

6 months ago

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

big-data clickhouse distributed-database lakehouse olap-database Apache Spark SQL ytsaurus

C++

2082

174

2 hours ago

apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

datalake lakehouse metadata federated-query stratosphere metalake skycomputing data-catalog ai-catalog model-catalog opendatacatalog

Java

2051

624

1 day ago

Mooncake-Labs / pg_mooncake

Real-time analytics on Postgres tables

analytics columnstore delta-lake iceberg lakehouse parquet PostgreSQL

Rust

1727

1 day ago

apache / fluss

Apache Fluss is a streaming storage built for real-time analytics.

streaming fluss lakehouse real-time-analytics big-data Hacktoberfest

Java

1484

414

4 days ago

datazip-inc / olake

Fastest open-source tool for replicating Databases to Data Lake in Open Table Formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supporting Postgres, MongoDB, MySQL and Oracle.

cdc change-data-capture data-pipeline database elt lakehouse replication apache-iceberg parquet s3 Hacktoberfest

1137

122

8 hours ago

lakekeeper / lakekeeper

Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.

catalog data-lake iceberg lakehouse Rust

Rust

922

3 hours ago

ClickHouse / ClickBench

ClickBench: a Benchmark For Analytical Databases

analytics benchmark big-data database olap SQL Amazon Web Services BigQuery chdb clickhouse datafusion datalake duckdb iceberg lakehouse parquet Rust snowflake doris

HTML

884

229

10 hours ago

paradedb / pg_analytics

DuckDB-powered data lake analytics from Postgres

analytics arrow columnar datafusion lakehouse parquet PostgreSQL duckdb olap big-data database datalake deltalake iceberg object-storage SQL lakehouse-platform

Rust

523

7 months ago

pracdata / awesome-open-source-data-engineering

A curated list of open source tools used in analytics platforms and data engineering ecosystem

Awesome Lists data-analytics data-engineering data-platform database self-hosted mlops data data-integration datalake lakehouse workflow-engine analytics data-warehouse observability data-pipeline etl

376

7 months ago

nimtable / nimtable

The Control Plane for Apache Iceberg.

apache-iceberg datalake lakehouse iceberg polaris

TypeScript

359

11 days ago

gigapi / gigapi

GigAPI is a Timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐

API duckdb Go olap parquet s3 database REST API SQL clickhouse-server datalake query-engine data-lake lakehouse

346

13 days ago