Repository navigation

#

datalake

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

Python
21841
7 天前

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java
11749
1 小时前

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java
10506
2 小时前

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Python
8780
13 天前
Java
5915
1 小时前
DataLinkDC/dinky

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.

Java
3523
13 小时前

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

Java
2924
5 天前

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

Java
1772
13 小时前

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

553
2 个月前

Apache Kafka® compatible broker with S3, PostgreSQL, Apache Iceberg and Delta Lake

Rust
436
4 小时前

Open Control Plane for Tables in Data Lakehouse

Java
366
7 天前

The Control Plane for Apache Iceberg.

TypeScript
334
12 天前

GigAPI is a Timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐

Go
306
11 小时前