delta-lake

Java · 14,372 stars · updated 8 hours ago

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java · 11,985 stars · updated 2 hours ago

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java · 10,727 stars · updated 21 hours ago

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as with APIs

Scala · 8,306 stars · updated 1 day ago

Rust · 3,347 stars · updated 3 months ago

A native Rust library for Delta Lake, with bindings into Python

Rust · 2,965 stars · updated 2 days ago

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Scala · 1,345 stars · updated 8 months ago

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

Java · 1,107 stars · updated 11 days ago

An open protocol for secure data sharing

Scala · 871 stars · updated 2 months ago

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

Python · 643 stars · updated 3 days ago

Apache Kafka® compatible broker with S3, PostgreSQL, Apache Iceberg and Delta Lake

Rust · 497 stars · updated 3 days ago

Analytical database for data-driven Web applications 🪶

Rust · 495 stars · updated 7 months ago

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

Python · 267 stars · updated 1 month ago

Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more

Java · 228 stars · updated 8 days ago

Sample project to demonstrate data engineering best practices

Python · 198 stars · updated 2 years ago