#delta-lake

Groovy
14115
9 hours ago

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java
11749
9 hours ago

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java
10505
6 hours ago

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala
8218
5 hours ago
Rust
3339
2 months ago

A native Rust library for Delta Lake, with bindings into Python

Rust
2904
2 days ago
Mooncake-Labs/pg_mooncake
Rust
1622
3 hours ago

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Scala
1335
7 months ago

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

Java
1092
3 days ago

An open protocol for secure data sharing

Scala
861
5 days ago

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

Python
639
16 hours ago

Analytical database for data-driven Web applications 🪶

Rust
493
6 months ago

Apache Kafka® compatible broker with S3, PostgreSQL, Apache Iceberg and Delta Lake

Rust
436
1 hour ago

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

Python
259
23 days ago

Sample project to demonstrate data engineering best practices

Python
195
1 year ago

This repository exemplifies a simple ELT process that uses Delta to perform upserts and to remove data files that aren't in the latest state of the table's transaction log.

Python
103
3 years ago