# parquet

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

C++
16021
1 day ago

Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

Go
3855
2 years ago
Rust
3347
3 months ago

Official Rust implementation of Apache Arrow

Rust
3156
13 hours ago

Apache Parquet Java

Java
2951
5 days ago
rilldata/rill

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

Go
2357
1 day ago

Apache Parquet Format

Thrift
2055
9 days ago

Apache Drill is a distributed MPP query layer for self-describing data

Java
1991
18 days ago

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Python
1859
19 days ago

A large-scale entity and relation database supporting aggregation of properties

Java
1791
4 months ago

Open-source Snowflake and Fivetran alternative bundled together

Go
1471
11 days ago

cryo is the easiest way to extract blockchain data to Parquet, CSV, JSON, or Python dataframes

Rust
1446
9 months ago

Query anything (GitHub, Notion, +40 more) with SQL and let LLMs (ChatGPT, Claude) connect to it using MCP

Go
1360
2 days ago

Quilt is a data mesh for connecting people with actionable data

TypeScript
1347
1 day ago
Rust
1184
4 days ago

Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, MySQL, and Oracle.

Go
1137
11 hours ago

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Scala
1036
3 months ago