parquet
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Blazing-fast data-wrangling toolkit
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
A large-scale entity and relation database supporting aggregation of properties
Real-time analytics on Postgres tables
Open-source Snowflake and Fivetran alternative bundled together
Quilt is a data mesh for connecting people with actionable data
A portable embedded database using Arrow.
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, MySQL, and Oracle.
Simple Windows desktop application for viewing & querying Apache Parquet files