parquet
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Blazing-fast Data-Wrangling toolkit
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
A large-scale entity and relation database supporting aggregation of properties
Single-binary Postgres read replica optimized for analytics
Quilt is a data mesh for connecting people with actionable data
Postgres Data Warehouse, built on Iceberg
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
A portable embedded database using Arrow.
Simple Windows desktop application for viewing & querying Apache Parquet files
Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, and MySQL.