hdfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, for billions of files. The blob store has O(1) disk seek and cloud tiering. The filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding. The enterprise version is at seaweedfs.com.
Ceph is a distributed object, block, and file storage platform.
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Utils for streaming large files (S3, HDFS, gzip, bz2...)
The Universal Storage Engine
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL seamlessly
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Real Time Analytics and Data Pipelines based on Spark Streaming
Deprecated - See Lenses.io Community Edition
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This lets you spend less effort on underlying resource management and maintenance.
Big Data Ecosystem Docker
StorageTapper is a scalable real-time MySQL change data streaming, logical backup, and logical replication service
Fundamentals of Spark with Python (using PySpark), code examples