hdfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, designed to scale to billions of files. The blob store offers O(1) disk seeks and cloud tiering. The filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and erasure coding.
Ceph is a distributed object, block, and file storage platform
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Utils for streaming large files (S3, HDFS, gzip, bz2...)
The Universal Storage Engine
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL seamlessly
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Real Time Analytics and Data Pipelines based on Spark Streaming
Web tool for Kafka Connect
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Big Data Ecosystem Docker
StorageTapper is a scalable real-time MySQL change data streaming, logical backup, and logical replication service
Fundamentals of Spark with Python (using PySpark), code examples