hadoop-hdfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding. Enterprise version is at seaweedfs.com.

distributed-storage distributed-systems s3 hdfs fuse distributed-file-system hadoop-hdfs posix tiered-file-system Kubernetes replication object-storage s3-storage seaweedfs erasure-coding blob-storage cloud-drive

25541

2451

2 hours ago

OBenner / data-engineering-interview-questions

More than 2,000 data engineering interview questions.

data-engineering interview hadoop hadoop-hdfs Apache Spark flink SQL kafka hive impala airflow Amazon Web Services Azure Apache Cassandra flume hbase avro nifi Data Structures

1393

487

7 months ago

Morphl-AI / MorphL-Community-Edition

MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization

Artificial Intelligence Machine Learning User Experience (UX) front-end-development pyspark Apache Cassandra Kubernetes hadoop-hdfs pipeline

Python

263

6 years ago

linkedin / dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

hadoop hadoop-filesystem hdfs Testing scale performance-testing performance-test performance-analysis performance-metrics hadoop-hdfs

Java

132

2 years ago

AhmetFurkanDEMIR / Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

data data-engineer data-engineering data-engineering-pipeline Docker Docker Compose hadoop hadoop-filesystem hadoop-hdfs hdfs kafka kafka-consumer kafka-producer kafka-ui Python

Python

115

2 years ago

groda / big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.

big-data bigdata Apache Spark spark-sql Docker mapreduce pyspark hadoop Jupyter Notebook hadoop-hdfs hadoop-mapreduce

Jupyter Notebook

23 days ago

IBM / sparksql-for-hbase

Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers

hbase Apache Spark SQL NoSQL hadoop-hdfs

3 years ago

vim89 / datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Apache Spark spark-sql Python pyspark etl etl-pipeline etl-framework XML xml-parsing datalake big-data hadoop hadoop-mapreduce hadoop-hdfs data-pipeline

Python

2 years ago