hadoop-mapreduce

Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark, Flink, and MapReduce.

Java
255
1 year ago

Use the MapReduce Java interface to crawl data on Chinese universities in a distributed fashion and learn the basics of HDFS.

Java
140
6 months ago

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live—ready to run with a click.

Jupyter Notebook
74
4 months ago

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Python
54
2 years ago

K-Means algorithm implementation with Hadoop and Spark for the Cloud Computing course of the MSc AIDE at the University of Pisa.

Java
47
5 years ago

A collection of MapReduce problems and solutions

Java
35
8 years ago

Shows how to install Hadoop on Google Colab and how to run MapReduce jobs in Hadoop.

Jupyter Notebook
33
5 years ago

Projects done in the Cloud Computing course.

Java
25
7 years ago

Hadoop MapReduce word counting with Java

Java
24
5 years ago

Chinese text mining | Public opinion analysis | Hadoop | Java | MapReduce

HTML
23
7 years ago

2021 Spring (Distributed Computing Systems): Distributed Systems and Programming

Java
15
4 years ago

Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python

JavaScript
15
3 years ago

This repository contains a simple Hadoop-like (MapReduce) distributed computing platform implemented in Java. It extends a UIUC course project that was awarded best Java implementation, and it is open-sourced for reference.

Java
13
4 years ago