mapreduce

donnemartin / data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python machine-learning deep-learning data-science big-data Amazon Web Services Tensorflow theano caffe scikit-learn kaggle Apache Spark mapreduce hadoop matplotlib pandas NumPy SciPy Keras

Python

28573

8019

2 years ago

heibaiying / BigData-Notes

A beginner's guide to big data ⭐

hadoop hdfs Yarn mapreduce hive Apache Spark storm hbase Scala kafka zookeeper flume azkaban sqoop phoenix bigdata big-data

Java

16683

4306

2 years ago

PowerJob / PowerJob

Enterprise job scheduling middleware with distributed computing ability.

scheduler workflow distributed mapreduce Java cron job job-scheduler

Java

7596

1337

22 days ago

douban / dpark

Python clone of Spark, a MapReduce-like framework in Python

bigdata mapreduce dpark stream-processing Apache Spark Python

Python

2680

530

5 years ago

collabH / bigdata-growth

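A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.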

flink kafka hive mapreduce Apache Spark olap hadoop hbase debezium hdfs bigdata hudi

Shell

1679

385

1 month ago

water8394 / BigData-Interview

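🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, together with the author's own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.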

bigdata Apache Spark kafka hbase flink hadoop hdfs mapreduce Yarn interview

1639

447

4 years ago

mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

hadoop-mapreduce Java distributed-computing Scala mapreduce Python machine-learning pyspark Apache Spark design-patterns

Java

1081

659

1 year ago

microsoft / Mobius

C# and F# language binding and extensions to Apache Spark

Apache Spark dataframe dataset streaming C# spark-streaming F# bigdata mapreduce

940

208

2 years ago

happyer / distributed-computing

distributed_computing includes mapreduce, kvstore, etc.

raft mapreduce consistency

844

213

5 years ago

cdapio / cdap

An open source framework for building data analytic applications.

unified integration platform dataset mapreduce Apache Spark spark-streaming Java cdap Python middleware

Java

784

349

2 days ago

bcongdon / corral

🐎 A serverless MapReduce framework written for AWS Lambda

aws-lambda mapreduce Serverless

694

4 years ago

sunnyandgood / BigData

💎🔥 Big data study notes

hadoop hive hbase hdfs zookeeper sqoop mapreduce flume MySQL Linux Shell

Java

681

229

6 years ago

grailbio / bigslice

A serverless cluster computing system for the Go programming language

cluster computing Go mapreduce bigdata machine-learning etl

556

2 years ago

apache / uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

mapreduce shuffle Apache Spark remote-shuffle-service RSS tez

Java

430

160

5 days ago

cubefs / compass

Compass is a task diagnosis platform for bigdata

bigdata Apache Spark hadoop flink mapreduce scheduler SQL airflow dolphinscheduler

Java

401

148

10 months ago

CamDavidsonPilon / tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark

Python estimate pyspark distributed-computing mapreduce

Python

401

2 years ago

RedisGears / RedisGears

Dynamic execution framework for your Redis data

Redis mapreduce stream-processing analytics

Rust

380

1 month ago

cwensel / cascading

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.

hadoop Java mapreduce tez

Java

352

220

6 months ago

datawhalechina / juicy-bigdata

🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course for the big data technology track 🎉🎉

bigdata hadoop hive hbase hdfs Apache Spark mapreduce

Python

337

1 month ago

DigitalPebble / behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

hadoop Java natural-language-processing mapreduce

Java

283

7 years ago