mapreduce

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python
28111
1 year ago
PowerJob/PowerJob

Enterprise job scheduling middleware with distributed computing ability.

Java
7433
3 months ago

Python clone of Spark, a MapReduce-like framework in Python

Python
2682
4 years ago

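🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, with my own summarized answers. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.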

1610
4 years ago

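A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.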

Shell
1586
5 months ago

C# and F# language binding and extensions to Apache Spark

C#
940
1 year ago

distributed_computing includes MapReduce, a KV store, etc.

Go
837
5 years ago

An open source framework for building data analytic applications.

Java
769
2 days ago

🐎 A serverless MapReduce framework written for AWS Lambda

Go
694
3 years ago

A serverless cluster computing system for the Go programming language

Go
553
2 years ago

Uniffle is a high-performance, general-purpose Remote Shuffle Service.

Java
415
1 day ago

t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark

Python
393
2 years ago
Java
381
5 months ago

Dynamic execution framework for your Redis data

Rust
376
4 months ago

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.

Java
350
10 days ago

🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | An introductory course for the big data technology track 🎉🎉

Python
312
2 years ago

Behemoth is an open source platform for large-scale document analysis based on Apache Hadoop.

Java
281
7 years ago