mapreduce

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python
28573
2 years ago
PowerJob/PowerJob

Enterprise job scheduling middleware with distributed computing ability.

Java
7596
22 days ago

A Python clone of Spark, a MapReduce-like framework in Python.

Python
2680
5 years ago

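A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.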

Shell
1679
1 month ago

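🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered online, together with my own summarized answers. It currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.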

1639
4 years ago

C# and F# language binding and extensions to Apache Spark

C#
940
2 years ago

distributed_computing: includes MapReduce, a key-value store, and more.

Go
844
5 years ago

An open source framework for building data analytic applications.

Java
784
2 days ago

🐎 A serverless MapReduce framework written for AWS Lambda

Go
694
4 years ago

A serverless cluster computing system for the Go programming language

Go
556
2 years ago

Uniffle is a high performance, general purpose Remote Shuffle Service.

Java
430
5 days ago

Java
401
10 months ago

A t-Digest data structure in Python, useful for percentiles and quantiles, including in distributed environments such as PySpark.

Python
401
2 years ago

Dynamic execution framework for your Redis data

Rust
380
1 month ago

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.

Java
352
6 months ago

🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | An introductory course for the big data technology track 🎉🎉

Python
337
1 month ago

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Java
283
7 years ago