mapreduce
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Enterprise job scheduling middleware with distributed computing ability.
Python clone of Spark, a MapReduce-like framework in Python
MapReduce, Spark, Java, and Scala for Data Algorithms Book
C# and F# language bindings and extensions to Apache Spark
distributed_computing includes MapReduce, kvstore, etc.
An open source framework for building data analytic applications.
🐎 A serverless MapReduce framework written for AWS Lambda
Uniffle is a high-performance, general-purpose Remote Shuffle Service.
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Compass is a task diagnosis platform for big data
Dynamic execution framework for your Redis data
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster.
Behemoth is an open source platform for large-scale document analysis based on Apache Hadoop.