Apache Spark

Created by Matei Zaharia

Released May 26, 2014

apache/spark
spark.apache.org
Wikipedia

Related topics

Scala

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework.

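Whereas Hadoop MapReduce writes intermediate data to disk after each job completes, Spark uses in-memory computing and can analyze data in memory before it is ever written to disk. For workloads that run in memory, Spark can be up to 100 times faster than Hadoop MapReduce.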

Open source platform for the machine learning lifecycle

Python
20220
2 days ago

Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more

Scala
3481
3 years ago

Interactive and Reactive Data Science using Scala and Spark.

JavaScript
3146
2 years ago

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Go
2902
4 days ago

BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Jupyter Notebook
2674
24 days ago

Apache Spark docker image

Shell
2054
2 years ago

C#
2050
6 days ago

A curated list of awesome Apache Spark packages and resources.

Shell
1786
6 months ago

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Java
1782
4 years ago

The Internals of Apache Spark

1496
7 months ago

SQL data analysis and visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark, and PySpark.

Jupyter Notebook
1421
3 years ago

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Scala
1279
3 months ago

Python
1154
4 years ago

(Deprecated) Scikit-learn integration package for Apache Spark

Python
1079
5 years ago

graphframes/graphframes

GraphFrames is a package for Apache Spark that provides DataFrame-based graphs

Scala
1042
2 days ago