Apache Spark
Created by Matei Zaharia
Released May 26, 2014
- Repository
- apache/spark
- Website
- spark.apache.org
- Wikipedia
- Wikipedia
Related topics
Scala
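Apache Spark is an open-source, distributed, general-purpose cluster-computing framework.
Whereas Hadoop MapReduce writes intermediate data to disk after each job finishes, Spark uses in-memory computing: it can analyze data in memory before that data is ever written to disk. Running a program in memory, Spark can be up to 100 times faster than Hadoop MapReduce.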
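The contrast between the two execution styles can be sketched in plain Python. This is a toy illustration only: it does not use the Spark or Hadoop APIs, and the function names (`word_lengths`, `total`, `run_with_disk`, `run_in_memory`) are hypothetical, chosen for this example. It shows a two-stage job whose intermediate result is either written to disk and read back, or kept in memory between stages.

```python
import json
import os
import tempfile


def word_lengths(words):
    # Stage 1: map each word to its length.
    return [len(w) for w in words]


def total(lengths):
    # Stage 2: reduce the lengths to a single sum.
    return sum(lengths)


def run_with_disk(words):
    # MapReduce-style (simplified): the intermediate result is written to
    # disk after stage 1 and read back before stage 2.
    intermediate = word_lengths(words)
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(intermediate, f)
        path = f.name
    try:
        with open(path) as f:
            return total(json.load(f))
    finally:
        os.remove(path)


def run_in_memory(words):
    # Spark-style (simplified): the intermediate result stays in memory
    # and is passed directly to the next stage.
    return total(word_lengths(words))


words = ["spark", "keeps", "data", "in", "memory"]
disk_result = run_with_disk(words)
memory_result = run_in_memory(words)
```

Both paths compute the same answer; the difference is the extra serialization and disk round trip in the first one, which is the cost that in-memory execution avoids. Any real speedup depends on the workload and cluster, so the sketch makes no timing claims.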
Open source platform for the machine learning lifecycle
Simple and Distributed Machine Learning
lakeFS - Data version control for your data lake | Git for data
Interactive and Reactive Data Science using Scala and Spark.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Apache Spark docker image
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Feathr – A scalable, unified data and AI engineering platform for enterprise
A curated list of awesome Apache Spark packages and resources.
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
The Internals of Apache Spark
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
PySpark + Scikit-learn = Sparkit-learn
(Deprecated) Scikit-learn integration package for Apache Spark
MapReduce, Spark, Java, and Scala for Data Algorithms Book
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs