Apache Spark

Created by Matei Zaharia

Released May 26, 2014

apache/spark
spark.apache.org
Wikipedia

Related topics

Scala

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework.

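Whereas Hadoop MapReduce writes intermediate data to disk after each job finishes, Spark uses in-memory computing, which lets it analyze data in memory before that data is written to disk. When running programs in memory, Spark can be as much as 100 times faster than Hadoop MapReduce.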

Apache Spark - A unified analytics engine for large-scale data processing

Scala
40957
44 minutes ago

Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.

Jupyter Notebook
30101
17 hours ago

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python
28111
1 year ago

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Python
27223
3 days ago

Learn and understand Docker&Container technologies, with real DevOps practice!

Go
25333
4 months ago

GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.

Python
15420
1 month ago

List of Data Science Cheatsheets to rule the world

15130
9 months ago

zhisheng17/flink-learning

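Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization" is welcome.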

Java
14738
1 month ago

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Python
14451
3 months ago

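[Big Tech Interview Column] A technical guide for Java programmers, covering interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a stronger engineer.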

14428
1 year ago

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...

Java
13934
3 hours ago

Java
13524
1 hour ago

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

10055
2 years ago

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala
7957
9 hours ago

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Jupyter Notebook
7120
8 hours ago

Java
6977
24 days ago

A Flexible and Powerful Parameter Server for large-scale machine learning

Java
6748
1 year ago