Apache Spark

Created by Matei Zaharia

Released May 26, 2014

Repository: apache/spark
Website: spark.apache.org

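Apache Spark is an open-source, distributed, general-purpose cluster-computing framework.

Whereas Hadoop MapReduce writes intermediate data to disk after each job completes, Spark uses in-memory computing and can analyze data in memory before it is ever written to disk. For programs that run in memory, Spark can be up to 100 times faster than Hadoop MapReduce.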
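
The difference described above can be illustrated with a toy sketch in plain Python. It does not use the Spark or Hadoop APIs; the function names (`map_stage`, `reduce_stage`, `run_with_disk`, `run_in_memory`) are hypothetical and exist only for this illustration. One pipeline writes its intermediate data to a file between stages, the other passes it along in memory.

```python
import json
import os
import tempfile


def map_stage(records):
    # Stage 1: square every value.
    return [x * x for x in records]


def reduce_stage(records):
    # Stage 2: sum the intermediate values.
    return sum(records)


def run_with_disk(records, workdir):
    """MapReduce-style: intermediate data is written to disk, then read back."""
    path = os.path.join(workdir, "intermediate.json")
    with open(path, "w") as f:
        json.dump(map_stage(records), f)
    with open(path) as f:
        return reduce_stage(json.load(f))


def run_in_memory(records):
    """Spark-style: intermediate data stays in memory between stages."""
    return reduce_stage(map_stage(records))


data = list(range(1, 6))
with tempfile.TemporaryDirectory() as workdir:
    disk_result = run_with_disk(data, workdir)
memory_result = run_in_memory(data)
```

Both pipelines compute the same result; the in-memory version simply skips the write and read of the intermediate file, which is where the speed advantage on real clusters is said to come from.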

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

Python Scala R Java big-data jdbc SQL Apache Spark

Scala

42014

28862

2 days ago

DataTalksClub / data-engineering-zoomcamp

Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.

data-engineering kafka Apache Spark dbt Docker kestra

Jupyter Notebook

32966

7011

15 days ago

donnemartin / data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python machine-learning deep-learning data-science big-data Amazon Web Services Tensorflow theano caffe scikit-learn kaggle Apache Spark mapreduce hadoop matplotlib pandas NumPy SciPy Keras

Python

28574

8019

2 years ago

getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

redash Python visualization analytics bi redshift BigQuery athena MySQL PostgreSQL dashboard JavaScript business-intelligence databricks Apache Spark spark-sql Hacktoberfest

Python

27833

4521

2 days ago

yeasy / docker_practice

Learn and understand Docker&Container technologies, with real DevOps practice!

Docker book cloud-computing container Kubernetes swarm mesos Apache Spark DevOps Linux

25590

5771

9 months ago

heibaiying / BigData-Notes

A beginner's guide to big data ⭐

hadoop hdfs Yarn mapreduce hive Apache Spark storm hbase Scala kafka zookeeper flume azkaban sqoop phoenix bigdata big-data

Java

16683

4306

2 years ago

FavioVazquez / ds-cheatsheets

List of Data Science Cheatsheets to rule the world

datascience Python R Apache Spark programming Jupyter Notebook cheatsheet

15728

4001

1 year ago

GaiZhenbiao / ChuanhuChatGPT

GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.

chatbot ChatGPT API chatglm claude ernie gemini gemma llama midjourney minimax moss ollama qwen Apache Spark stablelm

Python

15433

2271

2 months ago

zhisheng17 / flink-learning

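Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, along with case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization" is welcome.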

flink kafka elasticsearch Apache Spark Redis MySQL rocketmq hbase rabbitmq stream-processing streaming clickhouse loki influxdb opentsdb

Java

14943

3955

7 months ago

aalansehaiyang / technology-talk

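[Big Tech Interview Column] A technical guide for Java programmers, with interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a stronger engineer!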

Java Spring Spring Boot dubbo kafka Git hbase mycat Apache Spark ECMAScript

14634

3810

2 months ago

horovod / horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Tensorflow uber machine-learning mpi baidu deep-learning Keras PyTorch mxnet Apache Spark ray

Python

14600

2262

11 days ago

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

olap database hudi iceberg real-time SQL BigQuery dbt delta-lake elt lakehouse query-engine redshift snowflake Apache Spark agent artificial-intelligence paimon

Java

14372

3570

8 hours ago

deeplearning4j / deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...

Java gpu deep-learning neural-nets deeplearning4j dl4j hadoop Apache Spark IntelliJ IDEA artificial-intelligence Python Scala Clojure linear-algebra matrix-library

Java

14119

3854

4 days ago

wangzhiwubigdata / God-Of-BigData

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

flink Apache Spark hadoop hdfs hive hbase kafka zookeeper bigdata flume azkaban

10292

3237

2 years ago

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

machine-learning artificial-intelligence data data-engineering data-science Python elt etl pipelines data-pipelines orchestration data-integration SQL Apache Spark dbt pipeline reverse-etl transformation

Python

8485

874

1 day ago

tobymao / sqlglot

Python SQL Parser and Transpiler

transpiler SQL Python Parser optimizer BigQuery duckdb hive MySQL PostgreSQL presto snowflake Apache Spark SQLite trino tsql clickhouse redshift databricks

Python

8389

989

17 hours ago

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Apache Spark acid big-data analytics delta-lake

Scala

8306

1930

1 day ago

h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.