Apache Spark
Created by Matei Zaharia
Released May 26, 2014
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia: Wikipedia
Related topics
Scala
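Apache Spark is an open-source, distributed, general-purpose cluster computing framework.
Whereas Hadoop MapReduce writes intermediate data to disk after each job completes, Spark uses in-memory computing and can analyze data in memory before it is ever written to disk. When running programs in memory, Spark can be up to 100 times faster than Hadoop MapReduce.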
Apache Spark - A unified analytics engine for large-scale data processing
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Learn and understand Docker and container technologies, with real DevOps practice!
GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.
List of Data Science Cheatsheets to rule the world
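Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, along with case studies of large-scale production Flink projects (PVUV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column 《大数据实时计算引擎 Flink 实战与性能优化》 (Flink, the Big Data Real-Time Computing Engine: Practice and Performance Optimization) is welcome.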
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
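[Big Tech Interview Column] A technical guide for Java programmers, covering interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a better version of yourself!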
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
Apache Doris is an easy-to-use, high performance and unified analytics database.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Python SQL Parser and Transpiler
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Alluxio, data orchestration for analytics and machine learning in the cloud
A Flexible and Powerful Parameter Server for large-scale machine learning