
Apache Spark

Created by Matei Zaharia

Released May 26, 2014

Repository: apache/spark
Website: spark.apache.org
Wikipedia: Wikipedia


Apache Spark is an open-source, distributed, general-purpose cluster-computing framework.

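Hadoop MapReduce writes intermediate data to disk after each job completes. Spark instead uses in-memory computing, so it can analyze data in memory before that data is written to disk. When running in memory, Spark can execute programs up to 100 times faster than Hadoop MapReduce.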
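The difference between the two execution styles can be sketched in plain Python. This is a toy illustration only, not Spark or Hadoop code: the function names (`map_stage`, `reduce_stage`, `run_with_disk`, `run_in_memory`) and the sample input are hypothetical, and a temporary JSON file stands in for the intermediate data that a MapReduce-style job would persist between stages.

```python
import json
import os
import tempfile
from collections import Counter


def map_stage(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_stage(pairs):
    """Sum the counts for each word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def run_with_disk(lines):
    """MapReduce-style sketch: write intermediate pairs to disk between stages."""
    pairs = map_stage(lines)
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(pairs, f)  # intermediate data goes to disk
        with open(path) as f:
            pairs = [tuple(p) for p in json.load(f)]  # and is read back
    finally:
        os.remove(path)
    return reduce_stage(pairs)


def run_in_memory(lines):
    """In-memory sketch: pass intermediate pairs straight to the next stage."""
    return reduce_stage(map_stage(lines))


# Hypothetical sample input, used only for illustration.
lines = ["Spark keeps data in memory", "MapReduce writes data to disk"]
disk_result = run_with_disk(lines)
memory_result = run_in_memory(lines)
```

Both functions produce the same word counts; the only difference is the extra serialization and file round trip in `run_with_disk`, which is the kind of overhead the in-memory approach avoids.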

mlflow / mlflow

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

Machine Learning Artificial Intelligence mlflow Apache Spark model-management agentops agents evaluation langchain llm-evaluation llmops observability Open Source openai prompt-engineering ai-governance mlops

Python

22334

4872

15 hours ago

microsoft / SynapseML

Simple and Distributed Machine Learning

Apache Spark pyspark Azure Scala Microsoft Machine Learning databricks cognitive-services lightgbm HTTP model-deployment Deep Learning Artificial Intelligence Data Science synapse big-data onnx OpenCV

Scala

5169

851

2 days ago

treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data

data-engineering data-versioning Go object-storage data-lake aws-s3 data-quality azure-blob-storage google-cloud-storage git-for-data Apache Spark hadoop-filesystem datalake data-version-control azure-storage

4904

400

3 days ago

lw-lin / CoolplaySpark

Coolplay Spark: Spark source code analysis, Spark libraries, and more

Apache Spark spark-streaming

Scala

3489

1405

3 years ago

spark-notebook / spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

Apache Spark notebook Scala Data Science reactive

JavaScript

3151

653

2 years ago

kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Kubernetes kubernetes-operator Apache Spark kubernetes-crd kubernetes-controller google-cloud-dataproc

3032

1429

5 days ago

intel / BigDL

BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Apache Spark Deep Neural Networks distributed-deep-learning keras-tensorflow bigdl analytics-zoo Python Scala PyTorch

Jupyter Notebook

2686

732

23 days ago

dotnet / spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Apache Spark C# .NET analytics bigdata spark-streaming spark-sql Machine Learning F# dotnet-standard streaming Azure hdinsight databricks emr Microsoft

2078

328

11 days ago

big-data-europe / docker-spark

Apache Spark docker image

Kubernetes Docker Apache Spark

Shell

2057

702

2 years ago

feathr-ai / feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

feature-engineering feature-store Artificial Intelligence mlops data-engineering data-quality Machine Learning Apache Spark Azure Data Science feature-management

Scala

1908

235

2 years ago

awesome-spark / awesome-spark

A curated list of awesome Apache Spark packages and resources.

Apache Spark pyspark Awesome Lists

Shell

1826

340

1 year ago

OryxProject / oryx

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Apache Spark Machine Learning kafka apache-kafka Java cloudera

Java

1785

404

4 years ago

ptyadana / SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

SQL MySQL exercises Data Analysis PostgreSQL SQLite tableau challenges sql-queries Python pyspark Apache Spark

Jupyter Notebook

1546

565

3 years ago

japila-books / apache-spark-internals

The Internals of Apache Spark

Apache Spark book internals

1516

459

3 months ago

san089 / goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

etl-pipeline etl-framework Apache Spark apache-airflow airflow redshift emr-cluster livy s3 data-lake scheduler data-migration data-engineering data-engineering-pipeline Python etl-job

Python

1422

238

6 years ago

databricks / LearningSparkV2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Apache Spark spark-sql spark-mllib mlflow delta-lake

Scala

1345

784

8 months ago

lensacom / sparkit-learn

PySpark + Scikit-learn = Sparkit-learn

scikit-learn Apache Spark Machine Learning distributed-computing Python

Python

1155

256

5 years ago

mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

hadoop-mapreduce Java distributed-computing Scala mapreduce Python Machine Learning pyspark Apache Spark design-patterns

Java

1081

659

1 year ago

graphframes / graphframes

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

Apache Spark big-data dataframe dataframes graphs networks pyspark

Scala

1079

253

19 hours ago

databricks / spark-sklearn

(Deprecated) Scikit-learn integration package for Apache Spark

Apache Spark scikit-learn parameter-tuning Machine Learning

Python

1078

228

6 years ago