pyspark

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

Java
3392
4 days ago

Implementing best practices for PySpark ETL jobs and applications.

Python
2000
3 years ago

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Python
1859
19 days ago

A curated list of awesome Apache Spark packages and resources.

Shell
1826
1 year ago

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Jupyter Notebook
1664
2 years ago

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.

Jupyter Notebook
1546
3 years ago

Python
1362
25 days ago

Lightweight and extensible compatibility layer between dataframe libraries!

Python
1310
4 hours ago

PySpark-Tutorial provides basic algorithms using PySpark

Jupyter Notebook
1251
4 months ago

graphframes/graphframes

GraphFrames is a package for Apache Spark that provides DataFrame-based graphs.

Scala
1079
16 hours ago

LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.

Rust
983
1 day ago

Sparkling Water provides H2O functionality inside a Spark cluster.

Scala
976
1 month ago

pyspark🍒🥭 is delicious, just eat it!😋😋

Python
820
3 years ago

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.

Vue
812
10 months ago