pyspark
The portable Python dataframe library
Simple and Distributed Machine Learning
State of the Art Natural Language Processing
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Implementing best practices for PySpark ETL jobs and applications.
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
A curated list of awesome Apache Spark packages and resources.
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Jupyter magics and kernels for working with remote Spark clusters
Hopsworks - Data-Intensive AI platform with a Feature Store
PySpark-Tutorial provides basic algorithms using PySpark
MapReduce, Spark, Java, and Scala for Data Algorithms Book
GraphFrames is a package for Apache Spark that provides DataFrame-based graphs
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
Sparkling Water provides H2O functionality inside a Spark cluster
pyspark🍒🥭 is delicious, just eat it!😋😋