pyspark

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between upper-layer applications and the underlying data engines.

Java
3353
8 days ago

Implementing best practices for PySpark ETL jobs and applications.

Python
1891
2 years ago

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Python
1831
1 year ago

A curated list of awesome Apache Spark packages and resources.

Shell
1785
6 months ago

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Jupyter Notebook
1647
1 year ago

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.

Jupyter Notebook
1421
3 years ago

Python
1350
2 months ago

PySpark-Tutorial provides basic algorithms using PySpark

Jupyter Notebook
1218
3 months ago

Sparkling Water provides H2O functionality inside a Spark cluster

Scala
970
5 months ago

Lightweight and extensible compatibility layer between dataframe libraries!

Python
931
12 hours ago

Scriptis is for interactive data analysis, with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.

Vue
810
4 months ago

pyspark🍒🥭 is delicious, just eat it!😋😋

Python
799
3 years ago

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Python
795
1 month ago

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we bring third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.

JavaScript
793
3 years ago