apachespark

DataExpert-io / data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

apachespark Awesome Lists bigdata data dataengineering SQL

Jupyter Notebook

38059

7311

10 days ago

apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.

hudi apachehudi datalake bigdata apachespark incremental-processing stream-processing data-integration apacheflink

Java

5968

2442

16 hours ago

martandsingh / ApacheSpark

This repository helps you learn Databricks concepts through examples. It covers the important topics we need in real-life work as data engineers. We use PySpark and Spark SQL for development. At the end of the course, we also cover a few case studies.

apachespark data-analysis data-engineering database databricks datalake deltalake etl-pipeline hadoop hive Apache Spark spark-sql spark-streaming timetravel etl pyspark SQL

Python

103

9 days ago

holdenk / sparkProjectTemplate.g8

Template for Spark Projects

apachespark Apache Spark

Scala

102

1 year ago

funkyminds / cleanframes

Type-class-based data cleansing library for Apache Spark SQL

Apache Spark sparksql Scala bigdata apachespark

Scala

6 years ago

josephmachado / docker_for_data_engineers

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

apachespark Docker Docker Compose pyspark

1 year ago

propelledanalytics / SparkSQL.jl

SparkSQL.jl enables Julia programs to work with Apache Spark data using just SQL.

Apache Spark Julia apachespark

Julia

2 years ago

tspannhw / FLiPStackWeekly

FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...

apacheflink apachespark cloudera lakehouse streaming

4 days ago

aravinthsci / Spark_Delta_Lake

Delta Lake Examples

Apache Spark apachespark delta-lake deltalake datalake

Jupyter Notebook

5 years ago

SmartDataAnalytics / MA-INF-4223-DBDA-Lab

Repository for Lab “Distributed Big Data Analytics” (MA-INF 4223), University of Bonn

teaching apachespark bigdata semantics machine-learning RDF (Resource Description Framework) university

Jupyter Notebook

3 years ago

SandeepAswathnarayana / professional-certificate-programs

This repository contains all the projects and labs I worked on while pursuing professional certificate programs, specializations, and bootcamps. [Areas: Deep Learning, Machine Learning, Applied Data Science].

deep-learning machine-learning datascience recurrent-neural-networks Python PyTorch Tensorflow pandas NumPy matplotlib SciPy scikit-learn recommender-system restricted-boltzmann-machine seaborn autoencoder image-classification apachespark

Jupyter Notebook

5 years ago

CarolinaNicasio / APACHESPARK-PYSPARK-2023

PySpark is a distributed data processing library for Python that processes large volumes of data on clusters using the Apache Spark framework, offering high performance and an integrated set of tools for large-scale data analysis and handling.

apache apachespark data-science dataframe Actions pyspark Python Apache Spark

2 years ago

datumbrain / gossub

Trigger spark-submit in Golang. A Go implementation of the famous SparkLauncher.java.

Apache Spark apachespark Go

5 years ago

sfrechette / spark-jdbc-mssql

Connect to SQL Server using Apache Spark

sql-server jdbc-driver Apache Spark Scala apachespark

Scala

9 years ago

lensesio / lenses-jdbc-spark

Apache Spark with Kafka via JDBC !!!

kafka apachespark jdbc-driver

Java

7 years ago

funkyminds / cleanframes-examples

Example usages of the cleanframes library

Apache Spark sparksql bigdata Scala apachespark

Scala

6 years ago

sahith / Link-Prediction-for-Citation-Networks-using-Apache-Spark

Link prediction is about predicting future connections in a graph. In this project, it means predicting whether two authors will collaborate on a future paper, given the graph of authors who have collaborated on at least one paper together.

Scala Amazon Web Services emr apachespark dataframes s3 bigdata big-data big-data-analytics databricks

Scala

6 years ago