
apachespark

This is a repo with links to everything you'd ever want to learn about data engineering.

Jupyter Notebook
27514
7 days ago
Java
5745
7 hours ago

Template for Spark Projects

Scala
101
1 year ago

This repository will help you learn Databricks concepts through examples. It covers the important topics we need in real-life work as data engineers. We use PySpark and Spark SQL for development, and at the end of the course we also cover a few case studies.

Python
98
9 months ago

Type-class-based data cleansing library for Apache Spark SQL

Scala
78
6 years ago

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

C
36
1 year ago

SparkSQL.jl enables Julia programs to work with Apache Spark data using just SQL.

Julia
25
1 year ago

FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...

20
3 days ago

Repository for Lab “Distributed Big Data Analytics” (MA-INF 4223), University of Bonn

Jupyter Notebook
10
3 years ago

This repository contains all the projects and labs I worked on while pursuing professional certificate programs, specializations, and bootcamps. [Areas: Deep Learning, Machine Learning, Applied Data Science].

Jupyter Notebook
9
5 years ago

Trigger spark-submit in Golang. A Go implementation of the famous SparkLauncher.java.

Go
7
4 years ago

Connect to SQL Server using Apache Spark

Scala
7
9 years ago

PySpark is a Python library for distributed data processing. It lets you process large volumes of data on clusters using the Apache Spark framework, and it offers high performance along with a set of built-in tools for large-scale data analysis and management.

7
2 years ago

Apache Spark with Kafka via JDBC !!!

Java
6
7 years ago

Example usages of the cleanframes library

Scala
5
6 years ago

Link prediction is about predicting future connections in a graph. In this project, link prediction means predicting whether two authors will collaborate on a future paper, given the graph of authors who have co-authored at least one paper together.

Scala
5
5 years ago

Microservices for Spark applications

Java
5
2 years ago

A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)

Jupyter Notebook
3
2 years ago

This GitHub repository contains a detailed document on the basics of the Scala language.

3
1 year ago