
#google-cloud-dataproc

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Go
3001
2 days ago

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.

Go
658
3 years ago

Run on all nodes of your cluster before the cluster starts; lets you customize your cluster

Shell
598
22 days ago

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

Java
406
7 days ago

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

Java
285
6 hours ago

Cloud Dataproc: Samples and Utils

Jupyter Notebook
204
2 months ago

Tools for creating Dataproc custom images

Python
34
2 months ago

A sample demo to check the latest Spark, BigQuery connector, and Scala 2.12

Scala
1
4 years ago

Your mission is to create a Big Data ecosystem using Google Cloud Platform (GCP). To do so, the expert will teach you how to configure Google Cloud Dataproc, a fully managed Hadoop service, using your free GCP credits.

Python
0
4 years ago

Streaming JSON data to Spark or Google Cloud Dataproc.

Python
0
2 years ago

Project for the course "Creating a Fully Managed Hadoop Ecosystem with Google Cloud Dataproc" from the Digital Innovation One Data Engineer Bootcamp

Shell
0
4 years ago

This project explores the core concepts of distributed data processing using the MapReduce programming model, implemented with Python via Hadoop Streaming, and deployed on a multi-node Google Cloud Dataproc cluster.

Python
0
1 month ago