#etl-job

Implementing best practices for PySpark ETL jobs and applications.

Python
1981
3 years ago

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

TSQL
278
4 months ago

Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.

C#
243
1 year ago

Terraform modules for provisioning and managing AWS Glue resources

HCL
34
2 months ago

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

HCL
26
4 years ago

This repo guides you step by step through creating a star schema dimensional model.

Python
25
4 years ago

A declarative, SQL-like DSL for data integration tasks.

Go
14
7 years ago

Extract-transform-load (ETL) CLI tool for moving small and medium data volumes from sources (databases, CSV files, XLS files, gspreadsheets) to targets (databases, CSV files, XLS files, gspreadsheets) in any combination.

Python
11
3 months ago

A data pipeline for a retail store, built with AWS services. It collects data from the store's transactional database (OLTP) in Snowflake, transforms the raw data (the ETL process) with Apache Spark to meet business requirements, and lets data analysts create data visualizations in Superset. Airflow orchestrates the pipeline.

Python
9
2 years ago

A PHP project that combines ETL with different strategies to extract data from multiple databases, files, and services, transform it, and load it into multiple destinations.

PHP
9
5 months ago

A simple in-memory, configuration-driven data processing pipeline for Apache Spark.

Scala
5
3 years ago

Sentiment analysis of tweets using an ETL process and Elasticsearch.

Python
4
7 years ago

Comms processing (ETL) with Apache Flink.

Java
4
5 years ago

Telecom ETL is an SSIS package that ingests its data from CSV files into a database.

TSQL
4
3 years ago