data-processing

onceupon/Bash-Oneliner

A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.

10432
16 days ago

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

Go
7408
22 days ago
NVIDIA/DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++
5361
16 hours ago

A lightweight data processing framework built on DuckDB and 3FS.

Python
4552
1 month ago
dashbitco/broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Elixir
2510
15 days ago

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Python
2388
4 years ago

Kubernetes-native platform to run massively parallel data/streaming jobs

Go
1862
11 hours ago

Extract, Transform, Load (ETL) for Python 3.5+

Python
1590
2 years ago

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Jupyter Notebook
1383
1 month ago
allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.

Python
1198
4 days ago

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

856
4 years ago

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Python
745
3 years ago