massive-datasets

PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.

horizontal-scaling distributed-transactions htap enterprise-class cloud-native high-availability MySQL high-concurrency massive-datasets relational-database

Java

1636

333

1 month ago

helmholtz-analytics / heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

gpu tensors distributed machine-learning mpi NumPy Python PyTorch data-analytics data-processing data-science hpc massive-datasets parallelism

Python

223

5 days ago

polardb / polardbx

PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.

MySQL distributed-transactions cloud-native high-availability relational-databases high-concurrency massive-datasets htap horizontal-scaling enterprise-class

Makefile

1 month ago

joshuaboud / gen-dataset

Command line tool to quickly generate a lot of files in a lot of directories

Linux dataset dataset-generation benchmarking massive-datasets cli-tool multithreading evaluation

C++

4 years ago

rajeshidumalla / Bloom-Filter

Building a Bloom Filter on English dictionary words

bloom-filter massive-datasets Python data-science machine-learning data-analysis

Jupyter Notebook

4 years ago

FedericoBruzzone / anti-money-laundering

The project is based on the analysis of the "IBM Transactions for Anti Money Laundering" dataset published on Kaggle. The task is to implement a model which predicts whether or not a transaction is illicit, using the attribute "Is Laundering" as a label to be predicted.

machine-learning massive-datasets pyspark

Jupyter Notebook

1 year ago

rajeshidumalla / PageRank

Building the PageRank algorithm on the web graph around Stanford.edu using the NetworkX Python library

pagerank-algorithm machine-learning massive-datasets data-analysis data-science Python Apache Spark pandas NumPy

Jupyter Notebook

4 years ago

FedericoBruzzone / algorithms-for-massive-datasets

This repository contains a LaTeX file that generates a PDF document comprising comprehensive notes for the course "Algorithms for Massive Datasets"

algorithms deep-learning massive-datasets recommender-system

TeX

1 year ago

gmalik9 / floating_point_data_compressor

gipa: a compression/decompression tool to package, compress, and encode massive archive files with floating-point data

compression compressor autoencoder floating-point massive-datasets data-visualization data-compression representation representation-learning

Python

8 years ago

datakaveri / k-anonymisation-SKALD

Scalable, chunk-wise K-anonymization tool based on the Optimal Lattice Anonymization (OLA) algorithm. It is designed to handle large datasets by processing them in manageable chunks, ensuring data privacy while maintaining utility.

chunking encoding massive-datasets

Python

5 days ago

StefanoBalbo / Geocoding

Automated massive geolocator of addresses with parallel processing.

Docker geocoding geolocation geopandas geospatial Jupyter Notebook jupyterlab massive-datasets massively-parallel nominatim osm Python spatial-analysis ssh-server

Jupyter Notebook

6 months ago

Alex4gtx / Massive-Data-Handler

Opens and manipulates massive text/data files that would be impossible to open on an ordinary computer, for example a text file of 20+ GB. It lets you work with the file by reading only the lines you need, without freezing the computer for lack of memory.

big-data dictionaries python-script massive-datasets

Python

4 years ago