big-data

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

system-design backend scalability interview architecture DevOps design-patterns Awesome Lists big-data Hackathon-Kit lists web-development programming system interview-practice computer-science distributed-systems machine-learning

65751

6631

16 hours ago

ClickHouse / ClickHouse

ClickHouse® is a real-time analytics database management system

dbms olap analytics SQL big-data mpp clickhouse Hacktoberfest C++ Rust artificial-intelligence cloud-native database distributed embedded lakehouse self-hosted

C++

43171

7682

4 hours ago

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

Python Scala R Java big-data jdbc SQL Apache Spark

Scala

42014

28863

2 days ago

donnemartin / data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python machine-learning deep-learning data-science big-data Amazon Web Services Tensorflow theano caffe scikit-learn kaggle Apache Spark mapreduce hadoop matplotlib pandas NumPy SciPy Keras

Python

28574

8019

2 years ago

apache / flink

Apache Flink

Scala Java big-data flink Python SQL

Java

25324

13807

2 days ago

amark / gun

An open source cybersecurity protocol for syncing decentralized graph data.

machine-learning artificial-intelligence big-data blockchain P2P decentralized graph Cryptography crypto offline-first realtime crdt protocol database end-to-end encryption dweb dapp web3 metaverse

JavaScript

18658

1216

2 months ago

heibaiying / BigData-Notes

A Beginner's Guide to Big Data ⭐

hadoop hdfs Yarn mapreduce hive Apache Spark storm hbase Scala kafka zookeeper flume azkaban sqoop phoenix bigdata big-data

Java

16683

4306

2 years ago

prestodb / presto

The official home of the Presto distributed SQL query engine for big data

Java presto hive hadoop big-data SQL data lakehouse query

Java

16522

5498

1 day ago

andkret / Cookbook

The Data Engineering Cookbook

data-engineer data-engineering big-data best-practices cookbook

Python

14534

2636

2 months ago

apache / predictionio

PredictionIO, a machine learning server for developers and ML engineers.

Scala big-data

Scala

12531

1921

5 years ago

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java presto hive hadoop big-data SQL prestodb database distributed-systems distributed-database data-science datalake jdbc query-engine trino analytics delta-lake iceberg

Java

11985

3349

3 hours ago

yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters

kafka Scala cluster-management big-data

Scala

11935

2499

2 years ago

vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

graph-database distributed database graphdb raft C++ NebulaGraph nebula graph nebulagraph big-data distributed-systems scalability Hacktoberfest

C++

11720

1265

1 month ago

provectus / kafka-ui

Open-Source Web UI for Apache Kafka Management

kafka-ui kafka-brokers kafka kafka-streams kafka-client Open Source kafka-connect kafka-producer streams big-data apache-kafka cluster-management web-ui kafka-manager kafka-cluster streaming-data event-streaming Hacktoberfest

Java

11423

1328

1 year ago

StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java

10727

2151

21 hours ago

quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Rust log-management logs tantivy cloud-native Open Source big-data cloud-storage distributed-tracing search-engine

Rust

10440

488

1 day ago

cython / cython

The most widely used Python to C compiler

Python cython cpython cpython-extensions C C++ performance big-data

Python

10319

1568

1 day ago

catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

machine-learning decision-trees gradient-boosting gbm gbdt Python R kaggle gpu-computing catboost tutorial categorical-features gpu coreml data-science big-data CUDA data-mining

C++

8602

1239

3 hours ago

apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.

Python Java big-data beam batch Go SQL streaming

Java

8315

4412

6 hours ago

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Apache Spark acid big-data analytics delta-lake

Scala

8306

1931

1 day ago