#big-data

Apache Spark - A unified analytics engine for large-scale data processing

Scala
41693
11 hours ago

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python
28469
1 year ago
Java
25154
13 hours ago
prestodb/presto

The official home of the Presto distributed SQL query engine for big data

Java
16460
4 hours ago
Python
14459
1 month ago

PredictionIO, a machine learning server for developers and ML engineers.

Scala
12527
5 years ago

CMAK is a tool for managing Apache Kafka clusters

Scala
11928
2 years ago

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java
11749
7 hours ago

A distributed, fast open-source graph database featuring horizontal scalability and high availability

C++
11589
6 days ago

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java
10504
4 hours ago
quickwit-oss/quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Rust
10336
12 hours ago

The most widely used Python to C compiler

Python
10223
10 hours ago

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

C++
8527
12 hours ago

Apache Beam is a unified programming model for Batch and Streaming data processing.

Java
8248
7 hours ago

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala
8218
3 hours ago