
#big-data

Apache Spark - A unified analytics engine for large-scale data processing

Scala
40957
16 hours ago

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Python
28110
1 year ago
Java
24768
1 day ago
prestodb/presto

The official home of the Presto distributed SQL query engine for big data

Java
16308
9 hours ago
Python
14226
25 days ago

PredictionIO, a machine learning server for developers and ML engineers.

Scala
12525
4 years ago

CMAK is a tool for managing Apache Kafka clusters

Scala
11892
2 years ago

A distributed, fast open-source graph database featuring horizontal scalability and high availability

C++
11265
1 month ago

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java
11164
9 hours ago

The most widely used Python to C compiler

Python
9928
4 days ago
quickwit-oss/quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Rust
9922
14 hours ago

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

Java
9853
9 hours ago

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

C++
8359
10 hours ago

Apache Beam is a unified programming model for Batch and Streaming data processing.

Java
8079
9 hours ago

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala
7957
9 hours ago