# multihead-attention

Implementation of Siamese Neural Networks built upon a multi-head attention mechanism for the text semantic similarity task (a rough sketch of the idea follows below).

Jupyter Notebook
182
2 years ago
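To illustrate the idea in the entry above: a Siamese setup runs both texts through one shared encoder and scores similarity on the pooled outputs. Below is a minimal, hypothetical PyTorch sketch; the module and parameter names are illustrative and not taken from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseAttentionEncoder(nn.Module):
    """Shared encoder: embeddings + one multi-head self-attention block + mean pooling.
    Hypothetical sketch; the actual repository may differ."""
    def __init__(self, vocab_size=10000, embed_dim=128, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.embed(token_ids)                    # (batch, seq_len, embed_dim)
        attended, _ = self.attn(x, x, x)             # self-attention over the sequence
        x = self.norm(x + attended)                  # residual connection + layer norm
        return x.mean(dim=1)                         # mean-pool to one sentence vector

encoder = SiameseAttentionEncoder()
a = torch.randint(0, 10000, (2, 16))                 # toy token ids for two text pairs
b = torch.randint(0, 10000, (2, 16))
similarity = F.cosine_similarity(encoder(a), encoder(b))  # one similarity score per pair
print(similarity.shape)                               # torch.Size([2])
```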

A Faster PyTorch Implementation of Multi-Head Self-Attention (the plain, unoptimized computation is sketched below for reference).

Jupyter Notebook
74
3 years ago
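For reference, the computation any such implementation has to perform is scaled dot-product attention evaluated per head and then concatenated; the repository's optimized code will differ, this is only a bare-bones sketch.

```python
import torch

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Plain scaled dot-product self-attention split into heads.
    x: (batch, seq, dim); each w_* is a (dim, dim) projection matrix."""
    batch, seq, dim = x.shape
    head_dim = dim // num_heads

    def split_heads(t):                                   # (batch, seq, dim) -> (batch, heads, seq, head_dim)
        return t.view(batch, seq, num_heads, head_dim).transpose(1, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5    # (batch, heads, seq, seq)
    weights = scores.softmax(dim=-1)                      # attention weights per head
    out = weights @ v                                     # (batch, heads, seq, head_dim)
    out = out.transpose(1, 2).reshape(batch, seq, dim)    # concatenate the heads
    return out @ w_o                                      # final output projection

dim, heads = 64, 8
x = torch.randn(2, 10, dim)
w_q, w_k, w_v, w_o = (torch.randn(dim, dim) / dim ** 0.5 for _ in range(4))
print(multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=heads).shape)  # torch.Size([2, 10, 64])
```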

Flexible Python library providing building blocks (layers) for reproducible Transformers research (Tensorflow ✅, Pytorch 🔜, and Jax 🔜)

Python
53
1 year ago

Provides several well-known neural network models (DCGAN, VAE, ResNet, etc.).

Python
50
4 years ago

Implementation of "Attention is All You Need" paper

Python
33
9 months ago

Semantic segmentation is an important task in computer vision, and its applications have grown in popularity over the last decade. This repository groups publications that use various forms of segmentation; in particular, every paper is built on a transformer.

32
1 month ago

Chatbot using TensorFlow (the model is a transformer); Korean (ko).

Python
29
6 years ago

Joint text classification on multiple levels with multiple labels, using a multi-head attention mechanism to wire two prediction tasks together.

Python
16
5 years ago

Synthesizer Self-Attention is a recent alternative to causal (dot-product) self-attention, with potential benefits from removing the query-key dot product entirely (see the sketch below).

Python
12
4 months ago
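Roughly, a dense Synthesizer predicts the attention matrix directly from each token with a small MLP instead of comparing queries with keys. A minimal sketch under that reading (causal masking omitted for brevity; the class and parameter names are illustrative, not the repository's):

```python
import torch
import torch.nn as nn

class DenseSynthesizerAttention(nn.Module):
    """Dense Synthesizer sketch: attention weights come from an MLP over each
    token alone (no query-key dot product), then mix the value vectors."""
    def __init__(self, dim, max_len):
        super().__init__()
        self.to_weights = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, max_len)
        )
        self.to_value = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, seq, dim)
        seq = x.size(1)
        logits = self.to_weights(x)[:, :, :seq]  # (batch, seq, max_len) trimmed to seq
        weights = logits.softmax(dim=-1)         # synthesized attention matrix
        return weights @ self.to_value(x)        # (batch, seq, dim)

layer = DenseSynthesizerAttention(dim=64, max_len=128)
print(layer(torch.randn(2, 32, 64)).shape)       # torch.Size([2, 32, 64])
```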

An experimental project for autonomous vehicle driving perception with steering angle prediction and semantic segmentation using a combination of UNet, attention and transformers.

Python
10
4 years ago

This repository contains the code for the paper "Attention Is All You Need", i.e. the Transformer.

Jupyter Notebook
8
2 years ago

A from-scratch implementation of the Transformer as presented in the paper "Attention Is All You Need".

Python
8
2 years ago

Simple GPT with multi-head attention over char-level tokens (tokenization sketched below), inspired by Andrej Karpathy's video lectures: https://github.com/karpathy/ng-video-lecture

Jupyter Notebook
5
2 years ago
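"Char-level tokens" here means each distinct character in the corpus gets its own token id, as in the referenced lectures. A tiny illustrative example (the text and variable names are placeholders):

```python
# Char-level tokenization in the style of Karpathy's minimal GPTs:
# every distinct character in the corpus becomes its own token id.
text = "hello attention"
chars = sorted(set(text))                       # vocabulary of unique characters
stoi = {ch: i for i, ch in enumerate(chars)}    # char -> id
itos = {i: ch for ch, i in stoi.items()}        # id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode("hello")
print(ids, decode(ids))                          # [3, 2, 5, 5, 7] hello
```
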
Python
5
18 days ago

A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on a Jetson Orin NX 8GB with TensorRT 8.5.2.

Cuda
5
2 months ago

This package is a TensorFlow 2/Keras implementation of Graph Attention Network embeddings and also provides a trainable layer for multi-head graph attention (the core computation is sketched below).

Python
3
4 years ago
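The repository itself targets TensorFlow 2/Keras; purely to show what a multi-head graph-attention layer computes (edge scores from projected endpoint features, softmax over neighbours, weighted aggregation, heads concatenated), here is a small PyTorch-flavoured sketch with hypothetical names:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionHead(nn.Module):
    """One graph-attention head: score each edge from the concatenated,
    linearly projected endpoint features, softmax over neighbours, aggregate."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):                    # h: (nodes, in_dim), adj: (nodes, nodes) of 0/1
        z = self.proj(h)                          # (nodes, out_dim)
        n = z.size(0)
        pairs = torch.cat(
            [z.unsqueeze(1).expand(n, n, -1), z.unsqueeze(0).expand(n, n, -1)], dim=-1
        )                                          # all (i, j) feature pairs: (nodes, nodes, 2*out_dim)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1), negative_slope=0.2)
        scores = scores.masked_fill(adj == 0, float("-inf"))  # attend only to neighbours
        alpha = scores.softmax(dim=-1)
        return alpha @ z                           # (nodes, out_dim)

# Multi-head version: run several heads and concatenate their outputs.
heads = [GraphAttentionHead(8, 4) for _ in range(2)]
h = torch.randn(5, 8)
adj = torch.eye(5)                                 # toy graph: self-loops only
out = torch.cat([head(h, adj) for head in heads], dim=-1)
print(out.shape)                                   # torch.Size([5, 8])
```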

Annotated vanilla implementation in PyTorch of the Transformer model introduced in 'Attention Is All You Need'.

Jupyter Notebook
1
1 year ago

Testing the reproducibility of the MixSeq paper. Under the assumption that macroscopic time series follow a mixture distribution, the authors hypothesise that the lower variance of the constituent latent mixture components could improve estimation of the macroscopic time series.

Jupyter Notebook
1
2 years ago