humaneval

We introduce a new model designed for code generation. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

Python · 841 stars · updated 9 months ago

Run evaluation on LLMs using the human-eval benchmark (a minimal usage sketch follows this entry).

Python · 407 stars · updated 2 years ago
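
As a rough illustration of what such an evaluation looks like, here is a minimal sketch built on the OpenAI human-eval package (read_problems, write_jsonl, and the evaluate_functional_correctness CLI). The generate_one_completion function is a placeholder standing in for whatever model is being evaluated; it is not part of this repository.

```python
# Minimal HumanEval harness sketch: generate one completion per problem and
# write them to a JSONL file that `evaluate_functional_correctness`
# (the CLI shipped with the human-eval package) can score.
from human_eval.data import read_problems, write_jsonl


def generate_one_completion(prompt: str) -> str:
    """Placeholder: call the model under evaluation here and return only the
    code that continues the given function signature/docstring."""
    return "    pass\n"  # dummy completion; replace with a real model call


problems = read_problems()  # task_id -> {"prompt", "entry_point", "test", ...}

samples = [
    {"task_id": task_id, "completion": generate_one_completion(problem["prompt"])}
    for task_id, problem in problems.items()
]

write_jsonl("samples.jsonl", samples)
# Then, from the shell:
#   evaluate_functional_correctness samples.jsonl
```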

SkyCode is a multilingual open-source programming large language model built on the GPT-3 architecture. It supports Java, JavaScript, C, C++, Python, Go, shell, and other mainstream programming languages, and can understand Chinese comments. The model completes code and has strong problem-solving ability, freeing you from routine programming so you can concentrate on more important problems. (A code-completion sketch follows this entry.)

394 stars · updated 2 years ago
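
As a sketch only: code completion with a GPT-style causal LM through Hugging Face transformers. The model id below is an assumption for illustration (the actual published checkpoint name may differ), and the prompt and generation parameters are arbitrary.

```python
# Hypothetical code-completion sketch with a GPT-style causal LM via
# Hugging Face transformers; the model id is an assumption, not taken
# from this listing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SkyWork/SkyCode"  # assumed Hub id; substitute the real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# A Chinese comment plus a function header, completed greedily by the model.
prompt = "# 计算斐波那契数列\ndef fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```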

Evaluate LLM-generated COBOL

Python · 35 stars · updated 1 year ago

Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

Python · 9 stars · updated 6 months ago

Performance analysis of LLMs on CPU and GPU: execution time and energy usage.

Java · 0 stars · updated 1 year ago

JetBrains Task: Leveraging software evolution data with LLMs

0 stars · updated 1 year ago

llm_benchmark is a benchmarking tool for evaluating the performance of various Large Language Models (LLMs) across a range of natural language processing tasks. It provides a standardized framework for comparing models on accuracy, speed, and efficiency (a generic sketch of such a comparison loop follows this entry).

0 stars · updated 2 months ago
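
For illustration, a generic comparison loop in the spirit of that description: per-model accuracy on a fixed task list plus mean latency. Every name here is hypothetical and not taken from the llm_benchmark repository.

```python
# Generic model-comparison sketch: run every (prompt, expected) task through
# every model, recording accuracy and mean wall-clock latency per model.
# All names are illustrative placeholders.
import time
from typing import Callable, Dict, List, Tuple


def benchmark(models: Dict[str, Callable[[str], str]],
              tasks: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    results: Dict[str, Dict[str, float]] = {}
    for name, model_fn in models.items():
        correct, elapsed = 0, 0.0
        for prompt, expected in tasks:
            start = time.perf_counter()
            answer = model_fn(prompt)
            elapsed += time.perf_counter() - start
            correct += int(answer.strip() == expected)
        results[name] = {
            "accuracy": correct / len(tasks),
            "mean_latency_s": elapsed / len(tasks),
        }
    return results


if __name__ == "__main__":
    # Toy usage with stand-in "models" (plain functions) and two tasks.
    toy_models = {"echo": lambda p: p, "upper": lambda p: p.upper()}
    toy_tasks = [("hello", "hello"), ("world", "WORLD")]
    print(benchmark(toy_models, toy_tasks))
```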