swe-bench
AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.
Rust
3204 stars
Updated 16 days ago
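The "plans, executes, and iterates until it achieves a successful result" cycle in the description above is a feedback loop. Below is a minimal Python sketch of that loop. It assumes the caller supplies three hypothetical callables (`plan`, `execute`, `check`) and an iteration cap. It illustrates the general pattern only and is not that project's implementation, which is listed as Rust.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentRun:
    """Record of one agent run (hypothetical structure for illustration)."""
    attempts: list[str] = field(default_factory=list)
    succeeded: bool = False


def run_agent(
    task: str,
    plan: Callable[[str, list[str]], str],  # hypothetical: picks the next step from the task and past failures
    execute: Callable[[str], str],          # hypothetical: performs the step, e.g. edits files or runs a command
    check: Callable[[str], bool],           # hypothetical: decides success, e.g. "the test suite passes"
    max_iterations: int = 5,
) -> AgentRun:
    """Plan -> execute -> check loop that stops on first success or after max_iterations."""
    run = AgentRun()
    feedback: list[str] = []
    for _ in range(max_iterations):
        step = plan(task, feedback)
        outcome = execute(step)
        run.attempts.append(step)
        if check(outcome):
            run.succeeded = True
            break
        # Feed the failed outcome back so the next plan can take it into account.
        feedback.append(outcome)
    return run


# Toy stubs: the third attempt "fixes" the task.
def toy_plan(task: str, feedback: list[str]) -> str:
    return f"attempt {len(feedback) + 1}"


def toy_execute(step: str) -> str:
    return "ok" if step == "attempt 3" else "tests failed"


result = run_agent("fix bug", toy_plan, toy_execute, lambda out: out == "ok")
print(result.succeeded, len(result.attempts))  # True 3
```

The iteration cap is a design choice: without it, an agent whose `check` never succeeds would loop forever.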
We track and analyze the activity and performance of autonomous code agents in the wild.
TypeScript
31 stars
Updated 1 month ago
This project explores how Large Language Models (LLMs) perform on real-world software engineering tasks, inspired by the SWE-Bench benchmark. Using locally hosted models like Llama 3 via Ollama, the tool evaluates code repair capabilities on Python repositories through custom test cases and a lightweight scoring framework.
TeX
0 stars
Updated 6 months ago
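The project above scores code repair through test cases, in the spirit of SWE-Bench. In SWE-Bench, a task counts as resolved only if the tests that failed before the fix now pass and the previously passing tests still pass. Below is a minimal Python sketch of such a scorer. It assumes each task's results are recorded as two hypothetical maps from test name to pass/fail, mirroring SWE-Bench's FAIL_TO_PASS and PASS_TO_PASS split. `TaskResult`, `is_resolved`, and `resolve_rate` are illustrative names, not that project's actual scoring framework.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Test outcomes for one repair task after applying a model's patch (hypothetical schema)."""
    task_id: str
    fail_to_pass: dict[str, bool]  # tests that failed before the fix and must now pass
    pass_to_pass: dict[str, bool]  # tests that passed before and must still pass


def is_resolved(r: TaskResult) -> bool:
    # Require at least one target test, all target tests passing, and no regressions.
    return bool(r.fail_to_pass) and all(r.fail_to_pass.values()) and all(r.pass_to_pass.values())


def resolve_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


results = [
    TaskResult("task-1", {"test_fix": True}, {"test_old": True}),   # resolved
    TaskResult("task-2", {"test_fix": True}, {"test_old": False}),  # regression, so not resolved
    TaskResult("task-3", {"test_fix": False}, {}),                  # fix failed
]
print(f"{resolve_rate(results):.2f}")  # 0.33
```

Counting a regression as a failure keeps a model from scoring well by making the target test pass while breaking existing behavior.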