Hugging Face Releases BigCodeBench to Replace Outdated HumanEval Benchmark
BigCodeBench introduces 1,140 complex tasks to evaluate LLM coding capabilities, addressing data contamination and the simplicity of legacy benchmarks.