Wei-Lin Chiang 3f61c6e6fc Upgrade gradio to 4.17 (#3027) 3 months ago
Name                               Last commit                                                            Updated
data                               Mt bench plot (#2068)                                                  9 months ago
README.md                          update mt-bench readme                                                 4 months ago
clean_judgment.py                  Use single-answer grading as the default option for LLM judge (#1892)  10 months ago
common.py                          Fix type hint for play_a_match_single (#3008)                          3 months ago
compute_agreement.py               Add compute agreement (#1855)                                          10 months ago
download_mt_bench_pregenerated.py  Use single-answer grading as the default option for LLM judge (#1892)  10 months ago
gen_api_answer.py                  fix: 'compeletion' typo (#2847)                                        4 months ago
gen_judgment.py                    Revert "fix: llm_judge resume from breakpoint when judging" (#2334)    8 months ago
gen_model_answer.py                Add revision arg to MT Bench answer generation (#2728)                 5 months ago
qa_browser.py                      Upgrade gradio to 4.17 (#3027)                                         3 months ago
show_result.py                     drop scores for API error judgments (#2074)                            9 months ago