DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
Published in ICLR 2026
Fan Shu†, Yite Wang†, Ruofan Wu, Boyi Liu, Zhewei Yao, Yuxiong He, and Feng Yan.
DARE-bench is a verifiable benchmark for machine learning modeling and data science instruction following, comprising 6,300 Kaggle-derived tasks for both training and evaluation.
