Please cite our paper if you are using CES in your work:
@article{liu2025assessing,
  title={Assessing Coherency and Consistency of Code Execution Reasoning by Large Language Models},
  author={Liu, Changshu and Chen, Yang and Jabbarvand, Reyhaneh},
  journal={arXiv preprint arXiv:2510.15079},
  year={2025}
}
Reasoning Coherency
| Rank | LLM | Coherent Reasoning & Correct Output (%) | Coherent Reasoning & Incorrect Output (%) | Incoherent Reasoning & Correct Output (%) | Incoherent Reasoning & Incorrect Output (%) |
|---|---|---|---|---|---|
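The four coherency columns cross two yes/no judgements: whether the model's reasoning is coherent and whether its predicted output is correct. The sketch below shows how per-problem judgements could be tallied into these four percentages. It assumes each problem receives exactly one binary coherency verdict and one binary correctness verdict; the `coherency_breakdown` helper and the `(coherent, correct)` tuple encoding are hypothetical illustrations, not the CES implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical encoding of one problem's judgement: (reasoning_is_coherent, output_is_correct).
Judgement = Tuple[bool, bool]

# Maps each (coherent, correct) pair to its leaderboard column name.
CATEGORIES = {
    (True, True): "Coherent Reasoning & Correct Output",
    (True, False): "Coherent Reasoning & Incorrect Output",
    (False, True): "Incoherent Reasoning & Correct Output",
    (False, False): "Incoherent Reasoning & Incorrect Output",
}


def coherency_breakdown(judgements: Iterable[Judgement]) -> Dict[str, float]:
    """Return the percentage of problems in each of the four coherency columns."""
    counts = Counter(judgements)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgements to summarize")
    # Counter returns 0 for categories that never occur, so all four columns are reported.
    return {name: 100.0 * counts[key] / total for key, name in CATEGORIES.items()}


if __name__ == "__main__":
    sample = [(True, True), (True, True), (False, True), (True, False)]
    for column, pct in coherency_breakdown(sample).items():
        print(f"{column}: {pct:.1f}%")
```

Because every problem falls into exactly one of the four cells under this assumption, the four percentages in a row sum to 100. The "Incoherent Reasoning & Correct Output" cell is the interesting one: it captures cases where the model reaches the right output without reasoning that supports it.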
Reasoning Consistency
| Rank | LLM | Strong Reasoning | Weak Reasoning | Random Reasoning |
|---|---|---|---|---|