Despite their growing use in medicine, large language models (LLMs) demonstrate limited diagnostic reasoning. We evaluated a two-stage prompting framework with predefined verification steps (Initial ...