Methodology: case creation, standards alignment, and feedback generation
Jacqueline Ponczek
Methodology Advisor, ClinicalSim.ai
This page outlines how ClinicalSim develops cases, aligns them to governing-body competency frameworks and communication frameworks, scores each encounter, and turns it into actionable feedback. A single engine, rubric, and dashboard serve learners across the medical-education continuum, and every session produces timestamped, competency-based documentation for learners, faculty, and program leadership.
Key takeaway: every ClinicalSim case is anchored to a specific, published competency or communication standard, and every score traces to a verbatim excerpt from the encounter transcript — never to an unexplained rating.
1. Purpose and scope
Every case is anchored to the relevant governing body’s competency framework for the learner’s level, regardless of the primary measure a program chooses. A program may adopt its own internal or externally validated tool as its primary measure, or fold ClinicalSim cases into a broader curriculum; either way, each case’s scoring and feedback remain grounded in a specific, published standard.
We publish our methodology so that those who rely on our work can see how it is done and trust it accordingly. Three commitments anchor it:
Quality
Every case is built from primary sources, written to a defined purpose, and reviewed by practicing physicians with strong academic backgrounds before release.
Consistency
The same scoring logic applies to every case, regardless of specialty, learner level, or which communication frameworks are applied alongside it.
Alignment
Every score traces to a published competency or a validated communication framework — never to an unexplained rating.
2. How the methodology works
The method below applies to every case, regardless of learner level. What does vary by level, namely the competency framework a case is anchored to and how the competency is scored, is described in the learner-level subsections of 2.3, Evidence and scoring.
2.1 Building a case
Every case begins with a defined purpose: the communication and clinical skills it should exercise and the competencies it should assess. Content is written to that purpose, with explicit learning objectives and a clinical evidence base drawn from foundational and other applicable literature. Physicians then review each case for clinical accuracy, content, standards alignment, and fit to its objectives. Reviewers are practicing physicians with strong academic backgrounds and decades of collective experience, including program directors, simulation facilitators, and UME and GME educators. Before release, each case is run repeatedly to confirm three things: that the AI character convincingly plays the role the case requires; that scoring and feedback perform as intended; and that what the case asks of the learner can be assessed within the limits of voice-based simulation. Refinements are made in coordination with ClinicalSim’s clinical and technical leadership.
2.2 Competency alignment and communication frameworks
Three terms recur in this section and are defined below. The competency framework anchors the competency assessment; where it defines level descriptors, as the ACGME Milestones do, those descriptors are quoted verbatim from the primary source. Communication frameworks are applied alongside it to characterize how the learner communicated. The two are distinct: the competency score reflects the learner’s developmental level, while the communication frameworks capture the specific skills that make up communication technique.
Competency framework
The governing-body standard a case is assessed against — the ACGME Milestones 2.0 in graduate medical education, or the Foundational Competencies in undergraduate medical education.
Communication framework
A validated, published model of communication behavior, such as SPIKES or Calgary-Cambridge, applied to characterize how the learner communicated.
Rubric
The scored instrument that turns a framework into rated items — including a program's own internal or externally validated tools.
Each communication framework is drawn from a curated library of validated frameworks, each with an approved citation, and serves as a floor, not a ceiling: one or more may be applied to a case, each scored independently, and programs may add their own internal or externally validated rubrics. Because these frameworks and rubrics operate at different scopes, from whole-encounter structures such as Calgary-Cambridge, to task-specific protocols such as SPIKES for delivering bad news, to discrete micro-skills such as the NURSE statements for responding to emotion, ClinicalSim selects those best suited to each case’s communication task.
2.3 Evidence and scoring
Each encounter is a voice conversation between the learner and an AI role designed for the case, captured as a timestamped transcript. For every scored competency and framework step, the platform draws one or two verbatim excerpts that demonstrate the behavior, or documents its absence. Because each score is traceable to the moment that supports it, the output withstands review rather than serving as an unexplained rating.
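The traceability rule above can be expressed as a check on each score record. The sketch below is a minimal Python illustration, not ClinicalSim’s implementation: the `Turn` and `EvidencedScore` record shapes and the `traceability_problems` function are hypothetical, and it assumes each excerpt is keyed to the start timestamp of the transcript turn it quotes. A score passes only if it carries one or two excerpts that appear word for word in the transcript, or, when it has no excerpt, a note documenting the behavior’s absence.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Turn:
    """One timestamped utterance in the encounter transcript (hypothetical schema)."""
    t_seconds: float
    speaker: str  # e.g. "learner" or "ai_role"
    text: str


@dataclass
class EvidencedScore:
    """A score for one competency or framework step, with its evidence (hypothetical)."""
    item: str  # e.g. a subcompetency, or a framework step such as "SPIKES: Perception"
    score: int
    excerpts: List[Tuple[float, str]] = field(default_factory=list)  # (turn timestamp, verbatim quote)
    absence_note: Optional[str] = None  # documents a behavior that did not occur


def traceability_problems(s: EvidencedScore, transcript: List[Turn]) -> List[str]:
    """Return the reasons a score is not traceable; an empty list means it passes."""
    problems: List[str] = []
    if not s.excerpts:
        # With no excerpt, the absence of the behavior must be documented instead.
        if not s.absence_note:
            problems.append(f"{s.item}: no excerpt and no documented absence")
        return problems
    if len(s.excerpts) > 2:
        problems.append(f"{s.item}: more than two excerpts")
    text_at = {turn.t_seconds: turn.text for turn in transcript}
    for t, quote in s.excerpts:
        # Verbatim means the quote appears word for word in the turn at that timestamp.
        if not quote or quote not in text_at.get(t, ""):
            problems.append(f"{s.item}: excerpt at {t}s is not verbatim")
    return problems


# Usage: one passing score and one whose quote was paraphrased.
transcript = [
    Turn(12.5, "learner", "Before we go over the scan, what have you been told so far?"),
    Turn(18.0, "ai_role", "Only that they found something on my lung."),
]
ok = EvidencedScore("SPIKES: Perception", 4, [(12.5, "what have you been told so far?")])
paraphrased = EvidencedScore("SPIKES: Perception", 4, [(12.5, "what do you know")])
print(traceability_problems(ok, transcript))           # []
print(traceability_problems(paraphrased, transcript))  # ['SPIKES: Perception: excerpt at 12.5s is not verbatim']
```

Rejecting paraphrase outright, rather than scoring it as "close enough", is what keeps every rating tied to a moment a reviewer can find in the transcript.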
Scoring follows the competency framework on which a case is built, and the unit of assessment is the individual competency the case exercises. Each applied communication framework or program rubric is scored independently of the competency, and may be scored against the framework’s discrete steps, with a Likert scale where finer resolution is useful. Because competency frameworks are developmental, a given result carries different meaning at different stages of training and is always interpreted accordingly. All scores are presented together, with their verbatim evidence, so the learner or reviewer sees a complete picture. What varies by learner level is the competency framework a case is anchored to and how the competency itself is scored, as described below.
Graduate medical education
Cases align to the specialty-specific ACGME Milestones 2.0, with milestone text quoted verbatim from each specialty’s own document, and target the high-stakes conversations a specialty most needs to rehearse. Milestones 2.0 describe the six ACGME core competencies across five developmental levels. Subcompetencies in several of these competencies were harmonized across specialties in 2017 and then adapted by each specialty, which is why milestone text is always drawn from the specialty’s own version.
Scoring covers whichever subcompetencies the scenario exercises: most often those in interpersonal and communication skills and in professionalism, with systems-based practice or other domains added where the encounter warrants. Each is scored on the Dreyfus scale (1 to 5), read against the milestone’s verbatim level descriptors:
- Level 1, Novice
- Level 2, Advanced Beginner
- Level 3, Competent
- Level 4, Proficient (readiness for unsupervised practice)
- Level 5, Expert (aspirational)
The result places the learner on the milestone scale and is ready for Clinical Competency Committee review. Because the milestones are formative and were not designed for high-stakes external decisions, ClinicalSim treats milestone-aligned output as evidence that informs program judgment, not as a substitute for it.
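To show how a level and its verbatim descriptor can travel together, the hypothetical `MilestonePlacement` record below maps Levels 1 to 5 to the Dreyfus labels listed above and rejects any value outside that range. This is an illustrative Python sketch, not ClinicalSim’s schema; the class name, the `ICS1` label, and the placeholder descriptor are assumptions, and real descriptor text would be copied from the specialty’s own Milestones document.

```python
from dataclasses import dataclass

# Dreyfus labels for milestone Levels 1 to 5, as listed above.
DREYFUS_LABELS = {
    1: "Novice",
    2: "Advanced Beginner",
    3: "Competent",
    4: "Proficient",
    5: "Expert",
}

# Notes the page attaches to particular levels.
LEVEL_NOTES = {4: "readiness for unsupervised practice", 5: "aspirational"}


@dataclass(frozen=True)
class MilestonePlacement:
    """Hypothetical record of one subcompetency placed on the 1-5 milestone scale."""
    subcompetency: str  # illustrative label, e.g. "ICS1"
    level: int
    descriptor: str     # verbatim level descriptor from the specialty's own document

    def __post_init__(self) -> None:
        # Milestone levels run from 1 to 5; anything else is a data error.
        if self.level not in DREYFUS_LABELS:
            raise ValueError(f"level must be 1-5, got {self.level}")

    def summary(self) -> str:
        """One line for formative review: level, Dreyfus label, and verbatim descriptor."""
        note = LEVEL_NOTES.get(self.level)
        suffix = f" ({note})" if note else ""
        return (
            f"{self.subcompetency}: Level {self.level}, "
            f"{DREYFUS_LABELS[self.level]}{suffix}. Descriptor: {self.descriptor}"
        )


placement = MilestonePlacement("ICS1", 4, "<verbatim Level 4 descriptor from the specialty document>")
print(placement.summary())
```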
Undergraduate medical education
Cases align to the Foundational Competencies for Undergraduate Medical Education (AAMC, AACOM, and ACGME) and the AAMC Core Entrustable Professional Activities (EPAs) for Entering Residency. The Core EPAs were originally mapped to the Physician Competency Reference Set (PCRS, 2013), which the 2024 Foundational Competencies now supersede; an updated set of EPAs aligned to the Foundational Competencies is anticipated but not yet published. Until it is, ClinicalSim maps UME cases to the EPAs and to the Foundational Competencies independently, without asserting a fixed crosswalk between them.
The Foundational Competencies are not published with a five-level scale like the milestones’, so for UME ClinicalSim does not assign a numeric level. Instead, it records whether each competency was demonstrated and scores performance through the applied communication or skill rubric. Entrustment, the judgment of whether a learner is pre-entrustable or entrustable for an activity, remains a program decision that this evidence informs.
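The UME record differs from the GME one mainly in what it leaves out. In the hypothetical Python sketch below, `UMEResult` stores whether a competency was demonstrated, its verbatim evidence, and any applied rubric scores, but has no numeric level and never computes entrustment. The class, its fields, and the competency and rubric labels are illustrative assumptions, not ClinicalSim’s schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UMEResult:
    """Hypothetical UME record: demonstrated or not, with no numeric milestone level."""
    competency: str      # illustrative Foundational Competencies label
    demonstrated: bool
    evidence: List[str]  # verbatim excerpts, or a note documenting absence
    rubric_scores: Dict[str, int] = field(default_factory=dict)  # applied rubric step -> score

    def report_line(self) -> str:
        status = "demonstrated" if self.demonstrated else "not demonstrated"
        rubric = ", ".join(f"{step}={score}" for step, score in self.rubric_scores.items())
        # Entrustment is deliberately not computed: it remains a program decision.
        return (
            f"{self.competency}: {status}; "
            f"rubric: {rubric or 'none applied'}; "
            "entrustment: program decision"
        )


result = UMEResult(
    competency="Interpersonal and Communication Skills",
    demonstrated=True,
    evidence=["What would you like to know about your results today?"],
    rubric_scores={"SPIKES: Invitation": 4, "SPIKES: Knowledge": 3},
)
print(result.report_line())
```

Keeping the demonstrated flag and the rubric scores as separate fields mirrors the text: the competency judgment and the communication technique are reported side by side, not blended into one number.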
Case development emphasizes foundational encounters whose demands grow with students’ clinical knowledge, from history-taking to delivering a diagnosis, to prepare them for the transition to residency.
Faculty development
The same case-building and assessment methods extend to the conversations faculty are expected to model, including delivering difficult feedback, navigating professionalism concerns, and bedside or small-group teaching.
2.4 Feedback
Each encounter produces a single feedback report.
Within the report, verbatim evidence is attached to each rubric item, justifying the level the learner reached or the result on each assessed step. The report then offers an overall impression (strengths, priority gaps, and top action items) and targeted recommendations. Depending on the case, it indicates where the learner sits developmentally and gives reviewers transcript-grounded evidence for decisions about progression, remediation, readiness for practice, readiness to perform a particular task, or familiarity with a given subject area.
3. Commitment to accuracy
We are committed to accuracy and to fidelity to the source documents behind every case. Each result is a transparent statement of the evidence in the encounter: it informs the learner and the reviewer, and it never replaces final human judgment.
4. References
Graduate medical education
- Edgar L, Roberts S, Holmboe E. Milestones 2.0: A Step Forward. J Grad Med Educ. 2018;10(3):367-369.
- Morrison LJ, Joyce BL, Meyer LE, et al. Strengthening Interpersonal and Communication Skills Assessment Through Harmonized Milestones. J Grad Med Educ.
- ACGME. Use of Individual Milestones Data by External Entities for High-Stakes Decisions: A Function for Which They Are Not Designed or Intended. October 2022.
- Specialty-specific ACGME Milestones and Supplemental Guides, sourced per specialty.
Undergraduate medical education
- AAMC, AACOM, and ACGME. Foundational Competencies for Undergraduate Medical Education. 2024.
- AAMC. The Core Entrustable Professional Activities (EPAs) for Entering Residency. 2014. aamc.org
Communication frameworks (representative; full citations in the ClinicalSim Frameworks Bibliography)
- SPIKES. Baile WF, et al. The Oncologist. 2000;5(4):302-311.
- KEECC-A (Kalamazoo). Makoul G. Acad Med. 2001;76(4):390-393.
- SEGUE. Makoul G. Patient Educ Couns. 2001;45(1):23-34.
- NURSE. Back AL, et al. CA Cancer J Clin. 2005;55(3):164-177.
- REMAP. Childers JW, et al. J Oncol Pract. 2017;13(10):e844-e850.
- SBAR. Haig KM, et al. Jt Comm J Qual Patient Saf. 2006;32(3):167-175.
- I-PASS. Starmer AJ, et al. Pediatrics. 2012;129(2):201-204.
- TeamSTEPPS. King HB, et al. AHRQ; 2008.
- CANDOR. AHRQ; updated 2023.
- Calgary-Cambridge. Kurtz S, Silverman J, Draper J. 2nd ed. Radcliffe; 2005.