Please use this identifier to cite or link to this resource: http://doi.org/10.25358/openscience-10079
Authors: Buhr, Christoph Raphael
Smith, Harry
Huppertz, Tilman
Bahr-Hamm, Katharina
Matthias, Christoph
Blaikie, Andrew
Kelsey, Tom
Kuhn, Sebastian
Eckrich, Jonas
Title: ChatGPT versus consultants : blinded evaluation on answering otorhinolaryngology case–based questions
Date of online publication: 20-Feb-2024
Date of publication: 2023
Document language: English
Abstract:
Background: Large language models (LLMs), such as ChatGPT (OpenAI), are increasingly used in medicine and supplement standard search engines as information sources. This leads to more “consultations” of LLMs about personal medical symptoms.
Objective: This study aims to evaluate ChatGPT’s performance in answering clinical case–based questions in otorhinolaryngology (ORL) in comparison to ORL consultants’ answers.
Methods: We used 41 case-based questions from established ORL study books and past German state examinations for doctors. The questions were answered by both ORL consultants and ChatGPT 3. ORL consultants rated all responses, except their own, on medical adequacy, conciseness, coherence, and comprehensibility using a 6-point Likert scale. They also identified (in a blinded setting) whether the answer was created by an ORL consultant or ChatGPT. Additionally, the character count was compared. Due to the rapidly evolving pace of technology, a comparison between responses generated by ChatGPT 3 and ChatGPT 4 was included to give an insight into the evolving potential of LLMs.
Results: Ratings in all categories were significantly higher for ORL consultants (P<.001). Although inferior to the scores of the ORL consultants, ChatGPT’s scores were relatively higher in semantic categories (conciseness, coherence, and comprehensibility) compared to medical adequacy. ORL consultants identified ChatGPT as the source correctly in 98.4% (121/123) of cases. ChatGPT’s answers had a significantly higher character count compared to ORL consultants (P<.001). Comparison between responses generated by ChatGPT 3 and ChatGPT 4 showed a slight improvement in medical accuracy as well as better coherence of the answers provided. In contrast, neither the conciseness (P=.06) nor the comprehensibility (P=.08) improved significantly, despite a significant 52.5% increase in the mean character count ((1470−964)/964; P<.001).
Conclusions: While ChatGPT provided longer answers to medical problems, medical adequacy and conciseness were significantly lower compared to ORL consultants’ answers. LLMs have potential as augmentative tools for medical care, but their “consultation” for medical problems carries a high risk of misinformation, as their high semantic quality may mask contextual deficits.
DDC subject group: 610 Medicine
610 Medical sciences
Publishing institution: Johannes Gutenberg-Universität Mainz
Organizational unit: FB 04 Medicine
Place of publication: Mainz
ROR: https://ror.org/023b0x485
DOI: http://doi.org/10.25358/openscience-10079
Version: Published version
Publication type: Journal article
Additional document type information: Scientific article
License: CC BY
License information: https://creativecommons.org/licenses/by/4.0/
Journal: JMIR Medical Education
Volume: 9
Pages or article number: e49183
Publisher: JMIR Publications
Place of publication: Toronto
Date of publication: 2023
ISSN: 2369-3762
DOI of the original publication: 10.2196/49183
Appears in collections: DFG-491381577-G

Files in this resource:
File: chatgpt_versus_consultants__b-20240214142515216.pdf (365.68 kB, Adobe PDF)