ChatGPT versus consultants : blinded evaluation on answering otorhinolaryngology case–based questions

Buhr, Christoph Raphael; Smith, Harry; Huppertz, Tilman; Bahr-Hamm, Katharina; Matthias, Christoph; Blaikie, Andrew; Kelsey, Tom; Kuhn, Sebastian; Eckrich, Jonas

DOI: http://doi.org/10.25358/openscience-10079

dc.contributor.author	Buhr, Christoph Raphael
dc.contributor.author	Smith, Harry
dc.contributor.author	Huppertz, Tilman
dc.contributor.author	Bahr-Hamm, Katharina
dc.contributor.author	Matthias, Christoph
dc.contributor.author	Blaikie, Andrew
dc.contributor.author	Kelsey, Tom
dc.contributor.author	Kuhn, Sebastian
dc.contributor.author	Eckrich, Jonas
dc.date.accessioned	2024-02-20T08:41:15Z
dc.date.available	2024-02-20T08:41:15Z
dc.date.issued	2023
dc.description.abstract	Background: Large language models (LLMs), such as ChatGPT (OpenAI), are increasingly used in medicine and supplement standard search engines as information sources. This leads to more “consultations” of LLMs about personal medical symptoms. Objective: This study aims to evaluate ChatGPT’s performance in answering clinical case–based questions in otorhinolaryngology (ORL) in comparison to ORL consultants’ answers. Methods: We used 41 case-based questions from established ORL study books and past German state examinations for doctors. The questions were answered by both ORL consultants and ChatGPT 3. ORL consultants rated all responses, except their own, on medical adequacy, conciseness, coherence, and comprehensibility using a 6-point Likert scale. They also identified (in a blinded setting) whether each answer was written by an ORL consultant or by ChatGPT. Additionally, character counts were compared. Given the rapid pace of technological development, a comparison between responses generated by ChatGPT 3 and ChatGPT 4 was included to give insight into the evolving potential of LLMs. Results: Ratings in all categories were significantly higher for ORL consultants (P<.001). Although inferior to the scores of the ORL consultants, ChatGPT’s scores were relatively higher in the semantic categories (conciseness, coherence, and comprehensibility) than in medical adequacy. ORL consultants correctly identified ChatGPT as the source in 98.4% (121/123) of cases. ChatGPT’s answers had a significantly higher character count than those of ORL consultants (P<.001). Comparison between responses generated by ChatGPT 3 and ChatGPT 4 showed a slight improvement in medical accuracy as well as better coherence of the answers. In contrast, neither conciseness (P=.06) nor comprehensibility (P=.08) improved significantly, despite a significant 52.5% increase in mean character count (from 964 to 1470 characters; P<.001). 
Conclusions: While ChatGPT provided longer answers to medical problems, the medical adequacy and conciseness of its answers were significantly lower than those of the ORL consultants. LLMs have potential as augmentative tools for medical care, but consulting them about medical problems carries a high risk of misinformation, as their high semantic quality may mask contextual deficits.	en_GB
dc.identifier.doi	http://doi.org/10.25358/openscience-10079
dc.identifier.uri	https://openscience.ub.uni-mainz.de/handle/20.500.12030/10097
dc.language.iso	eng	de
dc.rights	CC-BY-4.0	*
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	*
dc.subject.ddc	610 Medizin	de_DE
dc.subject.ddc	610 Medical sciences	en_GB
dc.title	ChatGPT versus consultants : blinded evaluation on answering otorhinolaryngology case–based questions	en_GB
dc.type	Journal article	de
jgu.apc.netprice	2000.14	de
jgu.apc.price	2140.15	de
jgu.apc.taxrate	7	de
jgu.dfg.year	2023
jgu.journal.title	JMIR medical education	de
jgu.journal.volume	9	de
jgu.nationalcurrency.usd	2123.95
jgu.organisation.department	FB 04 Medicine	de
jgu.organisation.name	Johannes Gutenberg-Universität Mainz
jgu.organisation.number	2700
jgu.organisation.place	Mainz
jgu.organisation.ror	https://ror.org/023b0x485
jgu.pages.alternative	e49183	de
jgu.publisher.doi	10.2196/49183	de
jgu.publisher.issn	2369-3762	de
jgu.publisher.name	JMIR Publications	de
jgu.publisher.place	Toronto	de
jgu.publisher.year	2023
jgu.rights.accessrights	openAccess
jgu.subject.ddccode	610	de
jgu.subject.dfg	Life sciences	de
jgu.type.contenttype	Scientific article	de
jgu.type.dinitype	Article	en_GB
jgu.type.resource	Text	de
jgu.type.version	Published version	de

Files

Original bundle

Name: chatgpt_versus_consultants__b-20240214142515216.pdf
Size: 365.68 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.57 KB
Format: Item-specific license agreed to upon submission

Collections

DFG-491381577-G