Assessing unknown potential : quality and limitations of different large language models in the field of otorhinolaryngology

Buhr, Christoph R.; Smith, Harry; Huppertz, Tilman; Bahr-Hamm, Katharina; Matthias, Christoph; Cuny, Clemens; Snijders, Jan Phillipp; Ernst, Benjamin Philipp; Blaikie, Andrew; Kelsey, Tom; Kuhn, Sebastian; Eckrich, Jonas

doi:http://doi.org/10.25358/openscience-11041

dc.contributor.author	Buhr, Christoph R.
dc.contributor.author	Smith, Harry
dc.contributor.author	Huppertz, Tilman
dc.contributor.author	Bahr-Hamm, Katharina
dc.contributor.author	Matthias, Christoph
dc.contributor.author	Cuny, Clemens
dc.contributor.author	Snijders, Jan Phillipp
dc.contributor.author	Ernst, Benjamin Philipp
dc.contributor.author	Blaikie, Andrew
dc.contributor.author	Kelsey, Tom
dc.contributor.author	Kuhn, Sebastian
dc.contributor.author	Eckrich, Jonas
dc.date.accessioned	2024-12-03T12:57:51Z
dc.date.available	2024-12-03T12:57:51Z
dc.date.issued	2024
dc.description.abstract	Background: Large Language Models (LLMs) might offer a solution for the lack of trained health personnel, particularly in low- and middle-income countries. However, their strengths and weaknesses remain unclear. Aims/objectives: Here we benchmark different LLMs (Bard 2023.07.13, Claude 2, ChatGPT 4) against six consultants in otorhinolaryngology (ORL). Material and methods: Case-based questions were extracted from literature and German state examinations. Answers from Bard 2023.07.13, Claude 2, ChatGPT 4, and six ORL consultants were rated blindly on a 6-point Likert scale for medical adequacy, comprehensibility, coherence, and conciseness. Given answers were compared to validated answers and evaluated for hazards. A modified Turing test was performed and character counts were compared. Results: LLM answers ranked inferior to consultants' answers in all categories. Yet, the difference between consultants and LLMs was marginal, with the clearest disparity in conciseness and the smallest in comprehensibility. Among LL	en_GB
dc.identifier.doi	http://doi.org/10.25358/openscience-11041
dc.identifier.uri	https://openscience.ub.uni-mainz.de/handle/20.500.12030/11060
dc.language.iso	eng	de
dc.rights	CC-BY-4.0	*
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	*
dc.subject.ddc	610 Medizin	de_DE
dc.subject.ddc	610 Medical sciences	en_GB
dc.title	Assessing unknown potential : quality and limitations of different large language models in the field of otorhinolaryngology	en_GB
dc.type	Zeitschriftenaufsatz	de
jgu.apc.netprice	0	de
jgu.apc.price	0	de
jgu.apc.taxrate	19	de
jgu.apc.transformationcontract	Taylor Francis	de
jgu.dfg.year	2024
jgu.journal.issue	3	de
jgu.journal.title	Acta oto-laryngologica	de
jgu.journal.volume	144	de
jgu.nationalcurrency.eur	0
jgu.organisation.department	FB 04 Medizin	de
jgu.organisation.name	Johannes Gutenberg-Universität Mainz
jgu.organisation.number	2700
jgu.organisation.place	Mainz
jgu.organisation.ror	https://ror.org/023b0x485
jgu.pages.end	242	de
jgu.pages.start	237	de
jgu.publisher.doi	10.1080/00016489.2024.2352843	de
jgu.publisher.issn	1651-2251	de
jgu.publisher.name	Taylor & Francis	de
jgu.publisher.place	Stockholm	de
jgu.publisher.year	2024
jgu.rights.accessrights	openAccess
jgu.subject.ddccode	610	de
jgu.subject.dfg	Lebenswissenschaften	de
jgu.type.dinitype	Article	en_GB
jgu.type.resource	Text	de
jgu.type.version	Published version	de

Files

Original bundle

Name: assessing_unknown_potential__-20241203135652757.pdf
Size: 1.11 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.57 KB
Format: Item-specific license agreed upon to submission

Collections

DFG-491381577-H