Can ChatGPT be a diabetes consultant? Study probes the potential and pitfalls

In a recent study published in the journal PLoS ONE, researchers tested ChatGPT, a language model built for conversation, to investigate whether it could answer frequently asked questions about diabetes.

Artificial intelligence (AI), particularly ChatGPT, has gained significant attention for its potential clinical applications. Although it was not trained explicitly for this domain, ChatGPT has millions of active users globally. Studies have reported that individuals are more willing to accept AI-based solutions in low-risk scenarios. This calls for more research into how large language models like ChatGPT are understood and used in everyday situations and in routine clinical care.

Study: ChatGPT- versus human-generated answers to frequently asked questions about diabetes: A Turing test-inspired survey among employees of a Danish diabetes center. Image Credit: Andrey_Popov / Shutterstock

About the study

In the present study, researchers evaluated ChatGPT's expertise in diabetes, especially its capacity to answer frequently asked questions about diabetes in a manner similar to humans.

The researchers specifically explored whether participants whose diabetes knowledge ranged from some familiarity to expert level could distinguish between answers written by humans and answers generated by ChatGPT in response to common questions about diabetes. They also explored whether individuals who had previously been in contact with diabetes patients as health providers, and individuals who had previously used ChatGPT, were better at detecting ChatGPT-generated answers.

The study consisted of a closed, Turing test-inspired computerized survey sent to all part-time and full-time employees of Steno Diabetes Center Aarhus (SDCA). The survey comprised 10 multiple-choice questions, each with two answers, one written by humans and the other generated by ChatGPT, along with questions on age, gender, prior ChatGPT use, and prior contact with diabetes patients. For each question, participants had to identify the ChatGPT-generated answer.

The ten questions covered pathophysiology, treatment, complications, physical activity, and diet. Eight questions were taken from the 'Frequently Asked Questions' section of the Diabetes Association of Denmark's website, viewed on 10 January 2023. The researchers wrote the remaining two questions to correspond to specific passages on the 'Knowledge Center for Diabetes' website and in a report on physical activity and type 1 diabetes mellitus.

Logistic regression modeling was used for the analysis, and odds ratios (ORs) were estimated. In a secondary analysis, the team evaluated the influence of participant characteristics on the outcome. Based on simulations, a non-inferiority margin of 55% was pre-defined and published as part of the research protocol before data collection began. The human-written answers were taken directly from the websites or materials from which the team drew the questions.

For practical reasons, two researchers, both health experts, trimmed a few answers to reach the desired word count. Before each question was posed, the context and three examples (selected randomly from 13 pairs of questions and answers) were supplied to the AI-based language model in the prompt, and each question was asked in a separate chat window. Participants were invited by e-mail with a person-specific URL that allowed them to complete the survey only once. Data were collected between January 23 and 27, 2023.

Results

Of the 311 invited persons, 183 completed the survey (59% response rate). Of these, 70% (n=129) were female, 64% had heard of ChatGPT previously, 19% had used it, and 58% (n=107) had past interaction with diabetes patients as health practitioners. The AI-based language model was directed to provide 45-to-65-word answers to match the human responses; however, the average word count was 70. After consultation recommendations and the first three lines of the questions were removed, the ChatGPT answers averaged 56 words.

Across the 10 questions, the proportion of correct responses ranged from 38% to 74%. Overall, participants correctly identified the ChatGPT-generated answer 60% of the time, which exceeded the pre-defined non-inferiority margin of 55%. Males and females had 64% and 58% chances, respectively, of correctly recognizing the AI-generated answer. Individuals who had past contact with diabetes patients had a 61% chance of answering correctly, compared to 57% for those with no prior contact with diabetes patients.
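To illustrate what it means for 60% to exceed a 55% margin, the sketch below computes a rough lower confidence bound for the overall proportion and compares it with the margin. This is not the study's analysis, which used logistic regression. It is a simple normal-approximation (Wald) interval that assumes every one of the 183 participants answered all 10 questions and treats the answers as independent, ignoring the fact that answers from the same person are correlated. The function name and variable names are hypothetical.

```python
import math


def wald_lower_bound(p_hat: float, n: int, z: float = 1.96) -> float:
    """Lower end of a two-sided 95% Wald interval for a proportion."""
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * standard_error


MARGIN = 0.55          # pre-defined non-inferiority margin from the article
p_correct = 0.60       # overall share of correctly identified ChatGPT answers

# Assumption: 183 participants x 10 questions, all answered, all independent.
n_answers = 183 * 10

lower = wald_lower_bound(p_correct, n_answers)
exceeds_margin = lower > MARGIN

print(f"approximate lower bound: {lower:.3f}")
print(f"lower bound above the 55% margin: {exceeds_margin}")
```

Because answers from the same participant are not truly independent, a proper analysis would give a somewhat wider interval than this simplified one.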

Among participant characteristics, previous ChatGPT use showed the strongest association with the outcome (OR, 1.5). An odds ratio of comparable size was observed for age: being older than 50 years was associated with a higher likelihood of correctly recognizing the AI-generated answer (OR, 1.3). Previous ChatGPT users and non-users correctly answered 67% and 58% of the questions, respectively. Contrary to the initial hypothesis, participants could distinguish between ChatGPT-generated and human-written answers better than they would by tossing a fair coin.
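The relationship between the reported percentages and the odds ratio can be checked with a few lines of arithmetic. The sketch below converts the 67% and 58% correct-answer rates into odds and takes their ratio. It is an unadjusted calculation from the rounded percentages in this article, so it only approximates the model-based estimate. The helper names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds of a correct answer in one group relative to a reference group."""
    return odds(p_group) / odds(p_reference)


# Rounded percentages reported in the article.
p_users = 0.67       # previous ChatGPT users
p_non_users = 0.58   # participants who had not used ChatGPT

or_chatgpt_use = odds_ratio(p_users, p_non_users)
print(f"unadjusted odds ratio: {or_chatgpt_use:.2f}")
```

The result, about 1.47, is consistent with the reported odds ratio of 1.5 once rounded to one decimal place.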

Conclusion

Overall, the study serves as an initial exploration into the capabilities and limitations of ChatGPT in providing patient-centered guidance for chronic disease management, specifically diabetes. While ChatGPT demonstrated some potential for accurately answering frequently asked questions, issues around misinformation and the lack of nuanced, personalized advice were evident. As large language models increasingly intersect with healthcare, rigorous studies are essential to evaluate their safety, efficacy, and ethical considerations in patient care, emphasizing the need for robust regulatory frameworks and continuous oversight.

 
Journal reference:
  • Hulman A, Dollerup OL, Mortensen JF, Fenech ME, Norman K, Støvring H, et al. (2023) ChatGPT- versus human-generated answers to frequently asked questions about diabetes: A Turing test-inspired survey among employees of a Danish diabetes center. PLoS ONE 18(8): e0290773. DOI: https://doi.org/10.1371/journal.pone.0290773, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0290773

Written by

Pooja Toshniwal Paharia

Dr. based clinical-radiological diagnosis and management of oral lesions and conditions and associated maxillofacial disorders.