By now, most of us have read many articles predicting the impact of generative artificial intelligence (AI) such as OpenAI’s ChatGPT (GPT stands for Generative Pre-trained Transformer) and Google’s Bard. Google calls its own conversational large language model (LLM) LaMDA, for Language Model for Dialogue Applications; in this article, I use “LaMDA” more loosely for any LLM paired with chat capabilities (1). Microsoft’s Bing search engine and chat are driven by the same OpenAI technology as ChatGPT. Other services, such as nat.dev, package AI models into pay-as-you-go engines for trying these tools. Moodle, the learning management system used by VetMedAcademy, has a plugin that connects to ChatGPT through an application programming interface (API) linked to a paid account at openai.com.
Like many others, I have dabbled with ChatGPT 3.5 out of curiosity, for example, to see what it would provide as a book critique. I was struck by its “ability” (really just predicting the next most likely set of words) to generate readable and grammatically perfect text. The output appeared logical, but, for a less mainstream book, it might make up (“hallucinate”) characters or information about existing characters.
Concern Among Educators: The Peril?
Nonetheless, the ability just described has led to great concern among educators that students will use generative AI tools to write their essays. Unfortunately, the concern over cheating has led to calls for returning all student evaluations to in-class, proctored settings. Sharples (2022) notes:
With more than a hint of irony, a leading AI in Education researcher concluded that GPT-3 will “democratize cheating,” putting out of business the expensive essay mills used by an estimated 15% of higher education students who use them. (2)
We’ll discuss the impact of generative AI on “open-book,” “open-computer,” or “take-home” examinations in veterinary medical education another time, but suffice it to say that if the Entrustable Professional Activities (EPAs) recommended in the AAVMC’s Competency-Based Veterinary Education (CBVE) framework are to remain true to the “Day 1” skills of veterinarians, they will need to acknowledge and incorporate graduates’ appropriate use of generative AI. (3)
Rather than the “peril” the title suggests, I prefer to think of technology as a continuum to which instructors need to adapt their teaching and evaluation. I distinctly recall veterinary students in the early 1990s claiming that they “didn’t do computers” when we ran computer simulation laboratories on pharmacokinetics. OK, I also had students say they “didn’t do math”! Thirty years later, I look to how the latest generation of students adopts technology as part of their routine. From these observations, one can leverage the tools (today, smartphones) that students will inevitably have in their pockets. Quoting from the recent review by Tzirides et al. (2023):
We live and work in knowledge environments where we have come to rely increasingly on externalized memory in the form of the web-connected devices close to our bodies that we constantly need to look up—doctors and lawyers are good examples. Integral to their work nowadays, they do exactly what GPTs do—they need to be able to look things up. We rely increasingly on digital devices as our cognitive prostheses, not only to remember things but to process knowledge in-the-hand with algorithms of calculation and procedure. Today, more than ever before, we need to transfer the center of our focus in education away from long-term memory and towards higher order critical, creative and design thinking that effectively applies long-term social memory to social practice. (1)
Generative AI Performance on Standardized Tests for Physicians/Medical Students
ChatGPT’s output quality improved substantially from version 3.5 to 4.0. As some experiments below will show, the reliability of its “knowledge” improved dramatically. In a presentation, “AI and ChatGPT in Health Professions Education,” Phil Newton, PhD, Professor at Swansea University Medical School, presents striking data that he and a colleague gathered on the content “accuracy” of the latest available generation of ChatGPT (4.0). The take-home message is that ChatGPT 4.0 can provide expert medical content in an appropriate context; on the USMLE Step 1 examination, taken by medical students at the end of their second year, its scores were 27 points higher than those of ChatGPT 3.0. ChatGPT 4.0 also passed medical licensing examinations in Japan. More surprisingly, ChatGPT 4.0 passed every clinical specialty board exam on which it was evaluated. (4)
ChatGPT(3) vs. ChatGPT(4) Ability to Pass Entry, Licensing and Specialty Board Exams (4)
The Current Limitations of Generative AI
Despite these “successes” from a content standpoint, critical minds need to understand both the source of these results and the “thought process” underlying them. Tzirides et al. go on to note the following issues with generative AI with regard to the fundamentals of educational processes:
- Sourcing: The machine buries its sources.
- Facts: The machine can have no notion of empirical truth.
- Theory: The machine can have no conception of a theoretical frame or disciplinary practice.
- Ethics: If the machine is socially well mannered, it is not because its sources are necessarily that.
- Critical Dialogue: To appear a good interlocutor, the machine is skewed towards being uncritically affirmative. (1)
Each of these processes is crucial to education. So, if LaMDAs do poorly at them, do they live up to the educational hype? Currently, without “supervision,” the answer would be “No.” However, outputs could improve if the LaMDA is provided with a “source of truth” to serve as a first-line reference for facts and/or terminology. Content guidance, or, as we will now discuss, detailed rubric guidance, may improve educational outcomes significantly.
What about Generative AI as a Mentor or Guide?
AI has great potential to extend the mentoring reach of instructors. Khan Academy was at the forefront of recognizing this potential when it created “Khanmigo” (https://www.khanacademy.org/khan-labs), a tool on its educational site with two modes: a student mode, which mentors students through problems or questions without providing the answers, and a teacher mode, which suggests ways to teach the subject and provides the answers. NPR even interviewed Salman Khan about his perspective on AI as a mentoring tool. (5)
This potential has been explored by the team of educationalists working with Cope and Kalantzis on CGScholar (https://cgscholar.com). Applying both human peer reviews and AI-generated (ChatGPT 3.5) reviews to student writing projects in their education courses, they made some interesting observations. Human reviewers gave higher scores than the AI did, and their reviews were perceived to be more “accurate,” but their scores showed lower variation. Analysis of “sentiment” scores and of the volume of feedback generated in response to a machine prompt structured around the peer review rubric suggests that the AI reviews provide more detailed and informative feedback. As a result, the team suggests that AI reviews complement human reviews to provide a more comprehensive evaluation (1). We are currently evaluating the potential of AI reviews (ChatGPT 3.5) conducted before, after, and at the same time as peer reviews of veterinary student case analyses. Our preliminary conclusion is that ChatGPT 4.0, with or without a reliable veterinary “source of truth” provided to the LLM, will be necessary to achieve acceptable results for instructional purposes.
To preview the content issues one can face with AI reviews, I thought I’d show you an experiment with a veterinary medical question that I have repeated as ChatGPT has evolved over the past year. One can imagine that the human medical literature (on the web?) was “scraped” or “ingested” by the LLM, allowing it to perform as described above on standardized human medical examinations. So, as a veterinary clinical pharmacologist, I wrote a prompt to probe ChatGPT 3.5’s and 4.0’s: 1) recall of a numerical fact in veterinary medicine, 2) ability to convert units and do math, and 3) ability to provide 3 quality resources supporting its answer to #1. While we are told that ChatGPT doesn’t reveal its sources, in the trial below, it seems to try. I asked each version the question on 2 separate occasions, requesting sources only the second time.
Question Prompt: What is the maximum safe intravenous fluid administration rate (in milliliters/kilogram/minute) to a dog? For the second trial only, the following phrase was appended to the prompt: “...and list 3 literature references supporting this recommendation.”
ChatGPT 3.5 Response 1
I'm not a veterinarian, but I can provide some general information on the topic. The maximum safe intravenous (IV) fluid administration rate for a dog can vary depending on several factors, including the dog's size, age, health status, and the specific clinical situation. It's essential to consult with a veterinarian for specific guidance tailored to your dog's needs. Intravenous fluid therapy in dogs typically aims to replace lost fluids, maintain hydration, and correct electrolyte imbalances. The rate of fluid administration is usually determined by the veterinarian based on factors like the dog's weight, hydration status, and the underlying condition. A common guideline is to administer fluids at a rate of 3-5 ml/kg/hour for maintenance fluids. However, in some emergency situations, higher rates may be required. It's crucial to work with a veterinarian who can evaluate your dog's specific needs and adjust the IV fluid administration rate accordingly. Incorrect fluid administration can lead to complications, so always consult with a professional to ensure your dog's well-being.
ChatGPT 3.5 Response 2
The maximum safe intravenous fluid administration rate for a dog is generally recommended to be 90 mL/kg/minute. Please note that these references provide general guidelines, and the specific fluid administration rate for a dog should be determined by a veterinarian based on the individual dog.
Here are three literature references supporting this recommendation:
1. Epstein, M. (2007). The maximum rate of administration of intravenous fluids. Anesthesia & Analgesia, 105(2), 335-337. doi: 10.1213/01.ane.0000268115.73945.1a
2. Mathews, K. A., & Baskin, S. I. (2014). Chapter 8 - Fluid Therapy. In K. A. Mathews & S. I. Baskin (Eds.), Handbook of Veterinary Pharmacology (2nd ed., pp. 135-154). John Wiley & Sons. doi: 10.1002/9781118881131.ch8
3. Plumb, D. C. (2015). Fluid Therapy. In D. C. Plumb (Ed.), Plumb's Veterinary Drug Handbook (8th ed., pp. 5-14). Wiley-Blackwell.
ChatGPT 4.0 Response 1
The administration rate can vary depending on the health status and size of the dog, but generally, vets may use a rate of 90 ml/kg/hr for emergency fluid therapy. However, this is not a minute rate. To convert this to ml/kg/min, you would divide by 60 (minutes in an hour), which gives approximately 1.5 ml/kg/min. This should be administered under the careful supervision of a veterinarian.
ChatGPT 4.0 Response 2
The maximum safe intravenous fluid administration rate for a dog is typically considered to be around 90 ml/kg/hour, which is approximately 0.015 ml/kg/minute. However, the exact rate can vary depending on the individual dog's health status and the type of fluid being administered.
Here are three literature references supporting this recommendation:
1. DiBartola, S. P. (2011). Fluid, Electrolyte, and Acid-Base Disorders in Small Animal Practice. Elsevier Health Sciences.
2. Davis, H., Jensen, T., Johnson, A., Knowles, P., Meyer, R., Rucinsky, R., & Shafford, H. (2013). 2013 AAHA/AAFP fluid therapy guidelines for dogs and cats. Journal of the American Animal Hospital Association, 49(3), 149-159.
3. Rudloff, E. (2008). Emergency vascular access in small animals. Clinical Techniques in Small Animal Practice, 23(4), 128-134.
As a point of interest, all of these queries together cost less than $0.03 on https://nat.dev.
Observations about the “Experiment”
First, I hope the experiment demonstrates that no 2 generative AI responses will be identical, despite the restricted nature of the question prompt. Second, ChatGPT 3.5 was clearly programmed to include a disclaimer in its responses on medical issues, and it even presumes that the person asking is not a veterinarian or veterinary student. Its first response offers many generalities about fluid therapy in a dog and lists the maintenance fluid rate, but it does not answer the question of the maximum recommended rate, only suggesting that higher rates may be required in emergencies. ChatGPT 3.5’s second response gets the number right but the units wrong: it gives an impossible-to-administer 90 ml/kg/min rather than 90 ml/kg/hr (1.5 ml/kg/min in the units requested). The references seem reasonable, but the first one is not veterinary, and the last 2 are a veterinary textbook chapter and a formulary monograph, respectively.
Turning to ChatGPT 4.0’s performance, Response 1 is essentially perfect, including the units and the mathematics. However, given a second chance in Response 2, it gets the hourly rate correct but somehow divides 90 by 60 to get 0.015 ml/kg/min, 100 times lower than the correct 1.5 ml/kg/min and a rate that won’t support any animal in an emergency, even if we had an infusion pump that could administer it. The references differ from those of ChatGPT 3.5; all are veterinary and include 2 journal articles and one highly relevant textbook.
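The unit conversion at the center of this experiment is simple enough to check independently of the model. As a minimal sketch (not part of the original experiment), the Python below assumes 90 ml/kg/hr as the reference rate, the hourly figure the responses converge on and that the observations above treat as correct, and flags any stated per-minute rate that does not match it. The function names and the 1% tolerance are illustrative assumptions.

```python
import math

# Illustrative sketch: check an AI-stated per-minute fluid rate against an
# hourly reference rate. Function names and tolerance are hypothetical.
MINUTES_PER_HOUR = 60


def per_hour_to_per_minute(rate_ml_kg_hr: float) -> float:
    """Convert a fluid rate from ml/kg/hr to ml/kg/min."""
    return rate_ml_kg_hr / MINUTES_PER_HOUR


def is_consistent(claimed_ml_kg_min: float, reference_ml_kg_hr: float,
                  rel_tol: float = 0.01) -> bool:
    """True if the claimed ml/kg/min rate matches the hourly reference within rel_tol."""
    expected = per_hour_to_per_minute(reference_ml_kg_hr)
    return math.isclose(claimed_ml_kg_min, expected, rel_tol=rel_tol)


# Assumed reference: 90 ml/kg/hr, the hourly figure quoted in the responses.
REFERENCE_ML_KG_HR = 90.0

# Per-minute rates as stated in the three responses that gave one.
claimed_ml_kg_min = {
    "ChatGPT 3.5, Response 2": 90.0,
    "ChatGPT 4.0, Response 1": 1.5,
    "ChatGPT 4.0, Response 2": 0.015,
}

expected_ml_kg_min = per_hour_to_per_minute(REFERENCE_ML_KG_HR)
for label, claimed in claimed_ml_kg_min.items():
    verdict = "OK" if is_consistent(claimed, REFERENCE_ML_KG_HR) else "CHECK"
    ratio = claimed / expected_ml_kg_min
    print(f"{label}: {claimed} ml/kg/min -> {verdict} ({ratio:g} x expected)")
```

Run as written, only ChatGPT 4.0’s Response 1 passes; ChatGPT 3.5’s Response 2 is 60 times too high and ChatGPT 4.0’s Response 2 is 100 times too low. A check like this does not depend on trusting the model’s own arithmetic.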
So, we are still at the “User Beware” stage of using generative AI in veterinary medicine. If those querying ChatGPT remain aware of its continued potential for mistakes in content and even in calculations, these new tools can be added to our armamentarium. As always, computers can be excellent at keeping track of content, and generative AI has performed well on structured multiple-choice examinations. For that matter, “vetting” a LaMDA’s responses could itself become a practical learning objective. With this in mind, the following flow chart, borrowed from the Lecturio.com faculty blog, is worth considering by those in veterinary medicine as well.
Source: Reference 6
In the new era of generative AI, the quality and depth of our evaluation of students’ understanding and of their ability to apply complex information need to ramp up. The first educational benefits of LaMDAs will likely come from rubric-driven, supervised and unsupervised “mentoring” of students as they work to write about or solve complex medical problems. We all know that instructors need extra assistance in evaluating their students’ work and that students prefer timely feedback. So, generative AI may be part of the answer to the “peril” some believe it has created. With ChatGPT’s just-released features allowing it to “see, hear, and speak” as well as read, the potential benefits cannot be ignored.
1. Tzirides AO, Zapata G, Saini A, Searsmith D, Cope B, Kalantzis M, Castro V, Kourkoulou T, Jones J, da Silva RA, Whiting J, Kastania NP (2023): Generative AI: Implications and Applications for Education. arXiv 2305.07605. doi: https://doi.org/10.48550/arXiv.2305.07605
2. Sharples M (2022): Automated Essay Writing: An AIED Opinion. International Journal of Artificial Intelligence in Education 32: 1119-26. doi: https://doi.org/10.1007/s40593-022-00300-7
3. American Association of Veterinary Medical Colleges (AAVMC): Competency-Based Veterinary Education Entrustable Professional Activities: https://www.aavmc.org/resources/competency-based-veterinary-education-entrustable-professional-activities/
4. Newton P (2023): AI and ChatGPT in Health Professions Education; presentation for Lecturio. https://www.youtube.com/watch?v=leXO_3L01QA&t=1162s
5. National Public Radio: When AI is your personal tutor with Sal Khan of Khan Academy; How I Built This Podcast with Guy Raz: Episode 537, July 27, 2023. https://wondery.com/shows/how-i-built-this/episode/10386-when-ai-is-your-personal-tutor-with-sal-khan-of-khan-academy/
6. Ratliff M, Sya’ban SN, Wazir A, Haidar H, Keeth S (2023): Using ChatGPT in Medical Education for Virtual Patient and Cases; Lecturio.com Faculty Blog, May 8, 2023. https://www.lecturio.com/inst/pulse/using-chatgpt-in-medical-education-for-virtual-patient-and-cases/