Digital First Medical Affairs: GenAI: The Importance of Prompt Engineering
In our last podcast episode of this series, we talked with Matt Lewis about generative AI: what it is and how it may change and evolve the medical communications environment. Today, we dive into a really important area that can significantly influence your GenAI results: prompt engineering.
MAPS 00:06
Welcome to this episode of the Medical Affairs Professional Society podcast, “Elevate”. The views expressed in this recording are those of the individuals and do not necessarily reflect the opinions of MAPS or the companies with which they are affiliated. This presentation is for informational purposes only and is not intended as legal or regulatory advice. And now for today’s “Elevate” episode…
Jennifer Riggins 00:33
Welcome to Elevate, the Medical Affairs Professional Society’s podcast. As a series within this podcast, we focus on digital-first communications and how digital is transforming Medical Affairs. In these podcasts, we speak with experts in the field of Medical Affairs and discuss how digital transformation is opening opportunities for Medical Affairs communicators. I’m Jennifer Riggins, a co-host of this podcast. I currently serve as a member of the Digital Focus Area Working Group. I’ve worked with pharma for more than 30 years with a focus on medical information, scientific communications, and medical digital. I currently work for phactMI, a nonprofit consortium of medical information leaders. I’m joined by my co-host Steve Casey of Omni Healthcare Communications, or OHC for short. Steve has been in pharma for over 35 years and has led OHC for the last nine years to become a leader in digital-first medical communications. In our last episode, we talked with Matt Lewis and discussed generative AI: what it is and how it may change and evolve the medical communications environment. Today, we want to delve into a really important area that can significantly influence your gen AI results: prompt engineering.
Steve Casey 01:48
You know, Jen, some of the audience may be familiar with the term prompt engineering, but probably a lot of them are not. So we’re here to break it down for them today. And joining us is Jenny Ghith, Generative AI Content Lead at Pfizer. Thank you so much for joining us today, Jenny. It’s great to have you back on the podcast. But for our new audience members, could you give us a brief overview of your background and how you’re working with AI and gen AI?
Jenny Ghith 02:17
Thank you, Steve. It’s a great pleasure to speak with you and Jen again; I’m humbled and grateful. I have a science background: I have an MS in immunology, and I’m wrapping up my MBA this year. And I’ve been in industry and sci comms for over 15 years, or that’s all I’m going to admit to before you start to date me too much. I sit on the Board of Trustees for ISMPP, the International Society for Medical Publication Professionals, and I’m part of the Digital Strategy Focus Area Working Group for MAPS. But you asked me about my experience in AI. I’ve been working in the field of AI for the better part of about six years now, with large language models and small ones, even before ChatGPT, and I started off using them to extract information from the literature and datasets. We wanted to tame information overload to help audiences find the data they needed, in a world where there is just so much available to us. I’m not a data scientist, but I collaborate with them, and with clinicians, regularly, and a major part of what we do is bridging dialogue across the disciplines, because that’s really key to success with AI, and generative AI in particular.
Jennifer Riggins 03:31
Awesome. So thanks, Jenny, for providing your background and giving us that glimpse into some of the work that you’re doing. I think I first heard the term prompt engineering in one of our MAPS digital FAWG meetings. So can you tell us: what is prompt engineering, and why is it so important in generative AI and large language models?
Jenny Ghith 03:50
Absolutely. So prompting is just another way for us to describe a starting point for creating content with generative AI. It includes questions that we ask of the models, or instructions, and some of the necessary context behind the ask as well. Ideally, the engineering part is how we work with the system to design it and get the answers that we ultimately will need. And it’s important for us to think about the fact that being good at prompt engineering doesn’t require that we have coding experience; we don’t have to be developers. What it requires is the ability to think clearly about what questions we want to ask, how we want to instruct the systems, and what’s most important to us as the end users. It’s really just a cue to the system.
Steve Casey 04:43
So let me see if I’ve got that right and can boil it down a little bit: prompt engineering is essentially carefully crafting the choice of words I use to prompt that AI engine into providing my desired output. What are some of the different types of prompts, and how do we know when to use each one?
Jenny Ghith 05:01
Yeah, so there are indeed different types of prompts, and a bit of jargon associated with them as well. You’ll hear about direct prompting, or zero-shot prompting; this is really the simplest type of prompt, just a question or an instruction. When you get to the point where you’re talking about multi-shot prompts, that’s where we ask other questions afterwards, right? Where it’s more of a conversation. There are other types of prompts. I like role prompting a lot myself and use it probably daily. This is where you tell the system, “Speak to me as a medical writer,” or “as a pharmaceutical executive,” and you ask the model to respond to you with language that reflects that point of view. It’s very useful from a creative perspective, too, if you need to do something where you’re asking the model to be a digital artist or a social media expert, for example. The other thing that you can do is use examples to prompt the systems. This could be giving snippets of text that are really the instruction guide for the AI: “write me something like this,” essentially. And you can also ask for lists of ideas that look like the example that you gave; again, very useful when you’re writing more bite-sized content. There’s chain-of-thought prompting; that’s where we break down tasks into individual or simpler steps. Again, this isn’t necessarily a one-and-done Q&A system: you’re going to ask the AI, perhaps, to think step by step. Or you can do it individually and get a sense for the reasoning and the process behind the prompts as well. What I end up doing a lot, too, is rewriting the prompts themselves, so it is iterative. You want to try and test different ways to get better responses, and it’s really helpful to track what works best for you and your company. And then, of course, there are tools on the back end. You might have heard about things like RAG, retrieval-augmented generation; these are all just supplemental things to the models themselves that augment their success, if you will, and their abilities to find relevant information. There are ways to work with your development teams and AI experts to increase the accuracy of the model on the back end, in addition to the prompts that you use, and you can even have more detailed prompts programmed into the API behind the system, so that your users don’t have to recreate the detailed prompts that you’ve learned from experience work better in certain situations. So there are a lot of different things that you can do to help.
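[Editor’s note: to make the prompt types above concrete, here is a minimal sketch of zero-shot, role, and few-shot (example-based) prompts, using the OpenAI Python SDK as one illustrative option. The model name, prompt text, and example turns are hypothetical; other vendors’ chat APIs follow the same message-list pattern.]

    # Minimal sketch: three prompt styles expressed as chat message lists.
    # Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
    # the model name and all prompt text are illustrative placeholders.
    from openai import OpenAI

    client = OpenAI()

    # Zero-shot (direct) prompt: a single question or instruction.
    zero_shot = [
        {"role": "user",
         "content": "Summarize the key findings of this abstract in 3 sentences: ..."},
    ]

    # Role prompt: a system message sets the point of view before the ask.
    role_prompt = [
        {"role": "system",
         "content": "You are an experienced medical writer. Respond in that voice."},
        {"role": "user",
         "content": "Draft a one-paragraph summary of the study results: ..."},
    ]

    # Few-shot (example-based) prompt: prior turns act as the instruction
    # guide, i.e. "write me something like this."
    few_shot = [
        {"role": "user", "content": "Headline for a diabetes trial update:"},
        {"role": "assistant",
         "content": "New Data Shed Light on Long-Term Glycemic Control"},
        {"role": "user", "content": "Headline for a migraine trial update:"},
    ]

    for messages in (zero_shot, role_prompt, few_shot):
        reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        print(reply.choices[0].message.content)

The same pattern extends to chain-of-thought prompting (append “think step by step” to the instruction) and to the back-end defaults Jenny mentions, where a detailed system message is fixed in the application rather than typed by each user.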
Steve Casey 07:54
So there really is a science to it that will impact the results you receive from the Gen AI. What are some of the common pitfalls to avoid when designing those prompts?
Jenny Ghith 08:03
All right, I think the don’ts are definitely as important as the do’s, so I’m really glad you asked. These are language models, not knowledge models; what they’re doing is matching their training data set. You’re the subject matter expert here. I’ve had people think about it as if you have a very helpful assistant, or a fellow that you’re working with, or a more junior Medical Affairs colleague, perhaps. So what don’t you want to do? You don’t want to overestimate the capabilities of the system. You want to avoid jargon; we have more jargon than we even realize. For example, recently we were thinking about getting the results written up from specific pieces. When you think of results, you think of data that should be present in the results section. But when you say “results” to an AI model, how it thinks about results depends on how it’s trained. It might not understand results the way that we do, and it may not actually report specific data points when it gives you the results. So you need to tweak how you ask your questions and ask them in very plain-language ways. You want to avoid vague prompts and being overly complex; the AI can lose track of a conversation, just like a human does, so it becomes harder for it to judge what’s most relevant when you get too lengthy. And you want to specify any constraints that you have around the prompts, so that you can define your own expectations further for the system. If you have formatting requirements, if you want it to write bullets, you should ask it to do so. Beware of unconscious bias in your prompts: if your prompt contains assumptions that are biased, you’re going to get a biased response. And expecting success on a single or zero-shot effort, I think that may be aiming a bit high, particularly at first. So expect some iteration; you’re going to need to have a conversation with these systems. In fact, that’s really the value of it. We often ask leading questions that we know the answer to, and we’re learning about something called sycophancy these days. That’s when AI tries to give you answers to meet your expectations. So if you ask leading questions, be prepared to get an answer where the AI is trying to meet your expectations. And just again, be cognizant of the limitations and the nuances behind the models that you work with. You don’t have to be a data science expert and know everything about all of these models; you can ask your colleagues if you need to. But think about the training set behind the models: if it’s been trained on biomedical terms, or if it’s just been trained on the internet, that affects the responses that you get. And think about what you’re looking for. Is word count really important to you? Sometimes these systems don’t do so well with specific word counts, and some models may do better than others. Manage your own expectations, and think about what’s good enough for you to proceed with what you get out of the systems.
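[Editor’s note: one way to apply these do’s and don’ts in practice is to template the ask so that the task, audience, format, and constraints are always stated explicitly. The helper below is a hypothetical sketch, not any particular tool’s API; every name and example string in it is invented for illustration.]

    # Hypothetical prompt-template helper reflecting the pitfalls above:
    # state the task plainly, name the audience, fix the format, and spell
    # out constraints instead of relying on jargon like "write up the results".
    def build_prompt(task: str, audience: str, fmt: str, constraints: list[str]) -> str:
        lines = [
            f"Task: {task}",
            f"Audience: {audience}",
            f"Format: {fmt}",
            "Constraints:",
        ]
        lines += [f"- {c}" for c in constraints]
        return "\n".join(lines)

    print(build_prompt(
        task="Summarize the efficacy results of the attached abstract.",
        audience="Medical Affairs colleagues",
        fmt="Three bullet points.",
        constraints=[
            "Report the specific data points (effect sizes, confidence intervals).",
            "Use plain language; avoid unexplained jargon.",
            "If something is not in the source, say 'not reported'; do not guess.",
        ],
    ))

Keeping prompts in a structured form like this also makes the iteration Jenny describes easier to track, since each revision changes a named field rather than a free-text blob.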
Jennifer Riggins 11:24
Yeah, those are a lot of really great recommendations, Jenny, thanks. You know, I recently completed a Coursera course on prompt engineering, written by a professor at Vanderbilt University. As part of one of my homework assignments, I used what they called a persona pattern and asked ChatGPT and Gemini to act like a podcast host on prompt engineering and to create questions to ask an expert on prompt engineering. I thought it was pretty cool. Here’s one that I received back from the AI that I think will be helpful to our audience: what are some tips for writing effective prompts for different tasks, such as generating text, translating languages, or writing different kinds of creative content?
Jenny Ghith 12:12
I think it’s a really interesting question; I’d almost like to put it into an LLM and see what it told us in its response. But I can give you my perspective. You know, the technology and the use cases are evolving. One thing I noticed that you did is what we talked about a minute ago, which is essentially role play, right? “Act like a podcast host.” This is a great strategy, and there are similar strategies for some of those other types of content that you mentioned. For generating text, you can ask for it to reflect the point of view of a medical writer or a technical writer. Basic principles apply here: again, be clear and specific in your asks, and use natural language. Think about your jargon. If you’re asking for a plain-language summary of materials, which is something that has come up a lot in use cases that I work with, the model may not understand what “plain language summary” means the way that you do. But it does respond very well to clarity. So instead of asking in a jargony way, you might want to say, “write the response as if the reader were at a certain reading grade level,” a fifth-grade reading level, for example. And then you can adjust the response depending on how well it does, and iterate. I also just really encourage teams to work through ways to optimize the models so that they cite where they’re pulling their answers from. It’s always helpful to look back; as Medical Affairs and medical communications professionals, we have to fact-check and do all those things with any generated text we receive. For translating, manage your expectations. Think about whether you really need the large language model to run your exercise; there are a lot of wonderful translation services out there. What does the literature tell you about the success of particular models with translations? How are you going to validate the translations that you get? You want to think about that up front, and how you’re going to judge what good looks like. When you’re prompting, think about the source language and be specific about it: be specific about the language you’re starting from and the language that you want it to translate to. And if there are any jargon, slang, or regional considerations, idioms, clarify those in your prompts to avoid ambiguity, because some words can have multiple meanings across cultures and even within certain language groups. So clarify your intent as much as you can. Finally, creative writing, one of my favorite things to think about: set the tone and the style in your prompt. Precision helps here too. Are you looking for something happy and positive? Are you looking for something funny? Are you more sober or serious, or more academic in tone, to bring it back to what we do? Are you looking for something that’s more modern and edgy, or a classic response? And if we’re talking about image generation, which is something we’re going to be talking about more in the future, in your prompts, do themes help? Are you looking for details like certain colors or textures? I think this is really a new frontier as we talk about things like DALL-E and Sora and multimodal technologies in particular as well.
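[Editor’s note: Jenny’s ask-then-iterate advice for plain-language summaries can be sketched as a small loop: request a target reading level, score the reply, and feed the shortfall back as a more specific prompt. This is only an illustrative sketch; it assumes the OpenAI SDK and the third-party textstat package for a Flesch-Kincaid grade estimate, and the model name and prompt wording are invented.]

    # Sketch of the iterate-and-adjust loop for plain-language summaries.
    # Assumes the 'openai' and 'textstat' packages; wording is illustrative.
    import textstat
    from openai import OpenAI

    client = OpenAI()
    TARGET_GRADE = 5.0  # "fifth-grade reading level"

    prompt = ("Rewrite the following study summary as if the reader were at a "
              "fifth-grade reading level. Keep all numbers accurate.\n\nText: ...")

    for _ in range(3):  # cap the iterations; expect to converge, not one-shot
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        grade = textstat.flesch_kincaid_grade(reply)
        if grade <= TARGET_GRADE + 1:  # close enough to the target level
            break
        # Iterate: restate the ask with the measured gap, per the advice above.
        prompt = (f"This reads at grade level {grade:.1f}. Simplify it further "
                  f"toward grade {TARGET_GRADE:.0f}, keeping numbers accurate:\n\n{reply}")

    print(reply)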
Steve Casey 15:36
You know, I find this area and this discussion really interesting. And you mentioned something about taking what Jen had produced and putting it back in to get an answer. How do you think prompt engineering is being used to improve the performance of large language models such as ChatGPT, Gemini, or Copilot?
Jenny Ghith 15:59
I think it’s interesting to see; there’s a science and an art to it as well. Our responses to prompts tell us a lot about the strengths and the limitations of the models that we’re using. The prompts that we use help us to further identify training needs and troubleshoot problems that we’re going to see with specific audiences, and they guide the developers in how they fine-tune the models and help to improve their efficacy. Companies can learn from the results of the models and the prompts to further train their models, as I mentioned, but also to anticipate and benchmark against the use cases that they’re working on. I think what’s missing is more consistency and validation in how we review the results of these models and how we think about our prompts. It’s not enough to say that our responses are passing muster with what I call the AI expert metrics; you’ll hear about things like cosine similarity, ROUGE, and BLEU, and you’ll hear that responses pass medical licensing exams, for example, as well. I think that we need to make sure we’re looping the subject matter experts in and thinking about the domains that are going to be most relevant to them. That could be things like clarity, accuracy, and relevance. There’s some interesting work looking at doing no harm, and at the rates of hallucination and the types of those hallucinations. I think we also need to look at how our results in these domains improve with better prompting, and we need to measure that and report it in the literature, so that we have more data to rely on to help us improve our strategy.
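[Editor’s note: for readers unfamiliar with those “AI expert metrics,” here is a toy, self-contained illustration of one of them: cosine similarity between a model response and a reference answer, computed over simple bag-of-words counts. Real evaluations typically use learned embeddings, and ROUGE and BLEU are separate n-gram overlap metrics; the two sentences below are invented examples.]

    # Toy cosine similarity over bag-of-words vectors: 1.0 means identical
    # word distributions, 0.0 means no shared words. Surface metrics like
    # this can score high even when a response is subtly wrong, which is
    # why subject-matter-expert review of accuracy still matters.
    import math
    from collections import Counter

    def cosine_similarity(a: str, b: str) -> float:
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
        norm = (math.sqrt(sum(c * c for c in va.values()))
                * math.sqrt(sum(c * c for c in vb.values())))
        return dot / norm if norm else 0.0

    reference = "the primary endpoint was met with a 12% absolute risk reduction"
    response = "the trial met its primary endpoint reducing absolute risk by 12%"
    print(f"cosine similarity: {cosine_similarity(reference, response):.2f}")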
Jennifer Riggins 17:47
You’ve touched on a few things already, Jenny, around the areas that need to be improved, and we all know that there are always challenges that need to be addressed. What are the biggest challenges that you think need to be addressed in the field of prompt engineering, and how do we apply that to Medical Affairs?
Jenny Ghith 18:04
I think it’s such an interesting time; we’re at this inflection point, and things are moving so quickly. We’re learning, the developers are learning, and it’s really important that we encourage cross-disciplinary dialogue to help ensure that we understand the expectations that we have of these models, with respect to, for example, adherence to medical guidelines, because guidelines are in fact ingested into some models, and we know that when we do that, it does help the models improve. There’s literature, too, showing that AI can make recommendations that are incongruent with guidelines. I think we need to understand the rates of occurrence of that, and again the rates of hallucinations, and how certain prompts affect the likelihood of certain types of responses. And we need to understand how to mitigate the ones that we are not looking for, or that are harmful. There’s a lot of work in the general field overall, including the consumer space, too, but we need more information in the medical domains. It’s not enough that we put up anecdotal experiences; we need to be driven by the data, including an understanding of the prompting data as well. So we want to think about that more as a community. And we want to consider relevance: sometimes when we use these prompts, the AI misses what we consider to be most relevant and important, and I think it’s important that we understand how to get the AI to recognize what’s important to us more appropriately, and try to standardize that and establish best practices associated with it. All of this will ultimately boil down to helping us trust the models more as a community, and that’s a big issue that we need to address and think about, so that we can all get more comfortable with using the models and get them to be more optimal for our needs.
Steve Casey 20:10
You know, I’ve been thinking a lot about the ethical implications of gen AI as we use it. Won’t developing ethical guidelines for the use of prompt engineering, as well as various other tools and techniques, help mitigate the risks of harmful or misleading content generation? What advice would you give to someone new to prompt engineering? And more importantly, how can they learn more?
Jenny Ghith 20:38
It’s a great question. We need to maintain our integrity as an industry and as professionals. We know trust is an issue, as we just discussed. If it’s comforting at all, keep in mind that the basics still apply; I think that’s reassuring. We need to check our facts and understand why we are asking certain questions, as scientists and as professionals, and understand the systems that we’re working with, at least the basics of the technology. We don’t have to be data scientists, we don’t have to be coders, and that’s really the beauty of these systems. But we do need a certain level of literacy and understanding of what’s under the hood, particularly with the data and the training sets that are ingested behind the models. And if you want to learn more, this is a great time to be in this field; there’s so much information available to us. There’s a lot of attention on gen AI right now, and the tech companies themselves are providing really great guides and basic information. There are even some really great free courses out there from institutions of excellence. You’re also going to find success looking at non-traditional sources. A lot of the AI literature is in preprints, which is a bit frustrating, but it also allows for fast publication of information. It also means that it can be hard to find; we’re not used to looking through preprint materials, or at least I’m not. So it’s great to start thinking about looking through preprint articles, and understanding the limitations of those is really important as well. And dare I say it, there are great groups on social media too. I’ve been using LinkedIn a lot; these groups are filled with AI experts, and you can use them, and even if you just observe, they help you stay up to date, and that’s really important in this field. As always, consider your source. Again, the basics apply: check your facts. But there is really a ton out there for us to learn from.
Jennifer Riggins 22:43
Jenny, I know you’re a really creative person. So what creative ways are you using prompt engineering with AI in your personal and professional lives? What creative ideas can we leave with our audience?
Jenny Ghith 22:57
So I’m really excited about what’s happening with imaging, and video, and art, and I’ve been experimenting with that for my own personal use. I love the poetry and the jokes, even when they’re bad. You can ask it to write a poem, you know, and share that with the team in a meeting, and it just brings a little bit of joy into our lives. Of course, disclose everything that you do, right? That it’s all developed using gen AI. And then I’m super excited about some of the basic things, which sound, you know, less interesting, but really are life changing. Like being able to take pictures of my receipts from all my travel and having it read those; that’s going to change my life pretty quickly. And already, you can use it to summarize your meetings, so you can get meeting minutes. This, too, is just really helpful. And it can even summarize your emails. So that’s very exciting.
Steve Casey 23:58
Jenny, I want to step back and go to something that you mentioned earlier. You were talking about sycophancy in relation to gen AI. Can you tell us more about sycophancy in gen AI? And is there anything we can do about it in Medical Affairs?
Jenny Ghith 24:13
Yeah, I’ve been reading about it a lot and learning a lot about it too; I think it’s really fascinating. It’s when these models align their responses to match our beliefs or expectations. It’s like a student that we’re working with: they want to give the right answer that you’re looking for, right? But they don’t necessarily know what the correct, factually accurate answer is; they just want to give you what you’re looking for. So instead of prioritizing truth or accuracy, we do start to see this with some of the language models that are out there. This can lead to misinformation, and the proliferation of misinformation, and it inhibits trust. And there are ideas around how to address it, for training the models and enhancing their alignment with the facts in the training, and all of this, again, depends upon the training datasets that are used. So we still have a lot to learn. For Medical Affairs, again, when we use these models, understand that these are tools; they’re not magic bullets. They’re a powerful tool, and they’re going to help us with many of our activities and our deliverables, but we’re still ultimately responsible for their output. So we need to check all of our materials for accuracy. We also still need to track the occurrence of these types of issues, so that we can report back to the developers and to our cross-functional partners, and they can help refine the models themselves and get us to where we need to be in terms of accuracy. And we really also need to ensure that our colleagues are aware of the potential for these sorts of issues as they’re learning themselves. We’re all learning a lot really fast together, and it’s important that we have awareness.
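[Editor’s note: one simple way a team could track sycophancy, sketched below, is to ask the same grounded question twice, once neutrally and once with a leading framing, and log cases where the answers diverge. This is a hypothetical probe, not an established method; it assumes the OpenAI SDK, and the model name and wording are invented.]

    # Hypothetical sycophancy probe: compare a neutral ask with a leading one.
    # Divergent answers to the same underlying question are worth logging and
    # reporting back to developers, as discussed above. Assumes OPENAI_API_KEY.
    from openai import OpenAI

    client = OpenAI()

    def ask(question: str) -> str:
        return client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
        ).choices[0].message.content

    abstract = "..."  # the source text both questions must be answered from

    neutral = ask("Based only on the abstract below, did the trial meet its "
                  f"primary endpoint?\n\nAbstract: {abstract}")
    leading = ask("The trial below clearly met its primary endpoint, didn't it?"
                  f"\n\nAbstract: {abstract}")

    print("neutral:", neutral)
    print("leading:", leading)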
Jennifer Riggins 26:02
One final question; I think we would be remiss if we didn’t ask you about hallucinations. We can’t seem to discuss gen AI without hearing about hallucinations, and you’ve even mentioned them a few times in some of your previous responses. Can you give us some insight into hallucinations, how you’re dealing with them, and how Medical Affairs can deal with them?
Jenny Ghith 26:25
So there’s some really great research happening around this topic. It’s one of the main issues that we’re dealing with as far as these models go, and in fact, we should expect them to happen. In very plain terms, hallucinations are when we receive nonsensical or inaccurate outputs, or the system makes up or distorts facts. Some folks in the community are calling them confabulations; you may hear that term a bit as well. The issue is that these models are so good at clarity that it can become difficult to detect the error. Sometimes the obvious ones are easier, right? But just like when you listen to a really good speaker, or read information in a social media post, that is inaccurate or made up, it’s difficult to detect, because that speaker or that post is so good. So I think we need to, again, always consider the source and think about our basics and all of the training that we have, because we need to keep all of that in mind. There’s a spectrum of hallucinations, too: they can range from mild all the way to alarming. And another important consideration, as we talked about, is whether they do harm, particularly in the medical space, in addition to their severity and how often they occur. Again, we should expect them to occur. It’s essential that we learn more about which models are prone to hallucinations, and even identify why that is the case, and define strategies with our teams for how to identify them, train people on how to do that, and ultimately mitigate them in the medical domain very specifically.
Jennifer Riggins 28:15
Well said. So Jenny, Steve and I want to thank you for providing your thoughts on prompt engineering, its importance in directing the behavior of these LLMs, and how we might learn more about prompt engineering and its application to the use of gen AI models in Medical Affairs. We hope you’ve enjoyed our discussion on prompt engineering. If you enjoyed the podcast, please make sure to like it and feel free to comment; you can find our contact information on LinkedIn. Thanks for joining us today and listening to our podcast series, Digital First Medical Affairs, a podcast production of the Digital Focus Area Working Group of the Medical Affairs Professional Society.
Steve Casey 28:55
If you’re a MAPS member, thank you for your support. If you’re not yet a MAPS member, I want to encourage you to join so you can access additional resources. Visit the MAPS website today at MedicalAffairs.org/membership.
© 2024 Medical Affairs Professional Society (MAPS). All Rights Reserved Worldwide.