What Amazon Can Teach Medical Affairs About Insights Mining

Speaker: Mark Michalski

Worldwide Head, Healthcare & Life Sciences, Artificial Intelligence and Machine Learning with Amazon Web Services

Speaker: Pete Piliero

MAPS Board Member, VP of Medical Affairs with Melinta Therapeutics

The pharmaceutical and MedTech industries, and Medical Affairs specifically, know how far we are behind big tech in our ability to mine data for meaning. Here MAPS speaks with Mark Michalski, former radiologist and currently Worldwide Head, Healthcare & Life Sciences, Artificial Intelligence and Machine Learning, Amazon Web Services, about how we can catch up, from predictive analytics in the .com experience to graph databases and other ways to structure data from the cloud.

Following is an automated transcription provided by otter.ai. Please excuse inaccuracies.

Garth Sundem 00:01

Welcome to this episode of the Medical Affairs Professional Society podcast series: “Elevate”. I’m your host Garth Sundem, Communications Director at MAPS. And today we’re speaking with Pete Piliero, VP of Medical Affairs with Melinta Therapeutics and a MAPS Board Member, and Mark Michalski, Worldwide Head, Healthcare and Life Sciences, Artificial Intelligence and Machine Learning with Amazon Web Services, which is a mouthful, Mark. Basically, though, we in Medical Affairs keep hearing how far we are behind big tech in our ability to mine data for meaning. And today, we’re gonna be asking Mark what Amazon knows that we don’t. But first, let’s set the stage. Pete, could you start us out by telling us what we mean by insights? And what are the challenges in finding insights in big data?

Pete Piliero 01:00

Sure, Garth. Thanks. It’s a pleasure to be doing this episode. So insights: there is no strict definition for what insights are. But I would say it’s one of the topics that we in Medical Affairs talk about an awful lot, particularly over the last few years. So I’ll take a stab at what might be a way to summarize what an insight is, and then talk about the different sources, which then pose the challenge that we have. So you know, in Medical Affairs we talk about scientific insights, and this is information or viewpoints that may inform strategic decision making. And that’s a pretty broad base to start from. But we then try to boil it down to viewpoints on emerging healthcare trends, or a scientific statement that somebody makes that’s new or innovative, or even a KOL perspective on a topic where there’s debate or disagreement. But ideally, we’re trying to find amongst all of those these unique perspectives, these pearls or diamonds in the rough, that may then have an impact on our Medical Affairs strategy. And sometimes these scientific insights, which come in to us through a variety of sources that I’ll talk about in a second, may even influence other functions within our ecosystem in the industry, for example, marketing. So here’s the challenge. Those insights can only be found if you have a lot of data points. And where in Medical Affairs do we get our data points? I’d say the one that’s probably most common is MSL engagements with HCPs. This is an important arm of Medical Affairs: small pharma, big pharma, and in-between pharma, we all have MSLs, especially once we’ve commercialized products, and they’re out there meeting with HCPs. They’re learning from them. They’re teaching them. And in the course of those peer-to-peer discussions, there’s a lot that can transpire. So we collect that information in our customer relationship management tools, and it’s a lot of unstructured data. So that’s the biggest source, I would say.
But there are a couple of other sources. One is medical information inquiries. Again, we all have these departments in Medical Affairs. They’re taking in a lot of questions, largely from HCPs but sometimes from consumers, and again, it’s unstructured: just the questions that they’re asking us. And then there are three other ones that are worth mentioning. First, scientific literature. There’s so much literature being published today. If you just think about COVID, in two and a half years, the amount of publications is enormous. And so for each of our companies, in each of the therapeutic areas that we work in, there are a lot of publications. Second is more of a newer concept in medical, and that’s what we call data lakes. This is where we aggregate all of this often unstructured data that exists within our company, sometimes not just in Medical Affairs; it could be in clinical development. But this lake of information is another big place these insights could come from. And the newest place, I would say, that we think about more and more now in medical is social media. Twitter, for example: a lot of HCPs have a social media presence. And so how do we ingest that information? And then how do we try to find the insights? So, lots of sources. I probably left out a few, but the bottom line from all these sources is that we have a lot of information, mostly unstructured, in the form of text. And we want to bring insights back to the organization. But how do we do that? And how can we do that in a more efficient, effective manner? And that’s why I’m really excited to hear what Mark has to say.

Garth Sundem 05:19

Well, Pete, it’s funny. We’re speaking to a Medical Affairs audience, and I’m sure we’re speaking to people in the audience who specialize in insights. But I will bet you that 99.9% of the people listening couldn’t articulate insights as well as that, and needed us to put a peg in that. So thanks for the insights background. Now, Mark, there are about a billion and six things that I want to ask you. But let’s start broadly: how can Amazon save us from the data?

Mark Michalski 05:53

Well, let’s get at that one this way. First of all, thanks very much, Garth, Peter, for having me. Really a pleasure. And I understand the gravity of the challenge. I’m trained as a radiologist, so I realize how complex medical data can be to work with. So, you know, the good news is that you’re not alone. In healthcare and life sciences, we have a tremendous amount of data that we have to parse in our industries. But we see similar problems in places like manufacturing, media and entertainment, consumer packaged goods, and lots of other industries. And even in amazon.com, we see similar problems. I wouldn’t say that they’re directly analogous, but there are some similar challenges. Probably first, I’ll just set the table a little bit and try to explain Amazon. Amazon’s a fairly large company. And amazon.com is how we’re probably best known, where you can go and buy products online and have them delivered. But the Amazon Web Services part of the business is actually more the business-to-business side. And just so that everyone in the audience knows, it’s effectively a set of services, compute, storage, and databases, that powers a lot of what you see on the internet, and a lot of the software that you use today that we actually all rely on. So there are quite a few problems that Amazon has to tackle, both to provide that great customer-facing experience, and also to maintain the web services that we provide every day for our customers. So that maybe sets the table a little bit. Now the question about how we start parsing some of this data. Let’s take one example from .com. If you’re going online, and you’re looking to buy something, what do we all do? Well, we start looking around. So if I want to, I don’t know, buy a pressure cooker, I’m gonna look at the different options on .com.
And then I’m gonna go through the star ratings, and I’m gonna go through some of those customer reviews. And it turns out that we’ve got millions of products online. And not only do we need to understand which products are best for our customers, are most useful, or are driving the most joy for them, we also need to make sure that things like customer reviews don’t include really nasty stuff. And we have to be able to parse that data and make it safe and consumable as well. So there’s actually a really analogous problem there: you have to be able to label some of those reviews as, you know, good, bad, maybe even unsafe. And then you have to extract features from those reviews, things like sentiment: the customer feels really good about this product and uses words that give us that indication, or a string of words that makes us feel like they’re not so happy with that pressure cooker. So we do that on an everyday basis on the site. And that’s just one example. There are literally many thousands of examples that are very analogous, both in .com and with some of the customers that we help.

Garth Sundem 09:47

Well, let me jump in. So it sounds like there were a couple of necessaries there. One is cleaning your data, or, as you said, making it safe. I imagine that would be kind of like applying tags to data generally, right? And one of them would be “unsafe,” so somehow cleaning your data, or labeling it, or organizing it. And the end goal, though: are we talking prediction? Is the end goal predicting what someone will want? Or take us through that again. What are the pieces of the challenge in that .com looking-for-a-pressure-cooker experience? What do you need to do?

Mark Michalski 10:38

Yeah, no, absolutely. I love the way that you’re starting to break it down. I kind of think about it similarly, as sort of a process. First, you have to have the text in a parsable fashion. Okay, cool. Then you need to be able to recognize either single words or, you know, sentences that mean a certain thing. And you need to extract from that a tag, exactly like you said, and natural language processing is the field that does that. So it’s the process of taking some of that textual data and elevating out of it some of what we call metadata, some of these tags that give us an essence of what’s being said. You know, if I’m looking for that pressure cooker, I want to extract: Is this a good pressure cooker? Is it bad? Did it fail? If it failed, how did it fail? What’s the, I don’t know, size of the pressure cooker? What could I put in there? What will it handle? Does it have settings? Can I make yogurt in it? Et cetera. So I think those are the kinds of things that you need to extract out of that text. And, you know, it’s like some of the challenges that your MSLs run into when they’re taking copious notes while working with providers. They need to extract out of that: What kind of provider was this? Did the provider feel good about the interaction, or about the drug? Did they feel bad? What were some of the AEs? You know, how are those AEs related? Those are the kinds of things that I understand your MSLs have to extract. And in NLP it really is a very analogous process.
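
To make the tagging step concrete, here is a minimal sketch in Python of pulling tags out of a free-text note. It is only an illustration under stated assumptions: the tag names, the keyword patterns, and the sample note are all hypothetical, and a production NLP system would rely on trained language models rather than hand-written keyword rules.

```python
import re

# Hypothetical tag vocabulary: each tag maps to keyword patterns that suggest it.
# These patterns are illustrative assumptions, not a validated medical lexicon.
TAG_PATTERNS = {
    "adverse_event": [r"\badverse event", r"\bside effect", r"\brash\b", r"\bnausea\b"],
    "clinical_research": [r"\btrial\b", r"\bstudy\b", r"\bendpoint"],
    "positive_sentiment": [r"\bpleased\b", r"\bimpressed\b", r"\bworks well\b"],
    "negative_sentiment": [r"\bconcerned\b", r"\bdisappointed\b", r"\bnot convinced\b"],
}


def tag_note(text):
    """Return the sorted list of tags whose patterns match the note text."""
    lowered = text.lower()
    return sorted(
        tag
        for tag, patterns in TAG_PATTERNS.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    )


# Made-up example note, for illustration only.
note = ("Dr. A was impressed with the trial data but concerned about "
        "reports of nausea in older patients.")
print(tag_note(note))
# → ['adverse_event', 'clinical_research', 'negative_sentiment', 'positive_sentiment']
```

The output is the kind of metadata discussed here: a short list of tags that can be stored next to the original note and filtered on later.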

Garth Sundem 12:33

Okay, so I was taking some notes here. So you’ve got to get the text in a parsable fashion. Then you’ve got to recognize meaning, which is extracting and applying a tag, if you want to call it that, usually or maybe through NLP, which takes the textual data and defines the essence of what’s being said. So then, you know, what do we do with this? And maybe, Pete, if we could bring you back in: what does Medical Affairs want to do with this? And I’d be interested to hear also what Amazon wants to do with this. Prediction, maybe, but I’m sure there’s a lot more…

Pete Piliero 13:10

No, that’s great, because I was gonna come in, because I think that’s exactly the tie-in here. So, you know, we in medical are looking for what we call actionable insights. So it’s not just the insight, you know, that new learning, if you will, but it’s: okay, now, is this something we can actually do something about? Meaning, should we do a publication? Should we do a study? Should we do a medical education program? So we are ultimately looking not just for the insights, but for an insight you can act upon. Okay. So that was why I wanted to jump in, to point that out, but also then to say some more. Mark, so then can you tie us back to .com and help us understand those analytics that you’re applying, the NLP that you’re applying to the ratings, for example? What is Amazon looking for? Is it insights that they can action, or something else?

Mark Michalski 14:08

Absolutely. So think about it: we’ve now got that metadata, we’ve gone through the process of extracting some of those tags, exactly to your point. Now, what do you do with it? Well, there are lots of things that you could potentially do with it. You can sort of filter along those tags, and say, I want to see all the examples where a certain concept got elevated. But oftentimes, you want to do something maybe even more sophisticated with it. You want to understand not just which reviews are labeled a certain way, but also how those labels relate to one another. So if, say, you’re looking for a pressure cooker at .com, you might also be looking for other stuff. You might be looking for other kitchen devices; you might be looking for a blender. And that implies that somewhere we’ve encoded that there’s this concept of devices that you would use in the kitchen, right? And that concept linking is exactly the same kind of concept linking that you want to see in, say, medical concepts, complex medical concepts. How is Crohn’s disease linked to ulcerative colitis? How are different biologics related? How is Cosentyx similar to Dupixent? You know, these kinds of concepts, you need to store them somewhere. And so increasingly, there are a couple of different technical solutions for that. We have graph databases, for example, which store not only some of that metadata, but also how these concepts are linked. And so that’s a technical instantiation of a concept that we all tend to understand at an implicit level: these different concepts are linked, and that linkage is just as important as the concepts themselves.
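
The idea of storing both concepts and the links between them can be sketched with a small in-memory graph. This is a toy illustration in plain Python, not the API of any graph database product; the `ConceptGraph` class, its method names, and the `is_a` relation label are all assumptions made up for the example.

```python
from collections import defaultdict


class ConceptGraph:
    """Toy concept graph: nodes are concepts, edges carry a relation name."""

    def __init__(self):
        self._edges = defaultdict(dict)

    def link(self, a, b, relation):
        """Store an undirected, labeled link between two concepts."""
        self._edges[a][b] = relation
        self._edges[b][a] = relation

    def related_within(self, start, max_hops):
        """Breadth-first search: map each reachable concept to its hop distance."""
        seen = {start: 0}
        frontier = [start]
        for hop in range(1, max_hops + 1):
            next_frontier = []
            for node in frontier:
                for other in self._edges.get(node, {}):
                    if other not in seen:
                        seen[other] = hop
                        next_frontier.append(other)
            frontier = next_frontier
        del seen[start]
        return seen


graph = ConceptGraph()
graph.link("pressure cooker", "kitchen device", "is_a")
graph.link("blender", "kitchen device", "is_a")
graph.link("Crohn's disease", "inflammatory bowel disease", "is_a")
graph.link("ulcerative colitis", "inflammatory bowel disease", "is_a")

print(graph.related_within("Crohn's disease", 2))
# → {'inflammatory bowel disease': 1, 'ulcerative colitis': 2}
```

Because the links are stored explicitly, a two-hop lookup connects Crohn’s disease to ulcerative colitis through their shared parent concept, the same way a pressure cooker connects to a blender through “kitchen device.”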

Garth Sundem 16:27

Cool. So maybe, Pete, let’s dig into that a little bit. So we’ve got these tags. And again, Mark, I’m sorry if tag isn’t the word that you would use for it, but metadata or whatever. We can filter along that at that point. So let’s just take a look at that one. Pete, I mean, in Medical Affairs, how would we use that?

Pete Piliero 16:49

Yeah, and I do want to continue on that concept point as well. So, you know, at least the way we’re doing it in my company is that we’re asking the MSLs to identify potential insights. So they may have a couple of paragraphs on what they talked to the doctor about, but then I’m asking them, okay, out of that, what do you think are the potential insights? And that’s free text. But then I said, okay, tag those insights based on certain, I guess, concepts (I wasn’t thinking of them as concepts), for example, clinical research or adverse event. And so, again, now we have potential insights linked to tags. So Mark, maybe you can help us with the tech. Now I’ve got that information, and I love the idea of trying to find the concept linkage. So help me understand how the graph database or another tool would help me make that happen, because right now, for a lot of us in Medical Affairs, it’s quite a manual process. You know, you can download all this information, and you can sort and filter it by tag, do a lot of reading, and then try, while you’re reading, to link these concepts back together.

Mark Michalski 18:14

Yeah, well, that’s where my favorite part of this comes in: machine learning. When it comes down to it, you can’t label each one of these elements individually, right? It’s just so much work. But you do have some labels. And when you can start to extrapolate from the labels that you already have, then you can rely more and more on things like machine learning. Say you have a note that’s using similar language to another note, or maybe you take as metadata that the same MSL wrote these two or three notes, or what have you. Now you can use those existing labels to extrapolate, using machine learning, that we’re talking about the same medication, or that we’re talking with doctors of the same clinical specialty. You can start to see how, if you envision this as a network, you know, with a bunch of nodes and lines that connect all these different concepts, once you start filling in the puzzle, you can use machine learning to fill in the rest of the Sudoku, right, the rest of the game. You use machine learning to start filling in the gaps, those lines, those nodes, some of the pieces that aren’t filled, and that removes some of the burden of trying to label every single thing.
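
One simple way to picture “filling in the rest of the Sudoku” is label propagation over a network of notes. The sketch below is a deliberately naive version, assuming notes are already linked (for example by shared author or similar wording) and that unlabeled notes take the majority label of their labeled neighbors. The function name, note IDs, and labels are hypothetical, and real systems would typically use a trained model rather than this voting rule.

```python
from collections import Counter


def propagate_labels(edges, known_labels, max_rounds=10):
    """Fill in missing labels by majority vote among already-labeled neighbors."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = dict(known_labels)
    for _ in range(max_rounds):
        updates = {}
        for node in sorted(neighbors):
            if node in labels:
                continue  # keep labels that are already known
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            if votes:
                # Ties are broken alphabetically so the result is deterministic.
                updates[node] = max(sorted(votes), key=lambda label: votes[label])
        if not updates:
            break  # nothing new could be inferred this round
        labels.update(updates)
    return labels


# Hypothetical links between notes (e.g. same MSL, similar language).
edges = [("note1", "note2"), ("note2", "note3"), ("note4", "note5")]
known = {"note1": "adverse_event", "note4": "clinical_research"}

print(propagate_labels(edges, known))
```

Starting from two hand-labeled notes, the remaining three pick up labels from their neighbors over successive rounds, which is the burden-reducing effect described above: a few labels plus the links do part of the labeling work.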

Pete Piliero 19:50

Yeah, it’s interesting, because you brought in a couple of other elements that I wasn’t even thinking about. So you know, I was thinking about that potential insight, the tag, but then all of a sudden there’s other, I guess again, metadata. Who’s the MSL? What product? Because we capture that. What were they talking about? Where’s the doctor located? Is it an MD? Is it a pharmacist? So you’re right, you see just how complex the information is, and, you know, it really feels like you need the right tools to help you start to fill in the gaps, or fill in the Sudoku.

Mark Michalski 20:29

Especially me, I’m not particularly good at it.

Pete Piliero 20:35

I’m with you there.

Garth Sundem 20:36

Mark, is it true that a Medical Affairs organization starting out with insights would need to do some of this stuff by hand at first? Like, they would have to create a taxonomy of these, you know, concepts, as Pete called them, so clinical research or adverse events, and you would be working with your data collection a little bit by hand at first. And only then would you be able to release machine learning into it to start, you know, looking at what it can learn from an existing data set? Or can machine learning just start new? Could we say, okay, machine learning, you know, go into our data lake and tell us the cool things that we should be driving our strategy with?

Mark Michalski 21:25

Yeah. So I’ll answer that question, and then I’ll follow up with an acknowledgement. The answer is that you can use machine learning in both of those regimes. The first is called supervised learning. The other is called unsupervised learning. Unsupervised learning is mostly focused on unlabeled, unstructured data, and on finding from that data, de novo, without labels or with limited labels, how these different data elements might cluster. So you can use machine learning even when you don’t have a whole lot of labeled data. Now for the acknowledgement. I feel like I’ve thrown a lot of concepts out here, like natural language processing, graph databases, supervised and unsupervised machine learning. And it’s at about this point where I start losing the audience. Eyes start glazing over, and things like that. So I want to make an acknowledgement, which is that this is a lot of technology that sometimes feels out of reach. And part of the reason I’m bringing that up is because you don’t need to be an expert in this space to leverage some of the capabilities. First, you know, the cloud is one of those things that has a way of democratizing this technology. But then I’d also say that, even if you don’t know how to use machine learning today, or graph databases, or natural language processing, if you use the concepts that we’re discussing today, you recognize that the more thoughtfully you’ve labeled your data, the more thoughtfully you’ve structured your data, and the more you incorporate data elements that you think might actually be useful down the line, the more valuable that data becomes over time, no matter what kind of sophisticated technology you throw at it. So that’s something, I think, for your audience to keep in mind.
You know, data is not all the same, and it’s not all valuable in the same way. You know, we’ve all heard “garbage in, garbage out.” And, you know, as sophisticated as we’re getting in data science, that still is going to be true. So I think that’s one thing to keep in mind. I’ll pause there.
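
For the unsupervised case, where there are no labels at all, a rough sketch is to group notes purely by how much wording they share. The example below assumes a simple word-overlap (Jaccard) similarity and a hypothetical threshold of 0.3; the function names and sample notes are made up, and real clustering would usually work on learned text embeddings rather than raw word sets.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words two notes have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_notes(notes, threshold=0.3):
    """Group note indices whose word overlap meets the threshold (single-link)."""
    token_sets = [tokens(note) for note in notes]
    parent = list(range(len(notes)))

    def find(i):
        # Follow parent pointers to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(notes)):
        for j in range(i + 1, len(notes)):
            if jaccard(token_sets[i], token_sets[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    groups = {}
    for i in range(len(notes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up notes, for illustration only.
notes = [
    "patient reported nausea after dosing",
    "nausea reported after second dosing",
    "asked about trial enrollment criteria",
    "trial enrollment criteria questions",
]
print(cluster_notes(notes))
# → [[0, 1], [2, 3]]
```

No one told the code which notes were about tolerability and which were about trial enrollment; the two groups emerge from the text itself. The quality of those groups still depends on the quality of the notes, which is the “garbage in, garbage out” point.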

Pete Piliero 24:17

Yeah, Mark, that’s great. Actually, I think my takeaway from that, a little bit of an aha moment for me, is that it sounds like structuring unstructured data will be helpful. Does that sound right?

Mark Michalski 24:30

That’s right. And the more that you can create processes in your workflow that just make that an easy thing to do, to create structure out of your unstructured data, the better off you are.

Pete Piliero 24:43

Yeah, and I think that’s, again, another good point. As I think about the conversations that I’ve had myself with others in Medical Affairs around insights over the last, say, three to five years, where this idea has really exploded, we often start with no structure; we often don’t know where to start. But I think those of us who’ve gotten better at this, we’ve put process around it. And process may be, again: where do you capture it? How do you capture it? Using tags and things like that. So I think that’s another good takeaway for the audience: you’ve got to get started. So get started. But once you get started, think about how you put a framework around how you capture the information. Because, again, garbage in, garbage out: the better you capture it, the more likely it is that you can apply the tools to it to mine and make sense of the data. I hope that makes sense.

Mark Michalski 25:44

Exactly. Yeah. I think you hit it spot on.

Garth Sundem 25:47

Even if we don’t know exactly which tools we’re going to use. Yeah, that’s right. And that’s waiting for 18 months from now. But you know, you’re creating the richness of that data looking to the future. All right. Let’s leave it there, then, for today. Oh, Pete, you got something? Yeah.

Pete Piliero 26:06

I got one more. Last question for Mark, because I think this is important for the audience. So okay, I’m Dr. So-and-so at company X, and I’ve started collecting insights and potential insights. I’ve collected a lot of unstructured data, let me say it that way. And I’ve got this stuff. How do I get started with trying to apply a tool? How do I get the ball rolling there? Because now I’ve got all this stuff. I’ve got process. But I still don’t know how to mine it other than doing this manual labor.

Mark Michalski 26:40

Yeah. Well, the good news is it’s a lot easier today than it was even a year or two ago to access the tools that you need to make analytical decisions from this data and make this data actionable. And so what I would say is, there are low-code or no-code solutions that you can engage. We have some at AWS, and there are a number of vendors and partners of ours that you can engage. And I’d probably say, even if you don’t feel comfortable with either of those, just dip your toe in. It’s actually not as scary as it may seem. There’s a lot that you can do with that data with some simple tools. And there are a lot of learning resources out there that can help you just start to understand how to make progress in the space. So I encourage everyone who’s listening to this to just do a Google search. Take a look at what’s out there. Maybe watch some YouTube videos, or go on the AWS website and look at some of the tools that we have for low-code, no-code solutions. That’s a great point to start.

Pete Piliero 28:09

That’s really helpful, Mark. Thank you.

Garth Sundem 28:11

All right. Well, let’s leave it there for today. Pete, Mark, this is so useful. MAPS Members, don’t forget to subscribe. And we hope you enjoyed this episode of the Medical Affairs Professional Society podcast series: “Elevate”.
