
Top of Mind with Tambellini Group
Harnessing Cloud Computing to Accelerate Scientific Research with Noora Siddiqui, Cloud Engineer, Baylor College of Medicine
Baylor College of Medicine’s Human Genome Sequencing Center is at the forefront of leveraging the cloud for scientific research, and self-taught Cloud Engineer Noora Siddiqui is largely responsible for leading this progress. Noora joins the Top of Mind family this month to share her insights on how cloud migration is revolutionizing precision medicine, including the ability to calculate individuals’ risk of disease by combining genomic information with medical records.
Welcome to the Tambellini Group's December Top of Mind Podcast. I'm your host, Liz Farrell. Today I am very excited to be joined by Noora Siddiqui, who is the first life scientist to join us as a podcast guest. Noora currently serves as a cloud engineer at Baylor University's College of Medicine, and is a self-described life scientist by education, cloud engineer by profession, and educator by passion. Those descriptors are just the tip of the iceberg in describing what she has accomplished. Prior to joining Baylor as a programmer, Noora taught anatomy and physiology at San Bernardino Valley College. Before that, she earned her undergraduate degree in biology and her MS in pharmaceutical sciences, specializing in the use of high-performance computing to explore complex challenges in drug discovery. During this time, she also received a prestigious fellowship from the Department of Energy. She is a self-taught cloud enthusiast and an active speaker committed to education and enablement. At Baylor, she pioneered the design, development, and implementation of one of the first clinical genomics pipelines running at scale in the cloud. So today she's going to share some insights with us about her work and more. Welcome, Noora.
Speaker 2:Hi, Liz. Thank you for having me.
Speaker 1:Well, as I mentioned, we couldn't be happier to have you with us today. You definitely have a unique scientific background, which isn't something we commonly have on our podcast, but we will definitely be focusing on technology. So let's get right to it. We mentioned your background in science and you had come to Baylor College University, I'm sorry, Baylor University College of Medicine as a programmer. So, can you tell us about what the Center for Human Genome Sequencing does and your role there when you started back in 2019?
Speaker 2:Yeah, absolutely. And first off, I think I should make the distinction that it's actually Baylor College of Medicine, which is separate from Baylor University, although they share the same name. I currently work at the Human Genome Sequencing Center (HGSC), and it would be remiss of me not to mention that we're located in the heart of the Texas Medical Center, which is the largest medical center in the world. Because of that, we've really been at the forefront of the field of genomics ever since our inception in the late nineties, when we worked on the Human Genome Project.
Speaker 1:Okay.
Speaker 2:And since then, we've continued to innovate at the forefront of genetics, with everything from developing a custom test to determine someone's risk of developing heart disease to combining electronic medical records with genomic information.
Speaker 1:That's really cool. Can you describe for us what precision medicine is? And I know you've worked on this NIH All of Us Project, so can you tell us about both of those?
Speaker 2:Absolutely. I really like to say that healthcare is not one size fits all. I think that statement immediately resonates with a lot of people, because you know that the same medicine that works well in one person might not work as well in another, and two people living in the same environment don't have the same risk of developing a certain disease. Based on that, a lot of our work at the HGSC Clinical Lab is built around the idea of individualized diagnostics and individualized treatments. So it's no surprise that we were awarded a grant by the National Institutes of Health to contribute to the project known as All of Us. This is a really ambitious project. I think it's one of the largest sequencing projects in the world, and it aims to collect health data from 1 million people in the United States and combine information about their DNA with information on their environment and lifestyle. As a sequencing center, we're focused on the biology aspect of that and provide the whole genome sequencing for the project, which, if you're talking about a million people, is really petabytes and petabytes of patient genomic information.
Speaker 1:Wow, petabytes, that is not a word I hear often. When the rubber meets the road, what is this project doing for someone? Like you mentioned, you know, everyone's their own unique snowflake, right? They've all got different ways they react to treatments or factors in their environment. How are you taking all this information to help patients? Or is it more at the 40,000 foot view?
Speaker 2:I think the major aim of the National Institutes of Health in putting forth a project like this, and having so many centers across the United States contribute to it, really is to create a very rich research repository that scientists and people from all different fields can study in order to come up with new treatments for people from diverse backgrounds. All of Us is also unique in that it acknowledges that a lot of research has been done on people of European ancestry. So they aim to collect data from people of different ancestries, as well as different lifestyles, socioeconomic strata, and environments. I think that's a really unique aspect of this project in particular.
Speaker 1:Definitely. We hear so much about how studies on the effects of different medications have been performed primarily on white males, so even with women, there's sometimes a big dearth of understanding about how things affect them. It sounds like, at the end of the day, the ultimate goal here is connecting these two puzzle pieces: here are the treatments we know for someone who has been diagnosed with a specific type of cancer, and here's what we know about their genetic background. So you make that treatment more tailored to the specific combination of those two.
Speaker 2:Yeah. And it all begins with, you know, the data and having that data from so many people and being able to put together puzzle pieces that will benefit more than just the million who contributed to the project.
Speaker 1:Wow. And it sounds like a million people are ultimately going to be involved as data points. That's really cool. You mentioned a lot of data, which naturally lends itself to the cloud, and you worked to develop this cloud-based infrastructure for research. What were the problems you were initially intending to solve by doing that? Actually, let me back up a little bit. How did this even come to be a cloud-based project?
Speaker 2:Yeah, that is a very loaded question. I guess it's important to understand that when you're dealing with petabytes of patient or participant genomic information, there are many challenges. One of them is the challenge of time: you want to be able to return information to participants, clinicians, and geneticists right away. We don't work on just All of Us. We work on a variety of different cancer projects and fetal genome sequencing projects, and you want to be able to return very important diagnostic information that can guide treatment to clinicians and doctors immediately. On top of that, again, we're talking about multiple projects, not just All of Us, so you have the problem of storage: where are we going to securely store all of this genomic information and all of the analysis? And then there's the challenge of scalability, which is how am I going to scale this, and how am I going to apply the compute power and resources we have available to tackle this problem? So, in the face of the challenges of time, storage, and scalability, we decided to explore some of the offerings in Amazon Web Services, because with a specific technology that we were doing a proof of concept on in the AWS cloud, we were able to reduce the time for our genomic analysis quite significantly, from about 80 hours to two hours. And it wasn't just that reduction in time. When I was doing that proof of concept in the cloud, I realized, hey, I can spin up a server in a matter of seconds in the cloud, versus having to accurately predict how many servers to rack and stack in an on-premises data center and then maybe not use them right away.
So I realized that migrating this project to the cloud would address the problem of time, the problem of storage, and the problem of scalability. That's kind of where it started. It was definitely precipitated by All of Us, because it was such a huge project; it actually represented a quadrupling of our center's capacity at the drop of a hat. Because we knew the scale of All of Us was just momentous, we were like, okay, we have to come up with something new to address these challenges.
Speaker 1:It's fascinating that you say proof of concept, because, as you're describing, the security and sensitivity of data is so important in any sort of medicine or healthcare. It might seem like a no-brainer to have this all in the cloud, but it sounds to me like, when you were at this juncture of trying to solve this problem, you didn't have a lot of prototypes to use as examples. Can you describe for us a little bit, relative to the whole higher ed research universe, how common this is? Why doesn't it happen more?
Speaker 2:Yeah, so we're talking about 2019, the initial stages of exploring cloud for the All of Us project in particular. And I don't think the vast majority of higher education institutions were doing their clinical or research projects of this scale in the cloud. I think there is this idea that I like to term mental inertia: we've always done things this way, so we're going to continue to do things this way because it's easy. Unfortunately, I think that's the thinking pattern a lot of organizations fall into. But I'm very lucky to say that at the Human Genome Sequencing Center, we have a team that's very, very flexible, very open to innovation in all forms, and quick to adapt to change. So when I say we're at the forefront of genomics, I very proudly say that it's not just about the technology or the things we do; it's also about the people and their mindsets.
Speaker 1:Oh, that's fascinating to hear. You think of all scientists as being innovative in their field, obviously, but there can be a lot of hesitancy in starting on something so big and new. You mentioned being self-taught in this. Can you walk us through it? The problem had been recognized, and it seems like a lot of people at the center were on board with doing this, but it hadn't really been done before. Where do you begin with something like that, and what compelled you to take this on?
Speaker 2:Yeah, so in early 2019 I walked in as a junior, entry-level programmer. I was working on that proof of concept of Illumina's DRAGEN technology, which is a platform for analyzing sequencing data, and I just love solving puzzles. I remember the exact moment: for the proof of concept, one of the supervisors had given me a password for an AWS account that was specifically for this proof of concept, and as soon as they walked away, I quickly Googled, what is AWS? I didn't have even a little knowledge about it. But it's just my personality: I love solving puzzles, and you pair that with leadership that was really willing to let me explore things on my own, get hands-to-keyboard experience, and take things and run with them. So I took the initiative and started doing these scale-up tests in the cloud and presenting small bites of that data to leadership. It was very collaborative, and I think it naturally pushed us toward our cloud migration for clinical projects.
Speaker 1:That's really interesting to hear how you just dove in there, and the parallel with puzzle solving, too. This definitely was a big puzzle to solve. I know that when you've talked about this in other presentations, you've mentioned some initial skepticism: everyone was open to experimenting, but initially doubtful about what this shift could accomplish. Could you describe a bit more about that?
Speaker 2:Yeah, so I think the two main concerns, rightly so, in the clinical realm for migrating a project to the cloud, at our organization and at other organizations, are security and cost. Through initial hands-to-keyboard experience, I was able to show that there are a lot of ways to secure your information in the cloud, maybe more than you would even think to use on-premises, just via the managed services, visibility dashboards, and automated systems that are so easy to set up in the cloud in a matter of seconds, versus on-premises, where you have to organize and set up all of that on your own. In terms of cost, I think another paradigm shift was the ability to tag resources. I talk about this a lot, but if I were to recommend anything to people early in their cloud migration journey, it would be to develop a really good tagging strategy. Before we even grew our assets in the cloud, I sat down and developed a strategy for how we would tag jobs: by project, by person, by department, by all sorts of things. Then in the cost center you can see all of those breakdowns by tag. That level of granularity, being able to say we ran X job or X software for Y project and it was run by Z user, is something we didn't even have prior to migration. So I think it's about looking at the concerns you have prior to cloud migration and really stopping and strategizing ahead of time, before you grow those assets in the cloud, because there is a huge array of resources to address those concerns.
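To make the tagging idea concrete, here is a minimal Python sketch; it is not the HGSC's actual setup. It assumes cost line items have already been exported with their tags attached (on AWS, user-defined tags generally have to be activated as cost allocation tags before they appear in billing data). The tag keys `project`, `owner`, and `department`, the record shape, and the function names are all hypothetical, chosen to mirror the "by project, by person, by department" strategy Noora describes. The sketch sums spend per tag value and flags resources missing a required tag, so untagged spend stays visible instead of disappearing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Hypothetical tag keys mirroring "by project, by person, by department".
REQUIRED_TAGS = ("project", "owner", "department")


def missing_tags(tags: Mapping[str, str]) -> List[str]:
    """Return the required tag keys that are absent or empty on a resource."""
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


def cost_by_tag(records: Iterable[Mapping], tag_key: str) -> Dict[str, float]:
    """Sum the cost of each line item under the value of one tag key.

    Each record is assumed to look like {"cost": float, "tags": {key: value}}.
    Line items without the tag are grouped under "untagged" so that spend
    nobody has claimed stays visible in the breakdown.
    """
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        value = record.get("tags", {}).get(tag_key) or "untagged"
        totals[value] += record["cost"]
    return dict(totals)
```

Calling `cost_by_tag(records, "project")` gives a per-project total, and calling it again with `"owner"` gives the per-person view from the same records; running `missing_tags` before resources are launched is one way to enforce a strategy like this up front rather than cleaning up afterward.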
Speaker 1:Yeah, the cost aspect you mentioned: one thing we always hear in pitches for any sort of cloud-based system or technology is that the startup costs are high, but there are efficiency gains in the long run that make the case for the initial investment. It sounds like cost was a definite point of hesitancy, but you were able to document very clearly, more so than had ever been done before, how there would be that ROI.
Speaker 2:Yeah, and I'd like to add one more thing. Large sequencing projects and clinical projects are very dynamic, so maybe we predict that they'll start on a certain date, but the actual samples come into our lab on another date. With All of Us, the initial samples actually arrived a lot later than we had anticipated. Had we not had this flexible cloud pipeline set up, we would have erroneously invested in a whole lot of software and a whole lot of servers that would've been sitting there collecting dust for a good year. So even in terms of startup cost, having the flexibility of the cloud, the ability to start servers on demand and have zero servers running when we don't need them, was immense.
Speaker 1:Yeah, that's an excellent point. I hadn't thought about that, because you have that sunk cost in physical servers that you can't use elsewhere if you've built them for a certain project or initiative. On that note, you started in 2019. Once the samples arrived, how soon did you see initial results? What was the first aha moment, or proof you could point to, that showed it really does work?
Speaker 2:Okay, so we actually had a working pipeline up within three months. That was just through development effort. But like I said, and like you pointed out, the initial samples didn't come in until a little later, so it was not until All of Us samples started coming in that we saw the true nature of the scale. One graph I really like to show people is that when samples first started coming in for All of Us, we were receiving a thousand samples a month. A few months into it, maybe six months or so, don't quote me on that, the number of samples we were receiving per month quadrupled to 4,000. And we did absolutely nothing at our center to the cloud pipeline and cloud infrastructure to handle that change. Our solution was completely dynamically scalable, the storage solution was completely dynamically scalable, and the turnaround time for samples kept pace with that quadrupling.
Speaker 1:So there were a lot of bottlenecks eliminated, it sounds like.
Speaker 2:Yes.
Speaker 1:So one of the things that you had mentioned was going from that initial 80 hours of time spent on analysis to two hours on analysis. Can you talk with us a little bit about some of the metrics of success that you have now a few years later in the project?
Speaker 2:Absolutely. I think our greatest metric of success is that we reached a really big milestone at our center a few weeks ago: we have processed a hundred thousand whole genome samples for All of Us. And this is on top of the existing analysis requirements of our center, all of the other projects we work on alongside All of Us. So being able to point to those hundred thousand whole genome samples is just an immense, amazing milestone.
Speaker 1:It really is. And it sounds like having those early successes has allowed you all to expand the use of the cloud for research at the Genome Center.
Speaker 2:And I think there's one more unique story that happened at our center for All of Us. There was a need to reanalyze approximately 30,000 whole genome samples, and we were able to do that in about 13 days, I think it was. That's a lot. If you don't have a concept of it, a single whole genome sequencing sample can produce hundreds of files and over 150 gigabytes of transformed data. So we reanalyzed 30,000 samples within our cloud infrastructure on top of, again, all of the other projects we're working on and all of the other All of Us production requirements, and to do that in just a matter of two weeks is absolutely insane.
Speaker 1:How long might that have taken without the cloud?
Speaker 2:I think it would've taken quite a bit of time just to set up the pipeline for the reprocessing separately from the production of All of Us and all of our other projects. Instead, we have this great CloudFormation deployment, an automated deployment in the cloud, and we were able to just relaunch a whole other environment specifically for this reprocessing effort, complete it, and then completely remove all of those assets in the cloud. So it happened in a completely different environment that we were able to spawn and then delete. And I think that itself is incredible.
Speaker 1:That is pretty incredible. What you've done here is clearly amazing, especially being self-taught in so much of this work and being a pioneer in it. We know there's a shortage of talented, skilled people with the technical understanding and savvy, and even the science background, in higher ed and beyond, and further compounding that is a lack of people from diverse backgrounds. You have talked about mentorship being a big part of what you do, and about being an educator by passion. I know you've also been involved with Muslim Women in Tech. I'm wondering if there were any people who were a beacon to you. Are there any individuals or opportunities that helped you progress in this field?
Speaker 2:Yeah, so, surprisingly, I would say that my experience as a hijabi Muslim woman navigating the educational and professional space has been very unique, in that I've always been the only person who looks like me or has my religious identity. At one time I was even one of the only women in a very large department. One of the reasons I love spending my time mentoring others, and one of the reasons education is such a huge part of what I do outside of work, is that I want to be the mentor I didn't have. It's kind of shocking, but I remember that when I was in graduate school, the person who was supposed to be my mentor was also the person who said, oh, don't apply for this fellowship, you're never going to get it. Or, you're the smartest person I know, but I don't think you'll be able to hold down a job for two years. I'm not even joking. This is the stuff that I think a lot of women of color and Muslim women experience in their educational and professional journeys, and I think people are only beginning to scratch the surface in talking about it. So it's one of the major reasons I want to reach out to people who are struggling, regardless of their identity, be a source of encouragement for them, and show them how much they can really accomplish: not just their goals, but beyond them.
Speaker 1:It sounds like, in addition to the mentoring, there's a bit of education you have to do for other people in terms of how to navigate these worlds and not hold preconceived notions about what, for instance, a Muslim woman will do in a career.
Speaker 2:Yeah, absolutely. You mentioned self-taught cloud engineer, and coming from a life science background, with a master's in pharmaceutical sciences, jumping into a programmer role was very hard. I had never taken formal computer science courses in undergrad or graduate school, and all of the coding I had done was self-taught. So it took a lot of vulnerability to step into that environment, but I think taking that first step is the hardest part. I remember there was someone in our office, also a woman of color, who had the same start date as mine. I remember standing in an elevator with her; she was having some trouble, and I turned to her and said, you know, you don't have to be great to start, but you have to start to be great. And I think that's the mentality I take with me wherever I go.
Speaker 1:That is a great motto to have. I love that; I'm gonna remember that one. Let's talk a bit about those mentorship activities. Taking that step to be an example for others is not something everyone feels comfortable doing, and higher education is very fortunate to have someone like you who takes that sort of initiative and isn't scared to jump into the fray, not only learning new skills but doing so in a male-dominated environment. Other people who may want to mentor will wonder: what have you found to be effective in mentorship activities? What are some of the things you do that resonate the best?
Speaker 2:I like to meet my mentees where they are, meaning that rather than start the conversation with here's what I can offer you, I really like to spend time asking them: where are you right now? What are your goals? What do you want out of mentorship? What does your ideal mentor look like, and how can I be that for you? Because, like I said, I didn't have the mentors in the professional space that I wish I had, and maybe I could have done a better job telling people what I needed from them. Now that I have the maturity, I can say that, but it's hard to do. So I want to ask people what their struggles are and what they need at their place in life and based on their identity.
Speaker 1:Great advice. As you say, you've gotta meet people where they're at. Are there specific activities that you're involved with? Do you take on specific students, or can you tell us about some of your work with Muslim Women in Tech?
Speaker 2:Yeah, so with Muslim Women in Tech, I have an assigned mentee. She's awesome. She's also a rare unicorn, I like to say, at the intersection of healthcare and technology, so I was like, wow, perfect, we have very similar interests, and I talk to her on a semi-frequent basis. Sometimes she asks me about applying to graduate programs. I used to serve as a graduate admissions representative for my department in pharmaceutical sciences, for PhD and master's students, and I also served on an admissions evaluation committee for undergraduates. So I like helping people with their applications to college, for example, which is especially important for people who haven't had family members go to a higher education institution before. I also like to leave my inbox open to informal mentorship opportunities. People will email me or message me on LinkedIn, and I try my best to make time for every single one of them. There was a girl whose mom was in the audience at AWS Imagine, where I was a speaker, and she reached out to me and said, oh my gosh, I'm a Muslim woman aspiring toward a role in tech and healthcare, and you are doing what I want to do in the future. Can you tell me how you did it, and how I can get there? So that's a lot of what I'm doing right now. In the past I had more formal teaching appointments, for example as an adjunct professor at a community college, and in graduate school I worked for a really cool program, the Brain Explorer Academy, where we brought in students and had them work with elementary and middle school students, talk to professors, and learn about brains. So it's really anything where I get to teach people about science, do some outreach, and help people grow in the way they want to grow.
Speaker 1:I really like that. It sounds like you're open to so many different opportunities and avenues for being a resource for other people who are looking at either the life sciences or cloud computing, where we know we need so many people these days.
Speaker 2:Yeah, and I think it all stems from the fact that I actually wanted to be an educator, but I was like, no, I really love solving puzzles, and I found a way to combine my interest in solving puzzles with being able to informally mentor and educate people on the side.
Speaker 1:Well, that is a fascinating combination to have. Some people say that every day we're a teacher and every day we're a student, and you're definitely learning something new every day while obviously teaching a lot of people as well, in both informal and formal capacities. So I'd like to close out with any advice you might have for other researchers, or for anyone who's looking at going into research in the sciences and wanting to do cloud things.
Speaker 2:Cool. I can make it broader. A lot of people ask me, how did you teach yourself cloud? Or even before that, how did you teach yourself to code? I used to work with plants, so how did you go from working with plants to getting into pharm-sci for humans? And I think a lot of people, before they decide what they want to do, haven't really paused and taken stock of what they actually do on a day-to-day basis. My secret key to success is that before I even plan to learn X, whether that's learning the cloud, developing a technology, or teaching myself coding, I'm hyper-aware of my day-to-day schedule and what I am already doing. So I have really mapped-out habits. I know how much time I'm spending a day on learning something. I know how much time I'm spending a day on my phone. I know how much time I'm spending a day on work, and not just being at work, but really working efficiently and with focus. I used to do that very frequently in graduate school and even when I first started at the HGSC. Keeping track of what I am doing helped me map out what I want to be doing, take advantage of my time, and learn and do things that people didn't think were possible.
Speaker 1:That's really cool. Noora, thank you so much for taking the time today to talk with us.
Speaker 2:Thank you for having me. It was so exciting.
Speaker 1:Don't forget to check out our other podcasts, blogs, and articles at thetambellinigroup.com.