Deng Pan’s “Greedy” Algorithm: What Is It Like to Move from Biology to Computer Science?

Original link: https://www.msra.cn/zh-cn/news/features/ada-workshop-pan-deng

Editor’s note: The road of scientific research is not lined with flowers; more often it means exploring the unknown on a path with no footprints. What should a research path look like? How do you seize the chance to change direction? In her talk, “The ‘Greedy’ Algorithm of Life,” Deng Pan, a lead researcher at Microsoft Research Asia, shared her experiences and lessons from the years since her undergraduate graduation. Moving from biology into computer science, how did she come to feel “confident and unflustered”? When an opportunity appeared, how did she do everything she could to seize every possibility? Let’s see how Deng Pan wrote the “greedy” algorithm of her life!

Click “Read the original” or watch the video of Deng Pan’s talk at the following address:

https://ift.tt/bMy8U4T


Hello everyone, my name is Deng Pan. I was an undergraduate at the School of Life Sciences, Tsinghua University, where I did research on germline stem cells. My Ph.D. research focused on mitochondria, studying how the cell’s energy center responds to stress from various toxic insults. Now I am a researcher at Microsoft Research Asia.

Sometimes I am amazed myself: I studied biology, so how did I end up at Microsoft? And having come to Microsoft, why am I still doing biology? How did I end up doing something this cool?

That said, I really am doing something I love right now. Today I will share my experiences and what I have learned along the way.

The theme of my talk is “The ‘Greedy’ Algorithm of Life.” Looking back on the ten years since I finished my undergraduate degree, “greedy” thinking accurately describes every choice I made along the way.

Doing experiments and writing code, I want it all

The story begins in 2012, my first year in New York, when I began my Ph.D. studies.

Our graduate school has a “rotation” system: in the first year, each newly admitted doctoral student picks three to five laboratories of interest, spends two to three months in each for a short trial, and then decides which laboratory to join for Ph.D. research.

At that time, computational biology was still a relatively niche direction. Out of curiosity, I chose a lab that uses computational methods to study the behavior of microbial populations.

I did a short exploratory project while rotating in that lab. We found that when a bacterium called Pseudomonas was placed on the medium, it gradually grew into a branching structure. If two colonies were placed at the same time, then when they came close to meeting, they left each other alone and each backed off.

Deng Pan’s exploratory research project during her lab rotation

We were curious: what makes the bacteria form this pattern as they spread across a perfectly flat, open medium? How do colonies that are far apart know of each other’s existence?

Through targeted genetic modifications of the microorganisms, combined with many imaginative experiments, we identified secreted factors that affect colony morphology and communication between colonies, explained the colony behavior with mathematical models, and simulated the spreading of the colony population in MATLAB.

Even now, I still find this research very interesting. At the time, I considered joining this lab to continue in this direction. But although I was interested in both biology and programming, training in biological experiments requires a lab and its facilities, while programming is much more flexible: I could study it on evenings and weekends. So in the end I chose a traditional biology research direction and used my spare time to teach myself programming.

I have to say, I had a lot of energy when I was young. But it was exactly this idea that made it possible for me to stand here today.

Whether it’s useful or not, I learn it all

No sooner said than done. Over the next few years, although I mainly did traditional biology research, I took every computational biology and biostatistics course available in graduate school, as well as many open courses on Coursera, carefully taking notes and completing the homework. MOOCs were just emerging then, and there were many high-quality open courses online.

Some excerpts from Deng Pan’s online learning

In fact, I did not know at the time whether these courses would help my future biological research, or whether they would help me find a job. Most of my studying was driven purely by curiosity.

For example, I heard that a certain algorithms course was very well taught, but it was taught in Java, so I went and learned Java; I heard that C++ was better for developing computational thinking, so I went to USACO and practiced problems in C++ (LeetCode had only just been founded at the time); I heard that big data and cloud computing were all the rage, so I took a course on cloud computing…

Later, these “non-utilitarian” learning experiences helped me a great deal. For example, when I was tested on algorithms in my Microsoft interview, I felt “confident and unflustered.” This experience of studying algorithms also helped me get my first software internship opportunity.

When you have an opportunity, go all out

Over the years, I have made an observation: compared with European and American students, Chinese students are more likely to say one thing: “I can’t.” But is it really that you can’t? Or do you just feel that you are not ready? Are you afraid to try because you are not fully sure of yourself and fear failure?

Because of the nature of the field, biology students generally do not intern at companies during their studies. But in the spring of my fourth year, I got two pieces of news at once: first, the lab would move from New York to Worcester, Massachusetts that summer, so doing experiments over the summer would be nearly impossible; second, a summer internship program called Google Summer of Code was accepting applications.

The Google Summer of Code program launches every February. The organizers first select a batch of open source projects that meet their criteria and then open applications to students, and the open source organizations choose the students they want. Selected students do a three-month, full-time online internship over the summer, contributing code to open source projects, and many go on to become long-term open source contributors afterward.

This program was great news for me: all I wanted to do at the time was write code, and I happened to have plenty of free time that summer. There was only one problem: when I heard the news, the student application deadline was two weeks away.

During those two weeks, I had to choose a project from more than 100 open source projects, complete the coding task it required, and submit my application. At that point, my knowledge of GitHub was limited to one introductory class of less than two hours, in which I created an account and forked a repo that a classmate had just created.

Given my limited skills and my interests at the time, I quickly settled on a project to apply to: a machine learning library written in C++, used primarily for biomedical research. Its program homepage said: women are welcome to apply.

At the time, I thought: I must seize this opportunity.

“I’m a girl, I’m a biomedical major, and I’ve written C++.”

Relying on these advantages that I had talked myself into, I began to chew through the project’s coding task: adding a mean-calculation function to the linear algebra library. It sounds simple, but for me at the time it was like asking a primary school student to solve a calculus problem. The code base had about 500,000 lines of code. I had to find the right place in it, understand the dependencies, follow the implementation of other functions to complete a feature that supported both CPU and GPU backends, write local tests and run them in Docker, and then submit the work via GitHub. Every step had to be learned from scratch.

To tell you the truth, I got through those two weeks in tears. I could not understand the code, could not get the environment set up, got compilation errors, had failing tests, and could not even follow other people’s discussions. Everything felt so hard; I felt that I could not do anything, that everything was a problem, that I had no time…

But in the end, I persevered, completed the feature as best I could, passed all the tests on the pull request, and got the internship offer.

Was I nervous then? To be honest, yes. I had pushed through the application task on sheer determination; could I really handle the internship itself?

But was I going to back down? I did not want to. I posted something on WeChat Moments at the time that I still think makes sense: every opportunity in life is like “a duck being driven onto a perch,” pushing you to do something before you feel ready. If you wait until everything is ready, it may be too late.

Deng Pan’s summer in the fourth year of her Ph.D.

During that summer vacation, I refactored the back end of the linear algebra library using C++11 features, unified the interfaces for data storage and operations, introduced a new serialization module, and added and deleted nearly 40,000 lines of code in total. Afterward, I continued to contribute code and officially became a member of the team. The following year, I served as a mentor in the internship program and took part in the in-person hackathon the team organized in Budapest.

This experience greatly “inflated” my self-confidence. Afterward, I was bold enough to give a lightning talk at the Global C++ Developer Conference, and I applied to be the volunteer lead for the coffee talk events at the World Science Fiction Convention in San Jose. Not long after I joined Microsoft Research Asia, I volunteered to represent my discussion group and report to the whole institute at an institute-wide event… I have found that this kind of experience brings constant positive feedback. Now, when I face tasks that I do find challenging, I don’t say “I can’t” but “I’m not sure, but I can try.”

There is no optimal path in life

Have you noticed a problem? As far as my current career direction, computational biology, is concerned, I actually took some detours.

In the middle of my Ph.D., I was also very troubled. When I ran experiments all night but could not observe the biological phenomenon I was hoping for; when I drove half an hour through a blizzard to the lab on a Sunday, only to see a negative result and go home; when an experiment I had worked on for three months failed for some unknown reason, with no way to set a breakpoint and debug it… I too had regrets: why hadn’t I chosen computational biology in the first year of my Ph.D.? Had I chosen differently, would I have been less exhausted and had a brighter future, for example by joining the wave of people switching to software careers after graduation, or by catching the tailwind of computational biology earlier, and ended up with a more exciting career?

Photos shared by Deng Pan during her talk

But now I don’t think so anymore.

First, given the limits of my vision and ability at the time, I made the most reasonable choice I could. There is no fixed answer in life, and we have no way to plan the optimal path in advance.

Second, I may seem to have done some “wasted work,” but every experience I took seriously became part of my own accumulation and ultimately shaped my unique life.

Finally, and most importantly, in this process of constant experimentation and exploration, I found a field that I really love and am willing to strive for.

Find your true love and go on bravely

Scientific research is actually very painful, and it is all too easy to feel failure and frustration. Nowadays everyone is also caught up in “involution,” constantly trying to outdo one another: this year you publish 5 papers, next year I publish 10, and the peer pressure is unbearable.

At times like this, only when you have found a field you truly love will you be able to resist external pressure, stop chasing hot topics and low-hanging fruit, and instead settle down to think, examine things carefully, and produce results of real value.

And if you really find your true love, go on bravely and don’t give up easily. After all, persistence and commitment are the keys to success.

In my mind, Microsoft Research Asia has always been a temple of scholarship. While preparing for the interview, I told a friend: I don’t think I was even this serious about the college entrance examination. But when Dr. Tie Yan asked me in the interview what directions I was interested in besides computational biology, I probably gave a textbook example of a wrong interview answer. I said: If I hadn’t known that Microsoft Research Asia was doing computational biology, I might not have submitted my resume.

When you know what you really want, you can speak with that kind of conviction.

It has been almost two years since I joined the institute, and I feel I am still in the “honeymoon period” with it. There is great academic freedom here, everyone’s research interests are respected, and there are so many excellent, reliable colleagues with whom to exchange ideas across fields. Although research still makes me sigh and scratch my head from time to time, I feel that I am doing something that makes me happy, and I am always motivated.

Finally, let’s end with a quote from the famous British writer Virginia Woolf: No need to hurry, no need to sparkle. No need to be anybody but oneself.

Scientific research is a process of constant search, and so is life. I hope my sharing today can bring some help and inspiration to young people.

Speaker introduction

Deng Pan is a lead researcher at Microsoft Research Asia. She holds a bachelor’s degree in biology from Tsinghua University and a Ph.D. in cell and molecular biology from Cornell University. She has published papers in international academic journals such as Molecular Cell, Cell Research, Seminars in Cancer Biology, and PNAS. Her current research direction is computational biology, including but not limited to the application of deep learning in immunology, genomics, epigenetics, and microbiology.
