April 6, 2018 2:47 pm
Last year we claimed that medical decision making in healthcare is being transformed by big data and that you, as a data scientist, can have a big impact on how doctors treat patients. At the Beyond Banking days 2017, four teams of data scientists proved us right! A big shout-out to the teams of Quantillion, Deloitte, our DIA Lab and risk model validation department. Curious what happened? Read this (Dutch) article in Het Financieele Dagblad. Curious what's next? Read this post and submit your team!
The enormous amounts of data that researchers and doctors gather these days mean two things. First, they could be an answer to the problem that treatments are always developed for an average patient, while nobody is an average patient. Or worse, treatments are developed for the average patient in the trial. Deborah Schrag, a medical oncologist at the Dana-Farber Cancer Institute, said: "The average age of patients in a colorectal cancer trial is 55, but the average age of my patients in the clinic is 71. The clinical trial results aren't really relevant to my decision making for my patients" (at the Harvard / Personalized Medicine Coalition conference, November 2017). The answer to this problem is to further personalize treatments, and the large amounts of data that are entering health care enable just that.
Secondly, it means that doctors and researchers, while highly educated and trained in statistical analysis, really need the skills of talented data scientists. If you have 200 patients and 56,000 data points per patient, a linear regression or MANOVA is not going to tell you what to look for. Random forest might. And where your average data scientist won't be impressed by a data set of 200 x 56,000 data points and analytics like random forest, your average researcher or doctor was never trained in random forest (or anything like it) and considers 56,000 data points a huge number of variables. So what will happen when you join forces? Medical researchers focus on their strength, biology, and you can go all out with your data science on a huge set of medical data. This is why at the Beyond Banking days we invite you to work with some of the great minds in our medical system and accelerate their research.
So what will happen at the hackathon on 8, 9 & 10 June? In short, we will give you datasets about skin cancer patients' cells and hope you can tell us what to look for at the multi-omic level. Don't worry, you'll know what the multi-omic level is by the time you finish this blog. Basically, we hope your data analysis can tell us more about why one patient survives a lot longer than another. Triggered? Are you good with huge amounts of data? Read this post and submit your team!
Enabling personalized healthcare
Later on, we will tell you more about the specifics of the data and the research questions. To give you an idea of how valuable genomic data can be for a cancer patient, let me first take you back to the biology classes of your final days in high school and to some things you would have learned had you chosen to study medicine or biology. One of the things we learned from last year's hackathon is the importance of mutual understanding between data scientists and medical professionals. So brace yourself for the biology of cancer and the central dogma in molecular biology.
Biology of cancer
Cancer is uncontrolled cell division. Cells divide if their environment needs them to, and this is what, for example, determines the shape of your body and enables you to grow, to heal and to respond to infections; biologically speaking, this is the essence of life. Two concepts and a dogma are relevant to understanding the value of the data you will receive: how cancer and DNA intertwine, what metastasis is, and the central dogma in molecular biology.
- DNA & Cancer. Each cell in your body contains the same DNA, comprising roughly 23,000 genes. When a cell divides, your DNA gets copied into the new cell. If the DNA in a particular cell is damaged (due to sunlight, smoking or just bad luck), the cell starts using its 23,000 genes in a slightly different way. This is not a problem if your body can dispose of these damaged cells, if the damage gets fixed by a process called DNA repair, or if the damage has no impact on the behavior of the cell. However, if these processes get out of balance and cells keep on dividing uncontrollably, we call this cancer. If it happens to certain cells in the skin (called melanocytes), we call this melanoma.
- Primary tumor and metastasis. The DNA in a cell needs a certain amount of damage before the cell turns into a tumor cell. When this happens for the first time in a patient, we call this the primary tumor. Thanks to the work of many great scientists, there is often a treatment for a primary tumor. The problem, however, is that sometimes a tumor reappears in the body, but with more damage in the DNA. This damage will affect not only the process of cell division, but also other processes such as migration. Migration is the ability of cells to move from one part of the body to another. This is a very convenient skill when, for example, an immune cell needs to reach an infection somewhere in the body. However, when a cancerous cell migrates to a different part of the body, we call this a metastasis.
- Central dogma in molecular biology. The central dogma is the basis for all medical scientists. It describes the flow of genetic information in every cell: from DNA via RNA to protein. Molecular data can be obtained by measuring these different levels. These are called omics. At the top of the dogma is genomics. This omic measures the changes in the DNA. You can measure the sequence of nucleotides (the building blocks of DNA) and look for mutations (changes) by comparing this to a reference sequence (e.g. a cancerous cell vs. a normal cell). The DNA of cancer cells, as explained above, has lots of mutations that only occur in the tumor and not in the rest of the body. These mutations are called somatic mutations.
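To make the idea of "comparing to a reference sequence" concrete, here is a minimal sketch in Python. It is only an illustration: the function name `find_point_mutations` and both DNA strings are made up for this post, and it assumes the two sequences are already aligned and equally long. Real mutation calling also has to deal with insertions, deletions and sequencing errors.

```python
def find_point_mutations(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences are assumed to be aligned
    and of equal length (a simplification for illustration).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != sample_base
    ]


# Made-up example: the same stretch of DNA from a normal and a tumor cell.
normal_dna = "ATGGTCCATTGA"
tumor_dna = "ATGGTCGATTGA"

print(find_point_mutations(normal_dna, tumor_dna))  # [(7, 'C', 'G')]
```

A mismatch that shows up in the tumor cell but not in a normal cell of the same patient is a candidate somatic mutation.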
The next thing you can measure in genomics is copy number. Simply put, this describes the amount of DNA per gene. Normally, you have two copies of a gene, one inherited from your mother and one from your father, but cancer cells are strange: for some genes they have more than two copies and for others they have fewer than two.
The second level of the dogma is epigenomics. This omic describes the changes on the DNA. By adding certain molecules to the genome, the cell can regulate which genes are switched on and off. In general, if these molecules are present on a gene, the gene is switched off; if they are absent, the gene is switched on.
The next level of the dogma is transcriptomics. We say farewell to the DNA and have now arrived at the RNA. There are different types of RNA and you can measure them all. mRNA is the best-known type, because by measuring it you measure the activity or inactivity of the individual genes that are encoded in our DNA. These measurements are what we call an expression profile: an overview of the activity levels of all 23,000 genes that are encoded in our DNA. In other words, we can measure how DNA expresses itself at the cellular level. This distinguishes, for example, a particular brain cell from a heart muscle cell, but it also distinguishes normal cells from their cancerous counterparts. In fact, nowadays you can even measure the expression of exons, the building blocks of genes. These exons get pasted together to form an RNA molecule. By skipping one or more exons, the function of the RNA molecule and the protein can change. Another type of RNA is miRNA. These are very small molecules that play a role in regulation processes.
The last level of the dogma is the proteomics. We have now arrived at the protein level and the quantity of these can also be measured.
In conclusion, our DNA (and RNA) is unique and personal. The positions of the DNA damage that causes uncontrolled cell division (the cancer) are also unique and personal for each patient, and we can measure that on different levels. This means the way patients should be treated might be different and personal for each patient, which is why making sense of all this medical data is so relevant. In other words, this is why we strive for personalized medicine and why we need your help.
Still with us? No stress, before and during the hackathon we organize sessions to help you understand this further. And medical professionals feel just as challenged to understand your data science, so you will be doing the explaining during the hackathon. Let's leave the biology for a second and go back to the challenge at hand. We picked one type of cancer for this hackathon: melanoma, a type of skin cancer. If discovered at an early stage, a melanoma is not that much of a problem. In fact, since it is on your skin, the procedure to remove it is hardly an operation. However, if there is a metastasis, the problem is much bigger and survival rates of patients drop drastically.
Why did we choose melanoma?
Melanoma is a type of skin cancer that develops from the pigment-containing cells (melanocytes) of the skin. Melanoma is one of the most dangerous types of skin cancer: globally, in 2015 there were 3.1 million people with active disease, which resulted in 59,800 deaths. There is a strong relation with sun exposure, especially sunburns in youth. As said, at an early stage the disease is treated well with a small surgical operation. However, a considerable subset of these melanomas are able to spread to the rest of the body, and then they are very lethal. Although we call all of these tumors melanomas, each tumor is unique and has its own characteristics. The most aggressive melanomas mostly develop in young people, and unfortunately children can be affected too. For a long time there was no treatment that could cure these patients. Fortunately, novel therapies (targeted therapy and immunotherapy) have been developed and at least a subset of these patients can be cured. These treatments, however, can have severe side effects, and ideally you would give these (expensive) treatments only to patients who will benefit from them. At this moment we are unable to predict which patients should receive this treatment and can possibly be cured. Hopefully, you can help us move forward!
Time to talk about the data that you are given at the hackathon. As explained earlier, the molecular data can be obtained from different levels, the so-called omics. Most of the time, molecular medical researchers measure one of these omics to study a disease. Rarely are multiple platforms measured in one patient. However, we've got data from multiple platforms. In 481 melanoma patients, genomic, epigenomic, transcriptomic and proteomic data have been acquired.
Genomics. We follow the central dogma of molecular biology downwards. At the top of the dogma we have the genomics (remember?). Genomic data consists of DNA mutations and Copy Number. The dataset with DNA mutations has more than 400,000 measurements in total that represent somatic mutations for 481 patients. Copy Number data has more than 23,000 measurements as these are measured per gene.
Epigenomics. Epigenomic data consist of DNA methylation data. For more than 450,000 positions, the data show for each patient whether that spot is methylated.
Transcriptomics. Transcriptomic data consist of miRNA and mRNA. More than 2,000 miRNA molecules have been measured per patient. Another form of RNA – mRNA – has also been measured. For all ~23,000 genes, we’ve got the expression values.
Proteomics. The final level of the central dogma is the proteomics. For a few proteins (< 300 per patient) there is data on the expression. Current theories estimate that there are roughly 100,000 different proteins in the human proteome. So 300 is not so impressive, but these 300 proteins can nevertheless contain a lot of information!
Phenotype. The last dataset contains phenotypic, clinical, prognostic patient information. For all patients, we know the gender, age, whether the melanoma has metastasized, where the melanoma approximately was, the stage of the cancer, the vital status and time to event and even more details.
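To give a feeling for what you can do with vital status and time to event, here is a minimal sketch of a Kaplan-Meier survival estimate, a common way to summarize this kind of data. This is our illustration only, not a prescribed method for the hackathon: the function name `kaplan_meier`, the encoding (1 = died, 0 = still alive at last contact) and the five example patients are all made up, and the real dataset may be encoded differently.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] for every time with a death.

    times  : follow-up time per patient (for example in months)
    events : 1 if the patient died at that time, 0 if censored
             (still alive when last seen)
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths and censored patients together
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Fraction of the patients still at risk that survives time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Made-up follow-up data for five hypothetical patients.
months = [5, 8, 8, 12, 20]
died = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(months, died):
    print(f"month {t}: estimated survival {s:.2f}")
```

Computing one such curve per patient subgroup is a simple way to check whether a stratification separates patients who survive longer from those who do not.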
Now that you hopefully have an impression of the data you will be given at the hackathon, let’s take a look at where your experience and expertise can make the difference. As explained earlier, molecular researchers usually focus on one of the omics to study a disease. You will receive data from four different omics in eight datasets. We know that these omics are connected, both through the central dogma and because they were measured in the same patients. So we challenge you:
- Can you show and visualize the correlations and concepts between the different datasets?
- As melanoma is a set of diverse diseases, can you stratify the patients into subgroups based on all the data?
- Can you integrate all the data to make more accurate predictions for each patient than you would by only looking at one data source?
- Can you select a list of the most informative variables that drive the predictions?
- Can you select a list of the most informative variables that are distinctive for each patient subgroup?
- Can you identify a signature based on an integrative approach that can predict response to immunotherapy?
- Can you identify a signature that correlates with the prognosis of immunotherapy?
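As a starting point for the first challenge, the sketch below lines up two datasets on the patients they share and computes a Pearson correlation between copy number and mRNA expression for a single gene. Everything in it is an assumption made for illustration: the patient IDs, the values, the dictionary layout and the function names are made up, and Pearson correlation is only one of many ways to relate two omics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cross_omic_correlation(table_a, table_b):
    """Correlate two {patient_id: value} tables on their shared patients."""
    shared = sorted(set(table_a) & set(table_b))
    r = pearson([table_a[p] for p in shared], [table_b[p] for p in shared])
    return r, shared


# Made-up values for one gene in two hypothetical datasets.
copy_number = {"P1": 2, "P2": 3, "P3": 1, "P4": 4}
mrna_expression = {"P1": 10.0, "P2": 15.0, "P3": 5.0, "P4": 20.0, "P5": 7.0}

r, patients = cross_omic_correlation(copy_number, mrna_expression)
print(f"r = {r:.3f} over {len(patients)} shared patients")
```

Patient P5 is dropped because there is no copy number for that patient. With the real data, the same matching on patient ID is needed before any two datasets can be compared or combined.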
First of all, know that some very sick people will have more to hope for thanks to your work. Second, your model will need further validation and additional research, which may lead to a publication in a scientific journal. As a token of appreciation, Prof. Dr. Ing. Peter van der Spek (Erasmus MC Rotterdam) warrants that your contribution will be acknowledged with co-authorship on this scientific paper. And of course, eternal glory falls upon you and your team as you also compete for one of the cool prizes of the hackathon! So join our community and submit your team.
See you there,
Rogier van Wijck (Erasmus MC)
Dr. Antien Mooyaart (Erasmus MC)
Tjebbe Tauber (ABN AMRO)
This post was written by BasDV