Democratization of Bioinformatics?

Moray J Campbell

In cancer research, and in other arenas of biomedical research, there are many challenges to ushering in an era of personalized medicine. Technical, logistical and ethical challenges need to be met to allow genomic-based diagnosis and prognosis in cancer. However, one of the greatest challenges is to achieve a widespread democratization of statistical understanding and computational skills among cancer researchers. With the costs of genomic approaches falling, and the desire to apply these technologies increasing, the impact of this skills shortage is set to become more critical. Beyond specialist centers, one result of this accelerating growth of genomic approaches will be to widen the gulf between researchers’ capacity to generate data and their capacity to interrogate it. Put another way, genomic data interpretation is not keeping pace with current data generation, let alone with projected rates. In part, this has led to the conundrum of the $1,000 genome with a $100,000 interpretation. There is no easy fix for these challenges.

As a test case, consider me, a 40-something wet-lab biologist. My rock bottom came a few years back, when I realized I didn’t even know what I didn’t know. I’m now somewhere on the way to being something of a bioinformatician. By that I mean I’m about half-way through an online MS in bioinformatics at Johns Hopkins University. It has already been a revelation, but at times it’s like crawling over glass; I’m starting to get to the point of knowing what I don’t know! Perhaps the major change it has brought about in me is that I am starting to be able to hypothesize on a genomic scale. In my own research I study the cancer epigenome from a translational perspective and aim to combine insights from cell lines, tumor models and patient material. I feel I’ve started the long journey towards being able to conceive, design and interpret experiments both in Eppendorfs and at the command line.

In many ways, the challenges in developing genomics mirror the emergence of molecular biology. From its conceptual start, for example at the Rockefeller Foundation in New York in the 1930s, molecular biology was an innovation restricted to a few key national centers. Landmark discoveries followed, including of course the elucidation of the structure of DNA by Watson and Crick in the 1950s. Arguably, however, the democratization of molecular biology arrived only with the commercial production of restriction enzymes and the development of PCR technology by Dr. Kary Mullis in 1983.

Certain parallels can be seen in the field of bioinformatics. One of the earliest references to bioinformatics was made by Dr. Paulien Hogeweg in 1970. The growth in bioinformatics since then has been remarkable. In 1988 there were 11 papers in PubMed that used the term “bioinformatics”, whereas in the same year the term “biochemistry” appeared in approximately 14,500. By 2012, there were 17,760 publications with the term bioinformatics and 38,946 with biochemistry. Although this is probably a classic Zeno paradox, at that growth rate bioinformatics will overtake biochemistry in May of this year!
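To make “at that growth rate” concrete, here is a minimal sketch of one way to extrapolate the PubMed counts quoted above. It assumes, purely for illustration, constant compound annual growth between the 1988 and 2012 endpoints for each term; the function names (`annual_growth_rate`, `crossover_year`) are hypothetical and not part of any library. The projected date depends heavily on the model chosen: a fit to only the most recent years, for instance, would give a different answer, which is part of why such extrapolations deserve the skepticism expressed in the text.

```python
import math

# PubMed counts quoted in the text, by term and year.
COUNTS = {
    "bioinformatics": {1988: 11, 2012: 17760},
    "biochemistry": {1988: 14500, 2012: 38946},
}


def annual_growth_rate(start_count: float, end_count: float, years: float) -> float:
    """Continuous (log) growth rate per year, assuming constant compound growth."""
    return math.log(end_count / start_count) / years


def crossover_year(fast: dict, slow: dict, start: int = 1988, end: int = 2012) -> float:
    """Year at which the faster-growing series catches up with the slower one.

    Assumption: each series follows N(t) = N_end * exp(r * (t - end)).
    Setting the two curves equal and solving for t gives the crossover.
    """
    years = end - start
    r_fast = annual_growth_rate(fast[start], fast[end], years)
    r_slow = annual_growth_rate(slow[start], slow[end], years)
    if r_fast <= r_slow:
        raise ValueError("the first series never catches up under this model")
    gap = math.log(slow[end] / fast[end])  # log-ratio still to be closed at `end`
    return end + gap / (r_fast - r_slow)


if __name__ == "__main__":
    for term, series in COUNTS.items():
        r = annual_growth_rate(series[1988], series[2012], 2012 - 1988)
        print(f"{term}: about {100 * (math.exp(r) - 1):.1f}% growth per year")
    year = crossover_year(COUNTS["bioinformatics"], COUNTS["biochemistry"])
    print(f"projected crossover under this model: {year:.2f}")
```

Working in log space keeps the crossover a closed-form expression rather than a numerical search, and the explicit check on the two rates makes the model’s key assumption (one series must be growing faster) visible.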

If it took 50 years for molecular biology to become democratized, will bioinformatics achieve the same widespread exploitation by 2020? My worry is that the answer is no, because the training of wet-lab biologists is not adapting rapidly enough. The power of molecular biology technology was built on the application of well-understood principles developed within biochemistry. By contrast, bioinformatics has information theory at its center and depends on statistical insight and computational skills. Both commonly lie outside the training of wet-lab PhD scientists and MDs, whose focus is the generation of gene-centric rather than genomic data sets. Reflecting this training, researchers from these backgrounds are often most comfortable with qualitative analyses of systems in terms of candidate genes, RNAs, metabolites and proteins. However, applying genomics requires the democratization of bioinformatics, so that many more biologists, at all stages of their careers, have the key quantitative and computational skills needed to understand and interpret disease states in terms of the genome, transcriptome, metabolome and proteome. Going forward, new wet-lab biologists will need training in statistical sciences and computational skills; for older scientists this may require the sort of retrofitting that I’m undergoing.

Institutional change to training at the undergraduate and graduate levels is paramount. Many institutions aim for interdisciplinary training, seeking to develop trainees who undertake computational and statistical analyses alongside their wet-lab approaches. As yet, no clear consensus has emerged on how curricula should balance biology and informatics. To make such changes, graduate programs will need to distinguish between what is worthy and what is necessary in already overcrowded curricula. Perhaps biochemistry falls into the worthy class, whereas the necessary class includes a greater emphasis on biostatistics combined with programming skills. Another way to think about this challenge is to predict what ratio of time newly graduated wet-lab PhDs will spend between the computer and the pipette. I believe the ratio will be 80% computational approaches and 20% in the lab testing the predictions from these analyses. Of course, I say this to be provocative; maybe these ratios are too extreme, and maybe not every program needs to change. However, if graduate programs are designed to generate interdisciplinary scientists, then I think these are the types of discussions that need to occur. For many programs, this will require making computational approaches an everyday part of research. I joke with our PhD students that we’ll take Excel and GraphPad off the computers and only allow them to use R for data handling and analyses!

With more researchers able to undertake some level of bioinformatics analysis and appreciate its central importance, where will this leave the full-time bioinformaticians? In the first instance, I’d suggest that this will foster a greater, joined-up conversation between scientists within departments of bioinformatics and the next generation of MS and PhD researchers in other departments. An equally important outcome of the democratization of statistical understanding and computational approaches will be a greater appreciation of colleagues who work exclusively in the statistical and computational sciences.