With Silicon Valley blazing on as the number one hot spot for high tech and the Bay Area claiming the same title for biotech, it makes sense that Stanford, sitting mid-peninsula and basking in all that brilliance, should command a leading role in bioinformatics.
Today, Stanford’s Big Data in Biomedicine Conference kicked off with a star lineup and even a whiff of glamour. The conference room at the Li Ka Shing Center was packed with attendees looking up at a stage lit with colored lighting, backed by giant screens, and filled with elegant white leather chairs. Cameras and lights crowded the lobby, with interviews being conducted on the side. The conference is being tweeted (#bigdatamed) like a presidential election!
Along with a contingent from Oxford University, speakers included representatives from the White House, NIH, and the FDA. Stanford is well connected with the national funders and policy makers. Though the conference began with some of the hype of last year's inaugural event (the first time is always the most exciting), by the end of the first morning it had settled into a more grounded pace as some good questions were addressed, such as whether there is enough "science" in data science. Questions of privacy and quality were touched on only lightly.
I was disappointed to see that none of the panel topics for the entire three days, which range from integrating genome-scale data to machine learning, included bioethics. Stanford has some great speakers on bioethics, such as regular Mendelspod guest Hank Greely. Big data certainly holds big promise for improving healthcare, but new technology always brings big, unforeseen consequences. Addressing thorny issues such as privacy, equality, and safety head-on would have made for a more balanced conference.
Todd Park is only the second-ever CTO of the United States, and having him come directly from the White House adds some glamour, yes. But Mr. Park’s keynote was more Easter service than scientific conference, more cathedral than campus. Using lines like “there has never been a better time to innovate in healthcare than now” and “we are so blessed to be living in . . .”, Mr. Park “god blessed” the crowd of scientists, praying that “the force be with you.” Hm hm. Bring on the data, please.
And yes, the next speaker, Colin Mahony, used the term “nirvana,” alluding to that big data heaven that we all dream about. But from there on the conference came back to earth with slides and graphs of . . . . data.
Mike Snyder, perhaps the most biologically studied person in history, gave an update on his iPOP (integrated personal omics profiling) project. He didn't offer much that was new today, other than some progress in tracking epigenetic changes (not easy) and some characterization of his microbiome. Steve Quake and Stanford newcomer Julia Salzman rounded out the session, leaving plenty of time for a panel discussion with Q&A.
Snyder’s ongoing project of looking at hundreds of thousands of his own biomarkers over time, often referred to as the Snyderome, provoked a great question from panel moderator and bioinformatics superstar Atul Butte. There’s been a lot of progress on the various -omes, Atul noted, but isn’t the most challenging one the time-ome (anyone for temporome?): the ability to continually access samples and build the data set over time?
This is of course the key to Snyder’s project, so he launched into more details of his “longitudinal” study. It was Steve Quake who delivered the provocative line.
“I think of time as my friend,” Steve said.
It’s when you look at the data points over time that you’re able to find an anomaly or a signal, Steve reasoned.
Panelist Colin Mahony chimed in with an excellent observation as well. When you look at the various slides that the speakers use to present their data, Colin observed, time is usually the only constant. Time can provide “the best primary key” around which to build data sets, he concluded.
Julia Salzman delivered what was for me the biggest WOW moment of the morning with her talk on “circular RNA.”
Apparently scientists had already been aware of RNA molecules that circle back on themselves, biting their own tails. But these molecules, unlike their linear siblings, had been dismissed as non-protein-coding and therefore uninteresting. Julia said that by being “willing to look at data that was in the trash”, her team discovered that circular RNA has implications for disease and could be used as a diagnostic tool. She went so far as to say that with this discovery, biology textbooks are now obsolete. That sounds like a big deal!
The issue of quality was raised. Unfortunately, the discussion focused on one specific sequencing case, and the general question of how to clean up huge amounts of data went unaddressed. A bioinformatician who was a recent guest on Mendelspod said her biggest challenge was not storage or compute power, but improving the quality of the data.
A small debate broke out between Snyder and Quake over big science vs. small science when the NIH speaker, Philip Bourne, asked what more the NIH could do for bioinformatics projects, other than give more money. Quake said that big consortium projects have already produced some great data sets, and that money should now go to good ideas at the level of the individual researcher. Snyder argued that the ongoing ENCODE project had been beneficial and proved that big data-sharing projects could consist of individual researchers pooling their R01s (smaller grants), sharing their data, and benefiting from more real-time interaction. A hybrid of big and small science.
A top question at Mendelspod this year has been whether, with increased data storage and data mining capabilities, research has become more data-driven than hypothesis-driven, and whether anything is lost in that shift. I presented this question to the panel.
Mike Snyder asserted, with examples, that “the biggest discoveries in science were not hypothesis driven.”
So the question was asked: if there is plenty of data, then where is the "scarcity" in generating better questions?
Steve Quake couldn't resist sharing a local joke: “How do you define a data scientist? Any statistician who lives in San Francisco.” Then Quake threw out a serious challenge:
“We have a scarcity of ideas, not data.”
The big data conference continues through Friday and is being broadcast live at https://bigdata.stanford.edu.
For the Twitter stream, search #bigdatamed.