Sabina Leonelli, Philosopher, University of Exeter
Listen (6:44) Not a fan of the term Big Data
Listen (4:20) Something lost in bringing data together from various scientific cultures
Listen (3:36) Are data scientists really scientists?
Listen (4:11) Controversies around Open Data
Listen (3:03) Data systems come with their own biases
Listen (6:22) Message to bioinformaticians: Come up with the story of your data
Listen (1:15) Data-driven vs. hypothesis-driven science
Listen (2:46) Thoughts on the Quantified Self movement
For the next installment in our Philosophy of Science series, we look at issues around data. Sabina Leonelli is a philosopher of information who collaborates with bioinformaticians. In today's interview, she expresses her concerns about the terms Big Data and Open Data.
"I have to admit, I'm not a big fan of this expression, 'Big Data,'" she says at the outset of the show.
Using data in science is, of course, a very old practice. So what's new about "big" data? Sabina is mostly concerned about the challenges of bringing data together from various sources. The biggest challenge here, she says, is with classification.
"Biology is fragmented in a lot of different epistemic cultures ... and each research tradition has different preferred ways of doing things," she points out. "What I'm interested in is the relationship between the language used and the actual practices. And there appears to be a very strong relationship between the way that people perform their research and the way in which they think about it. So terminology becomes a very specific signal for the various research traditions."
Sabina goes on to point out that the nuances of a specific research tradition can be lost when its data is integrated with data from other traditions. For instance, most large bioinformatics databases are maintained in English, whereas some of the individual research data may originally have been recorded in another language.
This becomes especially important with the new movement toward Open Data, where biases are built into the databases.
"The problem resides with the expectation that what is 'Open Data' is all the data there is," she says.
In fact, the data in Open Data tends to come from databases that are highly standardized, and often from the most powerful labs.
How can bioinformaticians deal with these challenges? Sabina says researchers should be more diligent about creating "a story" around their data, which will help make the biases more transparent. She also says that a lot of conceptual effort must go into designing databases from the outset so that the data can be used to answer as-yet-unknown questions in the future.
We finish the interview with her thoughts on the Quantified Self movement.
Podcast brought to you by: Chempetitive Group - "We love science. We love marketing. We love the idea of combining the two to make great things happen for your marketing communications."