What does “big data” really mean? Viktor Mayer-Schönberger and Kenneth Cukier answer this question in their 2013 bestseller, “Big Data: A Revolution That Will Transform How We Live, Work, and Think.” The 200-page book offers a solid overview of the premises, advances, issues, and implications of the big data revolution.
For the first time, we can capture and store massive amounts of data easily and inexpensively. This means we are no longer limited to statistical sampling and estimation when we want to extract meaning from data: we can collect a complete dataset and analyze all of it. Analyses can take N=all as their subject, so we no longer have to guess at a population or hope that a random sample is representative. “Big data” means that we can finally have it all.
This change brings its own problems. The larger a dataset, the more likely it is to contain errors, and the less likely it is that analysts have time to carefully clean every datum. Data scientists have found, however, that even massive, error-prone datasets can be more reliable than small but clean samples. In a messy dataset, the authors write, “any particular reading may be incorrect, but the aggregate of many readings will provide a more comprehensive picture.” In short, the messy whole can outperform precise but partial subsets.
As we make inroads into big data, we also shift our focus from causation to correlation alone: from asking why something happens to simply knowing what happens. Mayer-Schönberger and Cukier describe it like this:
“If millions of electronic medical records reveal that cancer sufferers who take a certain combination of aspirin and orange juice see their disease go into remission, then the exact cause for the improvement in health may be less important than the fact that they lived. Likewise, if we can save money by knowing the best time to buy a plane ticket without understanding the method behind airfare madness, that’s good enough.” (14)
Big data allows us to work backward: we start with data collection, move on to analysis, and only then draw conclusions from whatever patterns appear. Because this approach does not set out to support or disprove a particular theory, it can reduce the risk of researcher bias.
Amazon.com, for example, can use its Kindle e-book readers to tabulate which sections of books are highlighted most, where readers tend to stop reading, and which themes prompt the most user engagement. But because these answers do little for Amazon’s long-term business goals, the data just sits there. A publishing company given the same information, however, might use it to adjust authors’ writing styles and marketing campaigns. In this example, both companies are using the same data, but the “answers” they get may be completely different.
Mayer-Schönberger and Cukier’s book is a quick read, filled with well-thought-out and easy-to-digest examples. You don’t need to be a data expert or computer science whiz to gain something from the text.
Whether your questions are about the history of the field, its current concerns, or where it’s headed next, Mayer-Schönberger and Cukier’s “Big Data: A Revolution That Will Transform How We Live, Work, and Think” has something for everyone. You’ll likely walk away both informed and curious about the immense possibilities that the study of big data opens up.
For a closer look at Mayer-Schönberger and Cukier’s “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” read the full review at the datascience@berkeley blog.