The term “Big Data” is constantly thrown around today by businesses and the technology world, and leveraging big data for competitive advantage is often presented as an organisational panacea. Given current compute power and storage capabilities, we are now able to truly leverage big data in ways one could previously only dream of.
This book, however, lays out an important theme: big data is about knowing what, not why. Said differently, it is more about correlation than causation, and making that mental shift is at the core of leveraging big data. Organisations that can combine mathematics and statistics with programming and network science will be at the forefront of big data literacy.
Said best by the authors: “… when we say that humans see the world through causalities, we’re referring to two fundamental ways humans explain and understand the world: through quick, illusory causality; and via slow, methodical causal experiments. Big data will transform the roles of both.”
Three key takeaways from the book
1. Some staggering “Big Data” statistics at the time this book was written:
○ About seven billion shares change hands every day on U.S. equity markets, of which around two-thirds is traded by computer algorithms based on mathematical models that crunch mountains of data to predict gains while trying to reduce risk.
○ Google processes more than 24 petabytes of data per day, a volume that is thousands of times the quantity of all printed material in the U.S. Library of Congress.
○ Facebook, a company that didn’t exist a decade ago, gets more than 10 million new photos uploaded every hour. Facebook members click a “like” button or leave a comment nearly three billion times per day, creating a digital trail that the company can mine to learn about users’ preferences.
○ The 800 million monthly users of Google’s YouTube service upload over an hour of video every second.
○ The number of messages on Twitter grows at around 200 percent a year and by 2012 had exceeded 400 million tweets a day.
○ More than 300 exabytes of stored data existed in 2007. To understand what this means in slightly more human terms, think of it like this. A full-length feature film in digital form can be compressed into a one gigabyte file. An exabyte is one billion gigabytes. In short, it’s a lot. Interestingly, in 2007 only about 7 percent of the data was analog (paper, books, photographic prints, and so on).
2. The amount of stored information grows four times faster than the world economy, while the processing power of computers grows nine times faster.
3. Big data’s ascendancy represents three shifts in the way we analyze information that transform how we understand and organize society:
i. We can analyze far more data
ii. We can loosen our desire for exactitude
iii. We can move away from the age-old search for causality. Instead, we can discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening. Fundamentally, big data is about ‘what’, not ‘why’.
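The storage figures in the first takeaway can be sanity-checked with quick arithmetic. A minimal sketch, using only the book's own numbers (1 GB per compressed feature film, one billion GB per exabyte):

```python
# Back-of-the-envelope check of the book's storage figures.
GB_PER_FILM = 1            # a compressed full-length feature film, per the book
GB_PER_EXABYTE = 10**9     # an exabyte is one billion gigabytes

stored_2007_eb = 300       # ~300 exabytes of stored data existed in 2007
films_equivalent = stored_2007_eb * GB_PER_EXABYTE // GB_PER_FILM
print(f"{films_equivalent:,} feature films")  # 300,000,000,000 feature films
```

In other words, the 2007 total is roughly the equivalent of 300 billion digital feature films, which is why the authors settle for “it’s a lot.”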