Big Data – Discovering the What but not the Why

As the field continues to develop and more people become aware of it, there still seems to be a lot of confusion about exactly what Big Data (BD) can and can’t do. Bernard Marr wrote a great article in plain English about the limitations and possibilities of BD. I want to use his article to delve into a little more detail about why this topic is so important for data scientists, managers and data practitioners.

Lehikoinen argues that the ‘Big Data machinery’ answers the what (correlation) but not the why (causation). Lehikoinen states that losing causality brings three disadvantages: 1) Responsibility, 2) Learning, and 3) Trust. The first concern is that it is not acceptable to make a decision without knowing why. When causality is lost, the question becomes ‘Who is responsible for the decision if it is made based solely on a Big Data prediction?’ He suggests that human involvement is important for interpreting and acting on the information extracted from the data.

The second disadvantage of losing causation is that we do not learn if we do not know why. When decisions are based on real-time data rather than on lessons from the past, you’re acting on a hunch rather than experience. The danger is that there is no way to refine your actions based on earlier interactions. The third disadvantage Lehikoinen describes is that of trust. He asks whether we can consider algorithms authoritative if we don’t know the reasons behind them. If a prediction appears to be a false positive, he argues that it may be next to impossible to take corrective action. The article concludes by saying it ‘takes a lot of human creativity to innovate on what datasets to use, what correlations are important and how to test them.’ Big Data does not provide the answers you are looking for if you don’t know what questions to ask.

I think in 2016 we’ve yet to fully discover how BD is going to affect nearly every person and revolutionize nearly every industry. Can it solve a multitude of problems more quickly and with more certainty? Probably. Will it? That answer depends on how we interact with it and use it.

