Predictive analytics: navel-gazing or crystal ball?
"One need only think of the weather, in which case the prediction even for a few days ahead is impossible." -Albert Einstein
In recent years, there has been a lot of hype surrounding predictive analytics with Big Data. Applications extend beyond the walls of businesses: from forecasting the outcome of presidential elections and discovering new energy sources to detecting potential fraud. But how reliable is predictive analytics? And is it creating a misleading sense of certainty about the future?
Predicting the wrong path
First, let's properly define this buzzword. Put simply, predictive analytics means using historical facts and data to make predictions about future events. The name "predictive analytics" suggests that you can predict and anticipate the future much as a weather forecast does, but its predictions will not always be accurate. Big Data brings both advantages and shortfalls to this debate about accuracy. Whilst Big Data doesn't give you the power to predict with 100% accuracy every time, it does provide a lens for examining the past in ever-greater detail. To predict, one must create testable models. The models themselves do not require large volumes of data; what they require is a deep understanding of the domain and access to data whose reliability is quantifiable.
More data may enable modellers to build a deeper understanding of that domain and to refine their models, but one must be cautious about creating sophisticated models which perfectly fit the data they are trained against, since a perfect fit usually means the model has captured noise along with the signal. The key is to find the dominant predictive factors.
A word of warning: the more factors available in the source data, the greater the likelihood of finding false correlations. Equally, if the domain is poorly understood, too complex to model or has too many non-quantifiable factors, then Big Data will be of little value in building predictions. Predicting the actions of single individuals is a good example, as individuals often make complex and irrational decisions which don't necessarily follow a chain of logic or their past behaviour.
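To make the point about false correlations concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the data is pure random noise, and the function names and sample sizes are invented for this example. It measures the strongest correlation that turns up purely by chance as the number of candidate factors grows.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def strongest_chance_correlation(n_factors, n_rows=30, seed=0):
    """Largest |correlation| between a random target and random factors.

    Target and factors are independent noise, so any correlation
    found here is false by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_factors):
        factor = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(factor, target)))
    return best


if __name__ == "__main__":
    for k in (5, 50, 500, 5000):
        print(k, "factors -> strongest chance correlation:",
              round(strongest_chance_correlation(k), 2))
```

Because none of the factors has any real relationship with the target, the number printed can only be an artefact of searching. The more factors you search, the more impressive that artefact looks.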
Are you creating forecasting bias?
It's very common to overestimate the reliability of data-based forecasts because we are highly susceptible to confirmation bias: we tend to seek out trends which support our theories. Since models are validated against historical data, often the same source data against which the model was developed, the validation process is at risk of being inherently flawed. Furthermore, the more data that is available, the greater the probability that a correlation can be found which supports a favoured theory.
It is easy to build a model which perfectly predicts the past, and in doing so to reinforce the mistaken belief that correlation implies causation. Even a small change in the model's assumptions can dramatically alter the outcome. It is therefore important to re-validate and re-train the model on new data as it becomes available.
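One common safeguard is to validate against data the model never saw while it was being developed. The sketch below is a hypothetical illustration on pure noise, not a recipe: the function names, the 50/50 split and the sample sizes are all assumptions. It picks the "best" factor using the first half of the data, then checks the same factor against the held-back second half.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def in_and_out_of_sample(n_factors=1000, n_rows=60, seed=1):
    """Select the best-looking factor in-sample, then re-test it out-of-sample."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    factors = [[rng.gauss(0, 1) for _ in range(n_rows)]
               for _ in range(n_factors)]
    half = n_rows // 2

    # "Development": choose the factor that best matches the first half.
    best = max(factors,
               key=lambda f: abs(pearson(f[:half], target[:half])))

    # Validating on the development data just repeats the selection step.
    in_sample = abs(pearson(best[:half], target[:half]))
    # Validating on held-back data is the honest test.
    out_of_sample = abs(pearson(best[half:], target[half:]))
    return in_sample, out_of_sample


if __name__ == "__main__":
    seen, unseen = in_and_out_of_sample()
    print("correlation on the data used to pick the factor:", round(seen, 2))
    print("correlation on held-back data:", round(unseen, 2))
```

The first figure reflects only how hard we searched. The second is the one that says something about the future, and on noise like this it should fall back towards zero.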
Predicting the future
It is pivotal to recognise that predictive analytics is not a "crystal ball" that will always give correct answers to business queries. The primary purpose of predictive analytics is to help users make better and more informed decisions by evaluating patterns inside their data. Here, Big Data presents an opportunity since it provides a continuous stream of new data with which to test and refine the predictive model. However, with greater availability of data, it is also tempting to maximise the model's predictive power on the data it was trained on. This is dangerous as it leads to overfitting. Simpler models that score lower on the training data will often perform better in production than complex ones tuned to fit that data as closely as possible.
Predictive analytics and Big Data are exciting and offer huge opportunities to organisations and businesses around the world, but they come with their own traps. So be wary before you start believing your predictions are a certainty and hold all the answers!
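As a rough illustration of overfitting, the sketch below compares two models on synthetic data. The data-generating rule (y = 2x plus noise), the model choices and every name in the code are assumptions made for this example. One model is a simple straight-line fit; the other is a "memoriser" that returns the outcome of the closest training point, which is about as complex as a model can get.

```python
import random


def make_data(n, rng):
    """Synthetic data: y = 2x plus random noise (an assumed toy relationship)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_memoriser(xs, ys):
    """Complex model: return the outcome of the nearest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error; lower is better."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)
train_x, train_y = make_data(200, rng)   # the "past"
test_x, test_y = make_data(200, rng)     # stands in for production data

line = fit_line(train_x, train_y)
memo = fit_memoriser(train_x, train_y)

if __name__ == "__main__":
    for name, model in (("straight line", line), ("memoriser", memo)):
        print(name,
              "| error on training data:", round(mse(model, train_x, train_y), 2),
              "| error on new data:", round(mse(model, test_x, test_y), 2))
```

The memoriser reproduces the training data exactly, because it has memorised the noise along with the trend. On new data that memorised noise works against it, and the plainer straight line should come out ahead.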