Reduce the noise in your data to improve forecasts

The cloud and big data


When the cloud came into being, it brought immense storage capacity at lower cost and ushered in the era of big data. It also raised expectations among statisticians - and the decision makers who depended on them - that this would do wonders for their decision-making processes.


Boon or bane?


Social media and IoT drastically increased the sample space, making far more data available. Surely applying statistical models to this huge volume of data would sharpen the probability of a predicted event occurring (or not occurring), or improve the reliability of a forecast by pushing the R-squared value to near unity? Wrong. The data deluge added more noise than dependable signal, as the sketch below illustrates.
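
As a rough, hedged illustration of this point (not from the session itself; the data and numbers below are invented), the following Python sketch fits an ordinary least-squares model twice: once with a single genuine predictor, and once with 150 irrelevant "big data" columns added. The extra columns push the in-sample R-squared toward unity while the out-of-sample fit gets worse:

    # Sketch: noisy "big data" features inflate in-sample R^2 but do not
    # improve (and here actively hurt) out-of-sample prediction.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    n = 200
    signal = rng.normal(size=(n, 1))             # one genuine predictor
    y = 3.0 * signal[:, 0] + rng.normal(size=n)
    noise = rng.normal(size=(n, 150))            # 150 irrelevant columns

    for X in (signal, np.hstack([signal, noise])):
        X_train, X_test = X[:100], X[100:]
        y_train, y_test = y[:100], y[100:]
        model = LinearRegression().fit(X_train, y_train)
        print(f"{X.shape[1]:>3} features | "
              f"in-sample R^2 = {r2_score(y_train, model.predict(X_train)):.2f} | "
              f"out-of-sample R^2 = {r2_score(y_test, model.predict(X_test)):.2f}")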

 

Illusion or disillusion?


As time went by, people became disillusioned by the failure of these systems to supply reliable information for decision-making. When their inflated expectations were not met, many simply dropped off without pursuing the journey further.


The signal and the noise


It was now the experts' turn to explain why such huge volumes of data were not helping people decide better. One significant reason is that while there is enough data - and more - for the model, it requires a great deal of cleaning: the noise that could distort results and predictions must be removed before the data can be put to any use at all. The sketch below shows one such cleaning step.
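
What "removing the noise" can look like in practice: the sketch below (invented data, one illustrative technique among many, not a method claimed by the experts) shows a handful of noise spikes distorting a simple trend estimate, and a robust outlier filter based on the median absolute deviation recovering it:

    # Sketch: a few gross outliers distort a trend fit; a robust
    # median-absolute-deviation (MAD) filter removes them first.
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.arange(100, dtype=float)
    y = 2.0 * x + rng.normal(scale=5.0, size=100)
    y[::10] += rng.normal(scale=200.0, size=10)   # simulated noise spikes

    slope_raw = np.polyfit(x, y, 1)[0]

    # Keep points whose residual from a first-pass fit lies within
    # ~3 robust standard deviations (1.4826 * MAD approximates sigma).
    resid = y - np.polyval(np.polyfit(x, y, 1), x)
    mad = np.median(np.abs(resid - np.median(resid)))
    mask = np.abs(resid - np.median(resid)) < 3 * 1.4826 * mad
    slope_clean = np.polyfit(x[mask], y[mask], 1)[0]

    print(f"true slope 2.0 | raw fit {slope_raw:.2f} | cleaned fit {slope_clean:.2f}")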

 

Persistence pays!


Early adopters of technology gained over the long run. Microsoft and Amazon are examples of winners who persisted in their vision of making big data the fuel for their decision-making engines. They pulled themselves up from the trough of disillusionment to the slope of enlightenment by applying scientific methods to the data they gathered and adopting newer techniques to remove noise and false signals. This way, they could arrive at real signals that aided in building reliable data models. They have now climbed to the plateau of productivity, with their data models helping them make better, information-based decisions.

 

Here are a few points to ponder:


  • People expect a lot from technology today, but while we have plenty of data, we do not have enough people with the skills to make big data useful, nor enough training and skill-building effort to turn our huge population of technology experts into data scientists.
  • Cleaning up data is the first big problem in predictive analysis – many external factors can distort the data that has been collected.
  • If we observe a correlation between two variables and do not know what causes it, it is better not to rely on that correlation at all (a starfish "predicting" the FIFA World Cup winner, or a baseball team's wins and losses "determining" the movement of the share market). The sketch after this list shows how easily such spurious correlations arise.
  • Desperately seeking signals, people end up with more noise than signal – so they make decisions with instinct, gut feeling and experience playing an 80% part and statistics playing the remaining 20%. Instead, we should let statistics guide 80% of the decision and leave the rest to instinct, and even then only when the statistical model shows a drastically negative indicator.
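
To make the correlation-without-causation point concrete, here is a small illustrative sketch: two completely independent random walks often show a strong correlation coefficient, even though neither can possibly "cause" the other:

    # Sketch: independent random walks frequently look strongly correlated.
    import numpy as np

    rng = np.random.default_rng(42)
    walk_a = np.cumsum(rng.normal(size=500))   # e.g. a team's season tally
    walk_b = np.cumsum(rng.normal(size=500))   # e.g. a share-market index

    corr = np.corrcoef(walk_a, walk_b)[0, 1]
    print(f"correlation between two unrelated random walks: {corr:.2f}")
    # Rerun with other seeds: large positive or negative values are common.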

Here are some suggestions to reduce the noise and arrive at signals:

 

Start with a hypothesis


Begin with a hypothesis or instinct and keep refining it as the analysis proceeds – this may sometimes lead you to reverse your hypothesis entirely.


Think probabilistically


When predicting, consider the margin of error (uncertainty) in the historical data and include it in the prediction before making a decision. The forecaster who discloses the greatest uncertainty is doing a better job than the one who conceals the uncertainty in a prediction. Three things to carry along while predicting: data models, the scientific theories that influence the situation, and experience (learn from the forecasts you have made and the feedback on them). The sketch below shows one simple way to report uncertainty alongside a point forecast.
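
A minimal sketch of this idea, assuming we keep a simple log of our past forecast errors (all numbers below are made up): report the point forecast together with an empirical margin of error derived from those errors, rather than a bare number:

    # Sketch: attach an empirical margin of error, derived from past
    # forecast errors, to a point forecast instead of a bare number.
    import numpy as np

    past_errors = np.array([-4.1, 2.3, -1.7, 5.0, 0.8, -2.9, 3.6, -0.5])
    point_forecast = 120.0   # hypothetical next-period estimate

    # ~95% empirical interval from the 2.5th and 97.5th error percentiles
    lo, hi = np.percentile(past_errors, [2.5, 97.5])
    print(f"forecast: {point_forecast:.1f} "
          f"(likely range {point_forecast + lo:.1f} to {point_forecast + hi:.1f})")

With only eight past errors the percentiles are crude; the point is that disclosing the interval is more honest than hiding it.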


Know where you come from


Consider the background and existing biases of the forecaster or decision maker, and the situation in which the data is being collected and considered.


Try and err


Companies need to put in the 80% of effort required for the last 20% of results to retain their competitive advantage – real statistics from a few customers are better than hypothetical data about a huge number of customers.

 

Notes:

 

  • Large and smart companies, especially technology firms, should dare to take risks in areas of competitive advantage; most of this risk-taking will pay off. Because they are big, they can absorb failures, unlike small firms and individuals, for whom the same bets might be termed gambling.
  • People make better inferences from visuals than from raw data. Charts should show only the simple, essential information; unless extra detail genuinely brings clarity, avoid crowding charts with information that only adds noise (see the sketch after this list).
  • People must become bias detectors – raise business questions and be wary of magic-bullet solutions.
  • Analysts should disclose the limitations of their analyses.
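
As a small illustration of the "simple essential info" advice (the data and labels are invented), this sketch plots a single series with the non-essential chart elements stripped away, rather than presenting the raw numbers:

    # Sketch: one series, a direct title, and no chart junk.
    import matplotlib.pyplot as plt

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
    forecast_error = [8.2, 7.5, 7.9, 6.1, 5.4, 4.8]   # invented figures

    fig, ax = plt.subplots(figsize=(5, 3))
    ax.plot(months, forecast_error, marker="o")
    ax.set_ylabel("Forecast error (%)")
    ax.set_title("Forecast error is trending down")
    for side in ("top", "right"):                     # remove visual noise
        ax.spines[side].set_visible(False)
    fig.tight_layout()
    plt.show()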


- Insights from a session by Nate Silver
