Connecting Technology and Business.

Reduce the noise in your data to improve forecasts

The cloud and big data

When the cloud came into being, it brought immense storage capacity at lower cost and ushered in the era of big data. It also raised expectations among statisticians - and the decision makers who depended on them - that all this data would do wonders for their decision-making.

Boon or bane?

Social media and IoT drastically increased the amount of data available. Applying statistical models to this much data should sharpen the estimated probability of a predicted event occurring (or not occurring), or improve the reliability of the forecast by pushing the R-squared value close to unity. Right? Wrong. The data deluge added more noise than dependable signal.
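A small, made-up illustration of the point: the sketch below fits a straight line to a small, clean sample and to a much larger but noisier one. The data, the noise levels and the helper name `r_squared` are assumptions chosen for the example, not figures from the session.

```python
import random


def r_squared(xs, ys):
    """R squared of a least-squares straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a one-variable linear fit, R squared equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Small, clean sample: 20 points, true relation y = 2x, little noise.
small_x = list(range(20))
small_y = [2 * x + rng.gauss(0, 1) for x in small_x]

# Large, noisy sample: 2,000 points, same relation, far more noise.
large_x = [rng.uniform(0, 19) for _ in range(2000)]
large_y = [2 * x + rng.gauss(0, 20) for x in large_x]

r2_small = r_squared(small_x, small_y)
r2_large = r_squared(large_x, large_y)
print(f"20 clean points:    R squared = {r2_small:.2f}")
print(f"2,000 noisy points: R squared = {r2_large:.2f}")
```

Under these assumptions the larger sample has a hundred times more rows but a much lower R squared, because the extra rows carry mostly noise.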


Illusion or disillusion?

As time went by, people became disillusioned by the failure of the system to give them reliable information for decision-making. Because their inflated expectations were not met, many dropped off quickly without pursuing the journey any further.

The signal and the noise

It was now the turn of the experts to come up with reasons why so much data could not help them decide better. One significant reason is that while there is enough data for the model, and more, it requires a great deal of cleaning: the noise that could distort the results and predictions has to be removed before the data can be put to any use at all.


Persistence pays!

Early adopters of the technology gained over the long run. Microsoft and Amazon are examples of winners who persisted in their vision of making big data the fuel for their decision-making engine. They pulled themselves out of the trough of disillusionment and onto the slope of enlightenment by applying scientific methods to the data they gathered and adopting newer methods to remove noise and false signals. This way, they could arrive at real signals that helped build reliable data models. They have now climbed to the plateau of productivity, with data models that support better, information-based decisions.


Here are a few points to ponder:

  • People expect a lot from technology today. The problem is that while we have a lot of data, there are not enough people with the skills to make it useful, and not enough training and skill-building effort is going into making data scientists out of the large population of technology experts.
  • Cleaning up data is the first big problem in predictive analysis: many external factors can distort the data that has been collected.
  • If we find a correlation between two variables but don't know what causes it, it is better not to rely on that correlation at all. (Examples: a starfish predicting the FIFA World Cup winner, or a baseball team's win or loss determining the movement of the stock market.)
  • Desperately seeking signals, people end up with more noise than signal, so they make decisions with instinct / gut feeling / experience playing an 80% part and statistics the remaining 20%. Instead, we should be guided 80% by statistics and leave the rest to instinct, and even then rely on instinct only if there is a drastically negative indicator in the statistical model.
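The warning about unexplained correlations can be made concrete with a short sketch. It generates a thousand series of pure random numbers and reports how strongly the "best" one correlates with an equally random target. All of the data here is synthetic and the helper name `pearson` is just for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# A made-up target: ten observations of something we want to "predict".
target = [rng.gauss(0, 1) for _ in range(10)]

# One thousand candidate "predictors", all pure noise and unrelated to the target.
candidates = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(1000)]

# Strongest correlation (in absolute value) found among the unrelated series.
best = max(abs(pearson(series, target)) for series in candidates)
print(f"Strongest correlation among 1,000 unrelated series: {best:.2f}")
```

With only ten observations and enough candidates to search through, some unrelated series will look like a strong predictor purely by chance, which is why a correlation with no plausible cause deserves suspicion.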

Here are some suggestions to reduce the noise and arrive at signals:


Start with a hypothesis / instinct and then keep refining it as you go ahead with the analysis. This might sometimes lead you to reverse your hypothesis.

Think probabilistically

When predicting, consider the margin of error (uncertainty) in the historic data and include it in the prediction before making a decision. The person who discloses the greatest uncertainty is doing a better job than the one who conceals the uncertainty in his prediction. Three things to carry with you while predicting: data models, scientific theories that bear on the situation, and experience (learning from the forecasts made and the feedback on them).
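As a minimal sketch of carrying the margin of error into a prediction, the example below reports a naive forecast together with an approximate 95% band instead of a single number. The history values are hypothetical, and the band assumes the values are roughly normal and independent; it is only one simple way to express uncertainty.

```python
from statistics import mean, stdev


def forecast_with_margin(history, z=1.96):
    """Naive forecast (the historical mean) with an approximate 95% band.

    Assumes the values are roughly normal and independent; z = 1.96 is the
    usual multiplier for a two-sided 95% band under that assumption.
    """
    centre = mean(history)
    margin = z * stdev(history)  # spread of individual past values
    return centre - margin, centre, centre + margin


# Hypothetical monthly demand figures, for illustration only.
history = [102, 98, 105, 97, 101, 99, 103, 95]

low, centre, high = forecast_with_margin(history)
print(f"Forecast: {centre:.0f} (roughly {low:.0f} to {high:.0f})")
```

Reporting the band alongside the point forecast lets the decision maker see how much the historic data actually supports the number.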

Know where you come from

Consider the background and existing biases of the forecaster / decision maker, and the situation in which the data is being collected / considered.

Try and err

Companies need to focus on the 80% of effort that yields the last 20% of results to retain their competitive advantage. Real statistics from a few customers are better than hypothetical data about a huge number of customers.




  • Large and smart companies, especially technology firms, should dare to take risks in areas of competitive advantage. Most of the risk-taking will pay off. Because they are big, they can bear failures, unlike small firms and individuals, for whom this might be termed gambling.
  • People make better inferences from visuals than from raw data. Charts must show simple, essential information. Unless it is needed for greater clarity, avoid crowding charts with extra information that only creates more noise.
  • People must become bias detectors: raise business questions and be skeptical of magic-bullet solutions.
  • Analysts should disclose the limitations of their analyses.

- Insights from a session by Nate Silver

Innovations in Excel that users love

Real-time collaboration—As with other Office 365 apps, you and your co-workers can securely work simultaneously within an Excel file from any device (mobile, desktop, and web). This allows you to know who else is working with you in a spreadsheet, see where they’re working, and view changes automatically within seconds, reducing the time it takes to collect feedback and eliminating the need to maintain multiple versions of a file. Live, in-app presence indicators through Skype for Business make it easy to connect with available co-workers in the moment.

Powerful data modeling—Get & Transform is one of Excel’s most powerful features, enabling you to search for data sources, make connections, and shape your data to meet specific analysis needs. Excel can connect to data sitting in the cloud, in a service, or stored locally. You can then combine different data sets from these sources into a single Data Model for a unique, unified view. Plus, you can create a Data Model to import millions of rows of data into Excel—keeping your analysis in one place.

Insightful visualizations—Excel is an inherently visual tool, giving you new perspectives through a variety of charts and graphs. We continue to enhance visualization in Excel—with geographical maps and waterfall charts—to provide easier analysis and a better, more impactful way to share insights across your company.

Dashboard creation and sharing—Power BI is the cloud-based data visualization tool that allows you to create and publish dashboards. We intentionally designed Power BI and Excel to work together, so you can surface the most relevant insights for the task at hand. Excel data can be imported into Power BI, while Power BI reports can be analysed in Excel for new perspectives. You can then easily share these dashboards and insights with others in your company.

Built-in extensibility—Like other Office 365 applications, Excel can be customized to meet the specific needs of your company. Excel’s rich ecosystem of add-ins and other tools can help you work with data in more relevant ways. Plus, the Excel platform is flexible enough for IT admins or Microsoft partners to develop custom solutions.

- Office Blogs, Dec 2017

Security Intelligence Report of Microsoft

Microsoft regularly aggregates the latest worldwide security data into the Security Intelligence Report (SIR), unpacking the most pressing issues in cybersecurity.

Here are some highlights:

Cloud Threat Intelligence

The cloud has become the central data hub for any organization, which means it’s also a growing target for attackers.

Compromised Accounts

Definition - Attackers break into the cloud-based account simply by using the stolen sign-in credentials of a user
Analysis - A large majority of these compromises are the result of weak, guessable passwords and poor password management, followed by targeted phishing attacks and breaches of third-party services.

Cloud-based user account attacks have increased 300% from last year, showing that attackers have found a new favorite target.

Drive-by download sites

Definition - A website that hosts malware in its code and can infect a vulnerable computer simply by a web visit
Analysis - Attackers sneak malicious code into legitimate but poorly secured websites. Machines with vulnerable browsers can become infected by malware simply by visiting the site. Bing search constantly monitors sites for malicious elements or behavior, and displays prominent warnings before redirecting to any suspicious site.

Taiwan and Iran have the highest concentration of drive-by download pages.

Endpoint threat intelligence

An endpoint is any device remotely connected to a network that can provide an entry point for attackers––such as a laptop or mobile device. Since users interact with an endpoint, it remains a key opportunity for attackers and a security priority for organizations.


Ransomware

Definition - Malware that disables a computer or its files until an amount of money is paid to the attackers
Analysis - Ransomware attacks have been on the rise, disrupting major organizations and grabbing global headlines. Attacks like WannaCry and Petya disabled thousands of machines worldwide in the first half of 2017. Windows 10 includes mitigations that prevent common exploitation techniques by these and other ransomware threats.

Ransomware disproportionately targeted Europe, with the Czech Republic, Italy, Hungary, Spain, Romania, and Croatia being the six countries with the highest encounter rates.

Exploit Kits

Definition - A bundle of malicious software that discovers and abuses a computer's vulnerabilities
Analysis - Once installed on a compromised web server, exploit kits can easily reach any computer lacking proper security updates that visits the site.

Many of the more dangerous exploits are used in targeted attacks before appearing in the wild in larger volumes.

Takeaways and Checklist:

  • The threats and risks of cyberattacks are constantly changing and growing. However, there are some practical steps you can take to minimize your exposure.
  • Reduce risk of credential compromise by educating users on why they should avoid simple passwords, enforcing multi-factor authentication and applying alternative authentication methods (e.g., gesture or PIN).
  • Enforce security policies that control access to sensitive data and limit corporate network access to appropriate users, locations, devices, and operating systems (OS).
  • Do not work in public Wi-Fi hotspots where attackers could eavesdrop on your communications, capture logins and passwords, and access your personal data.
  • Regularly update your OS and other software to ensure the latest patches are installed.

India specific report

The statistics presented here are generated by Microsoft security programs and services running on computers in India in March 2017 and previous quarters. The data is provided by administrators or users who opt in to share it with Microsoft, with IP address geolocation used to determine country or region.

Encounter rate trends

15.5 percent of computers in India encountered malware, compared to a worldwide encounter rate of 7.8 percent. The most common malicious software category in India was Trojans, followed by Worms and then Downloaders & Droppers.

The most common unwanted software category was Browser Modifiers, followed by Software Bundlers and then Adware.

The most common malicious software family encountered was Win32/Fuery. Win32/Fuery is a cloud-based detection for files that have been automatically identified as malicious by the cloud-based protection feature of Windows Defender. The second most common malicious software family encountered was Win32/Vigorf. Win32/Vigorf is a generic detection for a variety of threats. The third most common malicious software family encountered was Win32/Skeeyah. Win32/Skeeyah is a generic detection for various threats that display Trojan characteristics. The fourth most common malicious software family encountered was Win32/Dynamer. Win32/Dynamer is a generic detection for a variety of threats.

The most common unwanted software family encountered was Win32/Foxiebro. Win32/Foxiebro is a browser modifier that can inject ads to search results pages, modify web pages to insert ads, and open ads in new tabs. The second most common unwanted software family encountered was Win32/ICLoader. Win32/ICLoader is a software bundler distributed from software crack sites, which installs unwanted software alongside the desired program. It sometimes installs other unwanted software, such as Win32/Neobar. The third most common unwanted software family encountered was MSIL/Wizrem. MSIL/Wizrem is a software bundler that downloads other unwanted software, including Win32/EoRezo and Win32/Sasquor. It might also try to install malicious software such as Win32/Xadupi.

Security software use

Nearly 18% of the computers in India are not running up-to-date real-time security software, compared with about 12% worldwide.

Malicious Websites

Attackers often use websites to conduct phishing attacks or distribute malware. Malicious websites typically appear completely legitimate and often provide no outward indicators of their malicious nature, even to experienced computer users. In many cases, these sites are legitimate websites that have been compromised by malware, SQL injection, or other techniques, in an effort by attackers to take advantage of the trust users have invested in them. To help protect users from malicious webpages, Microsoft and other browser vendors have developed filters that keep track of sites that host malware and phishing attacks and display prominent warnings when users try to navigate to them.

The information presented here has been generated from telemetry data produced by Windows Defender SmartScreen in Microsoft Edge and Internet Explorer.
  • Drive-by download pages: 8 per hundred thousand URLs.
  • Phishing sites: 420 per hundred thousand internet hosts.
  • Malware hosting sites: 890 per hundred thousand internet hosts.
- Microsoft Security Intelligence Report, Volume 22
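Rates quoted per hundred thousand can be hard to compare with the percentages used elsewhere in the report. The sketch below converts the three malicious-website rates into percentages. The function name and dictionary labels are illustrative; only the three figures come from the report.

```python
def per_100k_to_percent(rate_per_100k):
    """Convert a rate per hundred thousand into a percentage."""
    return rate_per_100k / 1000  # same as (rate / 100,000) * 100


# Rates per hundred thousand URLs or internet hosts, as quoted in the report.
rates = {
    "drive-by download pages": 8,
    "phishing sites": 420,
    "malware hosting sites": 890,
}

percentages = {name: per_100k_to_percent(rate) for name, rate in rates.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.3f}%")
```

This works out to 0.008%, 0.42% and 0.89% respectively. Note that the first figure is measured against URLs and the other two against internet hosts, so they are not directly comparable with each other.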