Tuesday, September 25, 2012

Difference between Descriptive, Predictive and Prescriptive Analytics

Analytics as a discipline has matured rapidly, propelled by a new era in which the human-machine network is more Instrumented, Interconnected and Intelligent. This new era has made data more accessible than ever before, and the science of analytics can leverage that data not only to increase the accuracy of predictions about the future but also to up the ante by one level and optimize for the best outcome from a set of predicted possibilities.

Analytics has a maturity curve, or rather a roadmap, which starts with Descriptive Analytics and works its way up to Predictive Analytics and ultimately to Prescriptive Analytics.

Descriptive Analytics, often called after-the-fact analytics, reports on what happened and how frequently a certain event or action occurred, and provides drill-down capabilities to get to the root cause of a problem. It offers reporting views tailored to user roles: summary views for executive dashboards, metric views for mid-level managers, and drill-down root cause analysis details for engineers and domain experts. Descriptive Analytics is rooted in what is known as traditional BI reporting.
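To make the descriptive side concrete, here is a minimal sketch in Python using pandas; the event log, column names and the two views are purely illustrative assumptions, not tied to any particular BI tool.

```python
import pandas as pd

# Hypothetical event log; columns and values are illustrative only.
events = pd.DataFrame({
    "region":   ["East", "East", "West", "West", "West"],
    "event":    ["error", "error", "error", "ok", "ok"],
    "duration": [12.0, 7.5, 9.0, 3.2, 4.1],
})

# Summary view (executive dashboard): how often did each event occur?
summary = events.groupby("event").size().rename("count")
print(summary)

# Drill-down view (engineer / domain expert): slice the same data by region
# to work toward the root cause of the errors.
drill_down = (events[events["event"] == "error"]
              .groupby("region")
              .agg(count=("event", "size"),
                   avg_duration=("duration", "mean")))
print(drill_down)
```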

Predictive Analytics focuses on simulating what could happen in the future, given the conditions of the recent past, and on forecasting the next possible events if the current trend continues for a given period of time. Predictive Analytics is rooted in building supervised and unsupervised machine learning algorithms and models.
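As a minimal illustration of the predictive step, the sketch below fits a simple supervised model to a hypothetical demand history and projects the trend forward; the numbers and the choice of linear regression (scikit-learn) are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical history: weekly demand for the last 12 weeks.
weeks = np.arange(12).reshape(-1, 1)
demand = np.array([100, 104, 109, 115, 118, 124, 130, 133, 140, 146, 150, 157])

# Fit a simple supervised model to the recent past ...
model = LinearRegression().fit(weeks, demand)

# ... and forecast the next 4 weeks, assuming the current trend continues.
future_weeks = np.arange(12, 16).reshape(-1, 1)
print(model.predict(future_weeks))
```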

Prescriptive Analytics builds on top of Predictive Analytics and focuses on evaluating the various possible outcomes from predictive models and coming up with the best possible outcome by employing optimization algorithms. Such algorithms are also capable of factoring in the effects of variability. Prescriptive Analytics leverages stochastic optimization algorithms and models.
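A minimal sketch of the prescriptive step might look like the following: it takes a hypothetical demand forecast with uncertainty (the predictive input), simulates many demand scenarios, and searches candidate decisions for the one with the best expected outcome. The newsvendor-style setup, the costs and the demand distribution are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Predictive input (assumed): a demand forecast with uncertainty, modelled
# here as a normal distribution around the predicted mean.
demand_samples = rng.normal(loc=150, scale=20, size=10_000)

unit_cost, unit_price = 6.0, 10.0

def expected_profit(order_qty):
    # Factor in variability: average profit across the simulated demand scenarios.
    sold = np.minimum(order_qty, demand_samples)
    return np.mean(unit_price * sold - unit_cost * order_qty)

# Prescriptive step: evaluate the candidate decisions and pick the best outcome.
candidates = np.arange(100, 201)
best_qty = max(candidates, key=expected_profit)
print(best_qty, expected_profit(best_qty))
```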

It is imperative to realize that there is no shortcut for an enterprise to reach the highest maturity level in analytics (i.e. Prescriptive Analytics) without first developing a solid and sound foundation of descriptive analytics followed by predictive analytics.

Enterprises also need to realize that simply being in this new era of an instrumented, interconnected and intelligent human-machine network does not give them a free ticket to the data that analytics needs in order to be useful. A solid foundation of data access, with a key focus on ensuring that the veracity and viscosity of the data are of superior quality, is the very first step toward reaping the benefits of modern-day analytic processing.

Friday, September 14, 2012

Data Virtualization - Virtualize more than Consolidate

Data consolidation continues to be a persistent IT challenge and a source of constant frustration and IT spend. The days of full-time IT spend on continuous data consolidation against an ever-moving target of data sources and data types should be over. Well, even if "over" is too strong, enterprises should at a minimum be seriously considering other alternatives. This is where Data Virtualization comes to the party!

Data virtualization provides an abstraction layer with the necessary hooks to take a business-centric query and deconstruct it into a set of atomic queries. Each atomic query focuses on a subset of the data elements/types from the original business-centric query and determines which data source(s) to go against to retrieve the data. The Data Virtualization layer executes each atomic query, and the returned data sets are then processed (joined) to form the final consolidated result set, which is returned as the answer to the business-centric query. The results can be delivered as standard SQL, Web Services or any other format standard enough to be consumed by business and/or enterprise applications.
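The sketch below illustrates the idea rather than any specific product: a business-centric question is split into atomic queries against two hypothetical sources, and the partial results are joined into one consolidated answer. The source names, columns and the pandas-based stand-ins for the sources are assumptions for illustration only.

```python
import pandas as pd

# Two hypothetical underlying sources that the virtualization layer hides from
# the caller; in practice these could live in entirely different systems.
crm_source = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment":     ["gold", "silver", "gold"],
})
orders_source = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount":      [120.0, 80.0, 45.0, 300.0],
})

def virtual_query():
    """Answer the business-centric question 'total spend per segment' by
    splitting it into atomic per-source queries and joining the results."""
    # Atomic query 1: customer-to-segment mapping from the CRM source.
    segments = crm_source[["customer_id", "segment"]]
    # Atomic query 2: per-customer spend from the orders source.
    spend = orders_source.groupby("customer_id", as_index=False)["amount"].sum()
    # Consolidation: join the partial result sets into one business answer.
    joined = segments.merge(spend, on="customer_id")
    return joined.groupby("segment", as_index=False)["amount"].sum()

print(virtual_query())
```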

The technology is real today. It is important that enterprises take a close look at Data Virtualization and consider leveraging it as part of their overall enterprise data architecture strategy.

And yes, the technologies today can virtualize across both structured and unstructured data spread across databases and schema-less file systems.

Saturday, September 1, 2012

The Genesis of Big Data

Big Data, big data, big data! It is the hype that has taken the IT industry by storm. The term Big Data, formed from a combination of two of the simplest words - big and data - has had a profound impact. Enterprises are intrigued by Big Data, and all of them feel that there is something in it for them.

There is no doubt that data has grown, and grown in huge proportions. If you think about it from a different angle, this data was already there; what has changed is that technology now allows enterprises to get access to this huge ocean of data. The fundamental shift is that enterprises traditionally had access to structured data sets, primarily generated from business transactions and internal business process executions. Such data resided mainly in databases and data warehouses, where it was captured in a well-structured form. However, with the new era of social computing and of data feeds from a myriad of sources external to the enterprise, the enterprise is all of a sudden exposed to an internet of things that was traditionally not under its control. The industry has come to the realization that such data has a profound impact on the way businesses are, and will be, run in the future.

The ability to capture customer sentiments, desires, feelings, product feedback and intentions in real time, as they happen, and to influence the next business action or decision is going to provide the competitive advantage that has the potential to make or break product brands and improve our lifestyles through real-time, up-to-date decision support systems. Some examples may be:

  • Capturing a customer segment's negative sentiment and taking prompt corrective action
  • Predicting customer movements, e.g. a customer commenting on making a move from one mobile carrier to another based on bad experiences
  • Providing location-based product offers, e.g. offering $2 off a sandwich to a customer driving by a Subway store
  • Informing rush-hour travelers of the optimum route to their destination based on real-time data feeds from traffic surveillance cameras
  • ...
The list is endless, and each industry can come up with its own list of untapped potential.

This non-traditional data does not follow the norms of database structures and designs; it is typically semi-structured textual data from social networks like Facebook, Twitter and LinkedIn, or unstructured data from audio and video feeds.
The popular belief is that the combination of semi-structured and unstructured data sets forms around 80% of the world's current data. Enterprises have realized that their business decisions have traditionally been based on only 20% of the data (the structured forms); the remaining 80% is untapped!

This four-fold increase in data, with its sheer Volume and Variety across the entire gamut of semi-structured and unstructured data, is going to be a force to reckon with. Throw in the fact that the rate at which this non-traditional data is created is staggering and uncontrolled, i.e. its Velocity has not been dealt with before. The IT industry has come to the realization that, based on the sheer Volume, Variety and Velocity of the untapped data, it is something very Big - a phenomenon that our traditional technologies and infrastructure were incapable of handling. Big, in this context, means beyond current comprehension: data so big that we have never before seen it, or had a need to handle and process it. That is the genesis of Big Data!

Technologies are catching up with the 3 V's, and we have started to realize that what was Big Data a year or two back may not be that big anymore, i.e. we are now capable of handling it. It is important to understand that the term Big Data is temporal: what is big today may not, and will not, be 'big' tomorrow. Another important concept has unfolded recently. Although we have come to terms with the volume, variety and velocity of the data, this huge influx confronts enterprises with yet another challenge: how do I know that the data I am gathering from non-conventional sources is indeed authentic and truthful? How do I believe in the Veracity of the data? So, add yet another V and we get the 4 V's of Big Data: Volume, Variety, Velocity, Veracity.

I strongly feel that the focus going forward is not about whether the data is big, medium or small; it is about what we can do with the data. How can we empower the business with the next best decision, delivered with a level of confidence that allows executives to act on it? Think about it! Let's talk about it another day.