Saturday, September 1, 2012

The Genesis of Big Data

Big Data, big data, big data! It is the hype that has taken the IT industry by storm. The term Big Data is formed from two of the simplest words, big and data, yet their combination has had a profound impact. Enterprises are intrigued by Big Data, and all of them feel that there is something in it for them.

There is no doubt that data has grown, and grown in huge proportions. If you think about it from a different angle, though, this data was already there. What has changed is that technology now allows enterprises to get access to this huge ocean of data. The fundamental shift is this: enterprises traditionally had access to structured data sets, generated primarily from business transactions and internal business process executions. Such data resided mainly in databases and data warehouses, where it was captured in a well-structured form. However, with the new era of social computing and data feeds from a myriad of sources external to the enterprise, the enterprise is all of a sudden exposed to an internet of things that was traditionally not under its control. The industry has come to the realization that such data has a profound impact on the way businesses are and will be run in the future.

The ability to capture customers' sentiments, desires, feelings, product feedback and intentions in real time, as they happen, and to use them to influence the next business action or decision is going to provide a competitive advantage. It has the potential to make or break product brands and to improve our lifestyles through real-time, up-to-date decision support systems. Some examples:

  • Capturing a customer segment's negative sentiment and promptly deciding on corrective action
  • Predicting customer movements, e.g. spotting customers who comment about moving from one mobile carrier to another after bad experiences
  • Providing location-based product offers, e.g. offering a customer $2 off a Subway sandwich as she drives by a Subway store
  • Informing rush-hour travelers of the optimum route to their destination based on real-time data feeds from traffic surveillance cameras
  • ...
The list is endless, and each industry can come up with its own list of untapped potential.

This non-traditional data does not follow the norms of database structures and designs; it typically takes the form of semi-structured textual data on social networks like Facebook, Twitter and LinkedIn, or unstructured data from audio and video feeds.
The popular belief is that semi-structured and unstructured data sets together form around 80% of the world's current data. Enterprises have realized that their business decisions have traditionally been based on only 20% of the data (the structured forms), and the remaining 80% is untapped!

This untapped data, roughly four times the volume of the structured data already in use, is going to be a force to reckon with through its sheer Volume and its Variety across the entire gamut of semi-structured and unstructured forms. Throw in the fact that the rate at which non-traditional data is created is staggering and uncontrolled, i.e. its Velocity is something we have not dealt with before. The IT industry has come to the realization that, given the sheer Volume, Variety and Velocity of the untapped data, it is something very Big: a phenomenon that our traditional technologies and infrastructure were incapable of handling. Big, in this context, means that it is beyond current comprehension: data so Big that we have never before seen it or needed to handle and process it. That is the genesis of Big Data!

Technologies are catching up with the 3 V's, and we have started to realize that what was Big Data a year or two back may not be that big anymore, i.e. we are now capable of handling it. It is important to understand that the term Big Data is temporal: what is big today may not be 'big' tomorrow. Another important concept has unfolded recently. Although we have been able to come to terms with the volume, variety and velocity of the data, this huge influx presents enterprises with yet another challenge: how do I know that the data I am gathering from non-conventional sources is indeed authentic and truthful? How do I believe in the Veracity of the data? So, add yet another V and we get the 4 V's of Big Data: Volume, Variety, Velocity and Veracity.

I strongly feel that the focus for us going forward is not whether the data is big, medium or small; it is what we can do with the data. How can we equip the business with the next best decision, at a level of confidence that lets executives act on it? Think about it! Let's talk about it another day.

1 comment:

  1. Hi Tilak - Thanks for writing on this subject. I am a PeopleSoft ERP consultant and was wondering how this can be used for some or many of our clients in the future. To start with, can you tell me what some of the indicators are that an enterprise has a Big Data problem?