Saturday, July 7, 2012

Combining Data At-Rest Analytics with Data In-Motion Analytics

Take a look at this short video first.

The video provides a sneak peek at the 'Art of the Possible': how traditional analytics based on existing data in data warehouses and data marts can be combined with real-time analytics based on streaming data feeds to build a closed-loop, continuous-improvement feedback system.

The structured data residing in traditional data warehouses and data marts accounts for only ~20% of the world's data. The remaining ~80% is unstructured, ambiguous and inherently unrelated data.

The fundamental premise of combining at-rest analytics with in-motion analytics is as follows:
1. Leverage the wealth of existing data to develop statistical models that can detect patterns in previously unseen data and predict future outcomes with a high degree of confidence.
2. Deploy these parameterized models to a stream computing environment, where data arrives in real time and is primarily unstructured, e.g. text, video, audio and other unstructured data feeds.
3. Feed the real-time data, typically single records or small data sets captured in short time windows, to the predictive models as input parameters.
4. Allow the models to track the real-time data feeds and provide real-time predictions.
5. If the models cannot detect patterns and the number of 'unknowns' rises above a certain threshold, trigger a mechanism to recalibrate the original statistical model.
6. Ideally, the recalibration leverages not only the existing data in the warehouses and marts but also more current data and other relevant, related data from other sources. The expectation is that the recalibrated model will predict more events and detect more patterns.
7. Deploy the recalibrated model back into the stream computing environment, where it should detect more of the events happening in real time.
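For readers who want something more concrete than the list above, here is a minimal, self-contained Python sketch of the closed loop in steps 1 through 7. It is not how the demo in the video was built (that is for the follow-up post), and every name in it is hypothetical. The assumptions: the "statistical model" is a simple nearest-centroid classifier over numeric feature vectors, a record counts as "unknown" when it lies farther than `radius` from every known centroid, the live feed is cut into fixed-size tumbling windows, and a hypothetical `labeler` callable stands in for whatever source supplies labels for recent data during recalibration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional

# A feature vector extracted from a record (hypothetical representation).
Point = tuple[float, ...]


@dataclass
class CentroidModel:
    """Nearest-centroid classifier standing in for the 'statistical model'."""
    centroids: dict[str, Point]
    radius: float  # farther than this from every centroid => 'unknown'

    def predict(self, x: Point) -> Optional[str]:
        label, dist = min(
            ((lbl, math.dist(x, c)) for lbl, c in self.centroids.items()),
            key=lambda pair: pair[1],
        )
        return label if dist <= self.radius else None


def train(records: Iterable[tuple[Point, str]], radius: float) -> CentroidModel:
    """Steps 1 and 6: fit one centroid per label (mean of its feature vectors)."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for x, label in records:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lbl: tuple(s / counts[lbl] for s in acc) for lbl, acc in sums.items()}
    return CentroidModel(centroids, radius)


def tumbling_windows(stream: Iterable[Point], size: int) -> Iterator[list[Point]]:
    """Step 3: cut the live feed into small, non-overlapping windows."""
    window: list[Point] = []
    for x in stream:
        window.append(x)
        if len(window) == size:
            yield window
            window = []
    if window:  # flush a final partial window
        yield window


def run_closed_loop(
    history: Iterable[tuple[Point, str]],
    stream: Iterable[Point],
    labeler: Callable[[Point], str],  # hypothetical source of labels for recent data
    window_size: int = 4,
    unknown_threshold: float = 0.5,
    radius: float = 1.0,
) -> tuple[CentroidModel, list[Optional[str]], int]:
    history = list(history)
    model = train(history, radius)  # Steps 1-2: train on at-rest data, then "deploy"
    predictions: list[Optional[str]] = []
    recalibrations = 0
    for window in tumbling_windows(stream, window_size):
        preds = [model.predict(x) for x in window]  # Step 4: score in real time
        predictions.extend(preds)
        unknown_rate = preds.count(None) / len(preds)
        if unknown_rate > unknown_threshold:  # Step 5: too many unknowns
            # Step 6: combine at-rest data with recent, newly labeled data.
            history += [(x, labeler(x)) for x in window]
            model = train(history, radius)  # Step 7: redeploy the recalibrated model
            recalibrations += 1
    return model, predictions, recalibrations


# Toy data: two known patterns at rest, then a third pattern appears in the stream.
HISTORY = [((0.0, 0.0), "normal"), ((0.2, 0.1), "normal"),
           ((5.0, 5.0), "fraud"), ((5.1, 4.9), "fraud")]
STREAM = [(0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (4.9, 5.1),
          (10.0, 0.0), (10.1, 0.1), (9.9, -0.1), (0.0, 0.0),
          (10.05, 0.05), (0.1, 0.1)]


def demo_labeler(x: Point) -> str:
    # Hypothetical stand-in for an analyst or downstream system assigning labels.
    if x[0] > 8:
        return "new_pattern"
    return "fraud" if x[0] > 2.5 else "normal"


if __name__ == "__main__":
    final_model, preds, n = run_closed_loop(HISTORY, STREAM, demo_labeler)
    print(preds)
    print("recalibrations:", n)
```

In this toy run, the three records near (10, 0) come back as `None` (unknown), which pushes the second window's unknown rate to 0.75 and triggers one recalibration; after that, a similar record is classified as `new_pattern`. Recalibration here simply refits on the historical data plus the recent window. A real deployment would more likely retrain offline against the warehouse and push the new model to the streaming platform, but the trigger logic (unknown rate per window above a threshold) is the part the list describes.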



In a subsequent blog post, when I find some time, I will explain how what you saw in the video was implemented using a set of products and techniques.

Stay tuned!

And, as usual, drop me a note with any questions or topics on Big Data that you want to discuss.