Thursday, August 9, 2012

Stream Computing (Streams) versus Complex Event Processing (CEP)

There is a general notion among IT professionals that stream computing (a.k.a. Streams) is just another buzz term for traditional complex event processing (CEP). Although there are conceptual similarities between Streams and CEP, and both fall under the analytical discipline of 'Continuous Intelligence', a few fundamental differences put them in different leagues.

CEP is primarily used for analytics on discrete business events. Events are correlated in time using simple IF/THEN/ELSE logic, and they need not be of a single type or category. The data encapsulated in business events is primarily structured. Common CEP engines support modest data rates of around 10K messages/second, with latency typically in the 'seconds' range; maximum processing rates can scale up to around 100K events/second.
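To make the IF/THEN/ELSE correlation style concrete, here is a minimal sketch of the kind of rule a CEP engine evaluates. The event names, fields, and thresholds ('login_failure', a 3-strikes-in-60-seconds lockout) are hypothetical, chosen only to illustrate time-windowed correlation over discrete, structured business events:

```python
from collections import deque

# Hypothetical CEP-style rule: IF a 'login_failure' event for the same user
# occurs 3 times within 60 seconds, THEN emit an 'account_lockout' alert.
class SimpleCepRule:
    def __init__(self, threshold=3, window_secs=60):
        self.threshold = threshold
        self.window_secs = window_secs
        self.history = {}  # user -> deque of recent event timestamps

    def on_event(self, event):
        # Simple IF/THEN/ELSE correlation over a sliding time window
        if event["type"] != "login_failure":
            return None
        times = self.history.setdefault(event["user"], deque())
        times.append(event["ts"])
        # Drop timestamps that have fallen outside the window
        while times and event["ts"] - times[0] > self.window_secs:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()
            return {"type": "account_lockout", "user": event["user"]}
        return None

rule = SimpleCepRule()
alerts = [rule.on_event({"type": "login_failure", "user": "bob", "ts": t})
          for t in (0, 10, 20)]
# The third failure inside the 60-second window triggers the alert
```

Note that the rule only inspects fields of structured events and compares timestamps; there is no statistical model involved, which is the point of contrast with Streams below.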

Streams, on the other hand, is designed to handle processing rates an order of magnitude higher than CEP: millions of events per second, with built-in linear scalability. Streams data sources are typically of a single event type, e.g. camera feeds from traffic signals, or sensor data generated by a pipeline or a medical device. Streams is designed to handle the full gamut of unstructured data and, in contrast to the IF/THEN/ELSE logic of CEP, it can perform advanced analytics on the data, limited only by the power of the available mathematical and statistical models. Fast Fourier transforms, the Holt-Winters algorithm, and other time series analysis techniques are real-world examples.
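As an illustration of the model-driven analytics mentioned above, the following is a minimal, self-contained sketch of Holt's linear trend method (the double-smoothing core of the Holt-Winters family), applied to a stream of sensor readings. The smoothing parameters and the sample readings are illustrative, not tuned for any real workload:

```python
# Minimal sketch of Holt's linear (double exponential smoothing) forecasting,
# processing readings one by one as they would arrive on a stream.
def holt_forecast(readings, alpha=0.5, beta=0.5):
    level = readings[0]
    trend = readings[1] - readings[0]
    for x in readings[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend  # one-step-ahead forecast

# A perfectly linear sensor stream is forecast exactly
stream = [10.0, 12.0, 14.0, 16.0, 18.0]
print(holt_forecast(stream))  # -> 20.0
```

Unlike the CEP rule, this operator maintains a statistical model (level and trend) rather than matching discrete conditions, which is the kind of computation Streams is built to run at scale.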

To summarize, although both Streams and CEP fall under the category of 'Continuous Intelligence', keep the following image in mind the next time one of your colleagues brings up the discussion:


1 comment:

  1. Many differences between CEP products and InfoSphere Streams appear to be a difference of degree. But when you set out to build a development platform for building--through programming--solutions that can apply any kind of analysis to any kind of data at any speed, you end up building a product that is fundamentally different from the business-oriented, discrete event-based, analyst-friendly CEP offerings.

    Sometimes it's hard to put your finger on it and explain it in a way that's easily grasped, but the experience of solving a challenging problem with Streams is nothing short of exhilarating. If you can think it, you can implement it. If you know of a code library that does something useful, you can apply it. During development, you don't worry about runtime issues, distributed processing, tuning, or scaling at all. The paradigm of a flow graph of individual processing steps ("operators") connected by streams is simple, powerful, and liberating in that it directs your mind to think of the problem in the most productive way.

    Sometimes problems that have a simple solution in Streams look in hindsight like they should have been simple to solve with other tools as well--but they weren't. I think this is because Streams has a well-conceived development model, with the right levels of abstraction; it's probably similar to the unleashing of application creativity that came from the introduction of the relational model in databases.
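The flow-graph paradigm the comment describes, operators connected by streams of tuples, can be sketched with plain Python generators. This is only a toy analogy, not Streams/SPL itself; the operator names are hypothetical:

```python
# Toy analogy of a flow graph: each function is an "operator", and the
# generators connecting them play the role of streams of tuples.
def source(values):
    # Source operator: emits tuples into the graph
    for v in values:
        yield v

def filter_op(stream, pred):
    # Filter operator: drops tuples that fail a predicate
    for t in stream:
        if pred(t):
            yield t

def map_op(stream, fn):
    # Functor-style operator: transforms each tuple
    for t in stream:
        yield fn(t)

# Wire the operators into a flow graph: source -> filter -> map
graph = map_op(filter_op(source(range(10)), lambda x: x % 2 == 0),
               lambda x: x * x)
print(list(graph))  # -> [0, 4, 16, 36, 64]
```

The appeal described in the comment comes through even in this toy: each operator is written in isolation, and composing them into a pipeline requires no knowledge of how or where the steps will eventually run.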