Contrary to what its name suggests, Complex Event Processing (CEP) is conceptually a simple idea. It’s also very easy to grasp on an intuitive level, because it corresponds well to how we operate in “real life” (i.e. outside of work). That is why it’s best explained with examples.
However, CEP has its challenges. There is a lot of confusion about even the most basic terminology, and vendors take quite distinct approaches to providing the same functionality, so comparing tools and choosing the right one for a given task can be difficult.
I think it’s useful to know that this is in large part due to historical reasons. The foundations of the CEP we know today were laid by teams working on different research problems, such as active databases, discrete simulations, business intelligence and programming languages. Apparently they didn’t communicate much. That is why contemporary tools have different syntax (e.g. SQL- vs Prolog-like) and why not all of them support the same set of “standard” CEP features (e.g. some lack efficient aggregations, negations or an event absence detection mechanism).
Events, events and more events
The basic unit of data in CEP is, of course, an event. Formally, events are defined as representations of something that has happened (e.g. a new order was placed) or as representations of a change in state (e.g. the temperature rose from 1 to 2 degrees Celsius).
The input to the system is usually fine-grained and low level. For example, we might track readings from temperature and CO2 sensors at 1 second intervals and emit a new event for every reading. The expected output of a CEP system is actionable, business-level information, meaning that business users should be able to react immediately to what happened and make informed decisions. For example, we might emit an event signalling a fire outbreak.
CEP is a tool for getting from those low-level input events to actionable insights. It aims to deliver them in real time (or near real time), based on input data from (usually) multiple, heterogeneous sources. To this end, CEP extends Event Processing (EP) by introducing additional concepts and operations.
Usually we don’t get directly from input events to actionable insights. We need to build a set of gradual abstractions representing the system’s state, which together form an event hierarchy. For our fire detection example we might have the following:
- Level 1 – Raw sensor readings are sent to separate streams (or channels) at regular time intervals. They might be just numbers with metadata (e.g. sensor id and timestamp).
- Level 2 – Dealing with such raw readings is not very convenient, especially if we have various models of sensors measuring the same thing. It makes sense to abstract the readings into more usable information, e.g. by introducing levels (safe, suspicious, alarming).
- Level 3 – The data from multiple streams can be combined to emit an actionable event. We might issue an alarm if both temperature and CO2 reach alarming levels or if both start rising rapidly within a short time.
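The three levels above can be sketched in a few lines of plain Python. This is only a minimal illustration, not how a CEP engine would do it: the `Reading` type, the `to_level` and `fire_alarm` functions and the threshold values are all hypothetical names and numbers that I made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical thresholds, chosen only for illustration.
TEMP_LIMITS = (40.0, 60.0)      # degrees Celsius: (suspicious, alarming)
CO2_LIMITS = (1000.0, 5000.0)   # ppm: (suspicious, alarming)


@dataclass(frozen=True)
class Reading:
    """Level 1: a raw sensor reading with metadata."""
    sensor_id: str
    timestamp: float
    value: float


def to_level(reading: Reading, limits: Tuple[float, float]) -> str:
    """Level 2: map a raw value onto safe / suspicious / alarming."""
    suspicious, alarming = limits
    if reading.value >= alarming:
        return "alarming"
    if reading.value >= suspicious:
        return "suspicious"
    return "safe"


def fire_alarm(temp: Reading, co2: Reading) -> Optional[dict]:
    """Level 3: combine both streams into one actionable event."""
    temp_level = to_level(temp, TEMP_LIMITS)
    co2_level = to_level(co2, CO2_LIMITS)
    if temp_level == "alarming" and co2_level == "alarming":
        return {"type": "FireAlarm",
                "timestamp": max(temp.timestamp, co2.timestamp)}
    return None
```

Calling `fire_alarm` with a temperature reading of 75.0 and a CO2 reading of 6000.0 yields a `FireAlarm` event, while a normal temperature reading yields `None`. The "rising rapidly" variant of Level 3 is left out here, because it needs a history of readings rather than a single pair.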
Common CEP features
Although it’s possible (and interesting) to implement the functionality described above without dedicated tools, it is not practical for more realistic scenarios. CEP tools have a lot of useful features such as:
- Pattern detection for detecting a single event or a relationship among multiple events. Examples: event A occurred, followed by events B and C; event X occurred within 10 min. after event Y; event M hasn’t occurred for the last 10 min.
- Computations over event data, including aggregations and computations over sliding windows. Examples: count of events of type T in every 10 min. window; average temperature over the last 10 readings.
- Adaptation of data coming from multiple heterogeneous sources, as well as enrichment and projection of event data. Examples: sensor A measures temperature in degrees Celsius, sensor B in Fahrenheit, and we need to represent both in a uniform way; when event X is detected, emit event Y containing data from X and some data retrieved from another source; when event M is detected, emit event N that contains only a subset of M’s data.
- Abstraction of events and of patterns among them, in particular the ability to emit new, higher-level events. Example: when both events A and B occur, emit event C.
- Filtering, which allows ignoring events that are irrelevant for a specific task. In CEP systems a large portion of input events might be ignored.
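To make two of these features more concrete, here is a rough sketch of a count-based sliding window ("average temperature over the last 10 readings") and of event absence detection ("event M hasn’t occurred for the last 10 min."). The class names `SlidingAverage` and `AbsenceDetector` are hypothetical and do not come from any CEP tool. I also assume that timestamps are plain numbers in seconds and that the caller supplies the current time.

```python
from collections import deque


class SlidingAverage:
    """Average over the last `size` readings (a count-based sliding window)."""

    def __init__(self, size: int):
        # A deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=size)

    def add(self, value: float) -> float:
        """Record a new reading and return the current window average."""
        self.values.append(value)
        return sum(self.values) / len(self.values)


class AbsenceDetector:
    """Reports when no event has been seen for more than `timeout` seconds."""

    def __init__(self, timeout: float, start: float = 0.0):
        self.timeout = timeout
        self.last_seen = start

    def on_event(self, timestamp: float) -> None:
        """Remember when the watched event was last observed."""
        self.last_seen = timestamp

    def is_absent(self, now: float) -> bool:
        """True if the event has been missing for longer than the timeout."""
        return now - self.last_seen > self.timeout
```

For example, `AbsenceDetector(timeout=600.0)` flags absence once more than 10 minutes have passed since the last `on_event` call. A real CEP engine would typically evaluate such conditions itself, driven by timers or incoming events, instead of relying on the caller to poll `is_absent`.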
Those features will be very useful in my Pet Project.