Understanding some of the key characteristics to consider when evaluating and comparing streaming technologies.
As data architectures mature, streaming is no longer considered a luxury but a technology with a wide range of applications across different industries. Because of technical and resource limitations, batch processing was historically the preferred way to process data and deliver applications; however, with the development of micro-batch and native streaming frameworks in the Apache ecosystem of distributed systems (e.g. Apache Spark, Apache Flink), high-scale streaming has become much more accessible (Figure 1).
Example applications of streaming systems include processing transaction data to spot anomalies, weather data, IoT data from remote locations, geo-location tracking, and more.
There are two key types of stream processing systems: micro-batch and real-time:
- In real-time stream processing, each record is processed as soon as it becomes available. This results in systems with very low latency that can make immediate use of incoming data (e.g. detecting fraudulent transactions in financial systems).
- In micro-batch processing systems, data points are not processed one by one but in small blocks, which are emitted after a specific time interval or once they reach a maximum size. This approach therefore favors high throughput over low latency. Micro-batch systems are particularly useful for performing operations such as aggregations (e.g. min, max, mean) and joins on the fly before writing the results to a storage system. Micro-batch processing can therefore be a good compromise between pure streaming and batch for tasks such as hourly reporting (e.g. mean weather temperature).
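The contrast between the two approaches can be sketched in plain Python. The snippet below is a minimal, framework-free illustration (the function names, the size/time thresholds, and the toy temperature readings are all invented for this example): a per-record loop stands in for real-time processing, while a small generator groups records into blocks by maximum size or elapsed time, as a micro-batch engine would.

```python
import time

def process_per_record(records, handler):
    # Real-time style: each record is handed to the handler as soon
    # as it arrives, minimizing per-event latency.
    for record in records:
        handler(record)

def micro_batches(records, max_size, max_wait_s):
    # Micro-batch style: records are collected into small blocks,
    # emitted once a block reaches max_size or max_wait_s elapses.
    batch, deadline = [], time.monotonic() + max_wait_s
    for record in records:
        batch.append(record)
        if len(batch) >= max_size or time.monotonic() >= deadline:
            yield batch
            batch, deadline = [], time.monotonic() + max_wait_s
    if batch:
        yield batch  # flush the final partial block

# Example: hourly-reporting-style aggregation (mean temperature per block).
readings = [21, 23, 19, 20, 24]
means = [sum(b) / len(b)
         for b in micro_batches(readings, max_size=2, max_wait_s=60.0)]
```

Note how the aggregate (the mean) is only ever computed over a complete block, which is what makes throughput-oriented operations like aggregations and joins a natural fit for micro-batching; a real-time handler, by contrast, sees one record at a time.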
This post originally appeared on TechToday.