Moving from a batch-oriented data architecture to a stream processing architecture is more than just “running faster”. It raises the bar on concerns like scalability, resiliency, and availability. In this talk, I’ll discuss why this is so and how streaming architectures must look more like conventional microservice architectures to meet these new challenges.
I’ll also discuss how to pick streaming technologies for your applications based on several factors:
· Low latency: What’s my time budget for handling this data?
· High volume: How much data per unit time must I handle?
· Data processing: Do I need machine learning, SQL queries, conventional ETL processing, etc.?
· Integration with other tools: Which ones and how is data exchanged between them?
· DevOps: How do I want to package, deploy, and manage this application?
We’ll consider specific examples of streaming tools and how they fit these criteria: Spark Streaming, Flink, Akka Streams, and Kafka Streams, all layered on top of Kafka, the popular data backplane.