Dynamically Configured Stream Processing with Apache Spark and Drools
In this post I will show how to use the very powerful flatMapGroupsWithState in Spark Structured Streaming to aggregate arbitrary state. To make things interesting, I chose to integrate Drools, a Java rule engine, to map over streaming data. This is a pretty powerful paradigm. It enables the following:
- Business rules live outside the application, so they can be changed independently of the code, even by business analysts.
- Rule execution scales out by leveraging Spark's distributed execution model.
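To make the shape of this concrete, here is a minimal Scala sketch of the streaming side. The domain types (`Event`, `AccountState`, `Alert`), the `evaluateRules` hook, and the socket source are all illustrative assumptions, not code from the actual project; in the real setup the hook would hand the state to a Drools session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Illustrative domain types (not from the original post).
case class Event(accountId: String, amount: Double)
case class AccountState(total: Double)
case class Alert(accountId: String, total: Double)

// Hypothetical hook: in the real setup this would insert the updated
// state into a Drools session and return whatever alerts the
// externally managed rules produce. A hard-coded threshold stands in.
def evaluateRules(accountId: String, updated: AccountState): Iterator[Alert] =
  if (updated.total > 10000.0) Iterator(Alert(accountId, updated.total))
  else Iterator.empty

// The function passed to flatMapGroupsWithState: fold new events into
// the per-key state, persist it, then let the rules decide what to emit.
def updateState(
    accountId: String,
    events: Iterator[Event],
    state: GroupState[AccountState]): Iterator[Alert] = {
  val current = state.getOption.getOrElse(AccountState(0.0))
  val updated = AccountState(current.total + events.map(_.amount).sum)
  state.update(updated)
  evaluateRules(accountId, updated)
}

val spark = SparkSession.builder.appName("drools-streaming").getOrCreate()
import spark.implicits._

// The source is illustrative; any streaming source works.
val events = spark.readStream.format("socket")
  .option("host", "localhost").option("port", 9999)
  .load().as[String]
  .map { line =>
    val Array(id, amt) = line.split(",")
    Event(id, amt.toDouble)
  }

val alerts = events
  .groupByKey(_.accountId)
  .flatMapGroupsWithState(OutputMode.Append, GroupStateTimeout.NoTimeout)(updateState)

alerts.writeStream.outputMode("append").format("console").start().awaitTermination()
```

Because the state function returns an `Iterator`, each micro-batch can emit zero, one, or many alerts per key, which is exactly the shape rule evaluation produces.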
So, here are the details. I will be sharing the full source code on GitHub in the near future. Meanwhile, here is a gist that shows how this works.
Obviously, the same idea is applicable to batch datasets as well. The key point is that combining Drools with Spark's clustered execution is a match made in tech heaven, enabling dynamically configured stream processing.
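As a rough sketch of the batch variant: the DRL rule text would be fetched at runtime (from a file, database, or config service; the loading mechanism is assumed here), compiled on each executor, and fired once per partition so the expensive rule compilation stays off the per-row path. The `Transaction` type, the DRL resource path, and `applyRules` are all hypothetical names for illustration.

```scala
import org.apache.spark.sql.{Dataset, Encoders}
import org.kie.api.KieServices
import scala.beans.BeanProperty

// Illustrative fact type; @BeanProperty generates the getter/setter
// pair Drools rules expect when they modify a fact.
case class Transaction(id: String, amount: Double,
                       @BeanProperty var flagged: Boolean = false)

// Compile DRL text fetched at runtime into a fresh KieSession.
// Writing the text into an in-memory KieFileSystem means the rules
// never have to ship inside the application jar.
def sessionFromDrl(drlText: String) = {
  val ks  = KieServices.Factory.get()
  val kfs = ks.newKieFileSystem()
  kfs.write("src/main/resources/rules/dynamic.drl", drlText) // path is illustrative
  val builder = ks.newKieBuilder(kfs)
  builder.buildAll()
  ks.newKieContainer(builder.getKieModule.getReleaseId).newKieSession()
}

// Fire the rules over a batch Dataset: one session per partition.
def applyRules(ds: Dataset[Transaction], drlText: String): Dataset[Transaction] =
  ds.mapPartitions { rows =>
    val session = sessionFromDrl(drlText)
    // Materialize before disposing: mapPartitions iterators are lazy,
    // so disposing inside a lazy map would close the session too early.
    val out = rows.map { t =>
      session.insert(t)
      session.fireAllRules()
      t
    }.toList
    session.dispose()
    out.iterator
  }(Encoders.product[Transaction])
```

Since only the DRL string is captured by the closure, swapping in new rules is just a matter of re-reading that string before the next run; no redeploy of the Spark application is needed.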