Dynamically Configured Stream Processing with Apache Spark and Drools

In this post I will show how to use the very powerful flatMapGroupsWithState operator in Spark Structured Streaming to aggregate arbitrary state. To make things interesting, I chose to integrate Drools, a Java rule engine, to map over the streaming data. This is a powerful paradigm that enables the following:

  • Business rules are external to the application, so they can be changed independently, even by business analysts.
  • Rule execution scales out by leveraging the distributed execution model provided by Spark.

So, here are the details. I will be sharing the full source code on GitHub in the near future. Meanwhile, here is a gist that shows how this works.
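The shape of the pattern can also be sketched in plain Java with no dependencies. This is only an illustration under stated assumptions: it is not the gist mentioned above, and it uses neither Spark nor Drools. The class RuleDrivenStateSketch, the Totals state object, the one-line rule syntax ("name: field > threshold"), and the update function are all hypothetical names made up for this sketch. The update step stands in for the function handed to Spark's stateful flat-map operator, and the parsed rule list stands in for rules that Drools would load from outside the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Dependency-free sketch: per-key state plus externally supplied rules.
// Everything here is hypothetical and only mimics the Spark + Drools design.
class RuleDrivenStateSketch {

    // Running state kept per key (stands in for Spark's per-group state).
    static final class Totals {
        long count;
        double sum;
    }

    // One rule: fires when the named state field exceeds a threshold.
    static final class Rule {
        final String name;
        final String field;
        final double threshold;

        Rule(String name, String field, double threshold) {
            this.name = name;
            this.field = field;
            this.threshold = threshold;
        }

        boolean matches(Totals state) {
            double value;
            if (field.equals("count")) {
                value = state.count;
            } else if (field.equals("sum")) {
                value = state.sum;
            } else {
                throw new IllegalArgumentException("unknown field: " + field);
            }
            return value > threshold;
        }
    }

    // Parses a rule written as "name: field > threshold" (made-up syntax).
    // Because rules arrive as text, they can change without touching this code.
    static Rule parse(String line) {
        String[] nameAndCondition = line.split(":");
        String[] parts = nameAndCondition[1].trim().split("\\s*>\\s*");
        return new Rule(nameAndCondition[0].trim(), parts[0].trim(),
                Double.parseDouble(parts[1].trim()));
    }

    // Stateful flat-map step: fold one event into the key's state,
    // then emit zero or more outputs, one per rule that fires.
    static List<String> update(String key, double value,
                               Map<String, Totals> store, List<Rule> rules) {
        Totals state = store.computeIfAbsent(key, k -> new Totals());
        state.count++;
        state.sum += value;
        List<String> out = new ArrayList<>();
        for (Rule rule : rules) {
            if (rule.matches(state)) {
                out.add(key + ":" + rule.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(parse("big-spender: sum > 100"));
        rules.add(parse("frequent: count > 1"));

        Map<String, Totals> store = new HashMap<>();
        System.out.println(update("alice", 60.0, store, rules));
        System.out.println(update("bob", 10.0, store, rules));
        System.out.println(update("alice", 50.0, store, rules));
    }
}
```

In a real deployment the store would be managed by Spark per group and the rule evaluation would be delegated to a Drools session, but the division of labor is the same: the streaming code owns the state, and the rules decide what to emit.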

The same idea applies to batch datasets as well. The key point is that running Drools on Spark's clustered execution model is a match made in tech heaven, enabling dynamically configured stream processing.



Vishal Chawla

I am a recovering entrepreneur. I dabble in everything data related and am fluent in databricks, dbt, scala, spark, kafka,