RESTfully query KafkaStreams

Many modern streaming applications need to provide statistics or aggregate data in real time. I have seen too many such applications that simply add Redis to the mix to perform these aggregations: last X values, rolling mean, etc.

I propose a simpler mechanism for this if you are already using Kafka and Kafka Streams.

I will demonstrate below how to build a RESTful service that returns the last 5 values sent to a specified topic and key. This is an extremely low-latency query that could be used to:

  • monitor Kafka topics
  • allow interactive queries on Kafka
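The "last 5 values" aggregation can be sketched as a small pure function. This is a minimal illustration in plain Java, not the code of the service itself: `LastValues` and `append` are hypothetical names. In a Kafka Streams topology, a function of this shape would typically serve as the aggregator passed to `groupByKey().aggregate(...)`, with the result materialized into a state store.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps only the most recent `limit` values seen for a key. */
final class LastValues {
    private LastValues() {}

    /**
     * Returns a new list containing `current` plus `value`,
     * trimmed from the front so that at most `limit` entries remain.
     */
    static <V> List<V> append(List<V> current, V value, int limit) {
        List<V> next = new ArrayList<>(current);
        next.add(value);
        int excess = next.size() - limit;
        if (excess > 0) {
            // Drop the oldest entries; the newest ones are at the end.
            next.subList(0, excess).clear();
        }
        return next;
    }
}
```

Calling `LastValues.append(acc, v, 5)` for each incoming record keeps the accumulator bounded, so the stored value per key never grows beyond 5 entries.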

The RestService is started as part of the Streams application's initialization. It is built using Akka HTTP.

The main idea is that a request for the 5 most recent values sent to the topic for a key may land on any of the hosts. If the key is held on the host that receives the request, the local state store is queried. Otherwise the key lives on another host, and we can find out which one using the Streams metadata. Once the remote host's coordinates are discovered, the original request is forwarded to that host, which returns the requested values.

For example, if a-key is found in the state store on server1, it is returned directly; otherwise the request is forwarded internally to the host whose state store contains a-key.
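The local-or-forward decision described above can be sketched as follows. This is a plain-Java model under stated assumptions, not the actual service: `LastValuesRouter` and its fields are hypothetical, the `Map` stands in for the local state store, `ownerOf` stands in for the metadata lookup (in Kafka Streams, something like `KafkaStreams#queryMetadataForKey`), and `remoteFetch` stands in for the HTTP call to the other host.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical router: answers from the local store or forwards to the owning host. */
final class LastValuesRouter {
    private final String self;                           // this instance, e.g. "server1:8080"
    private final Map<String, List<String>> localStore;  // stand-in for the local state store
    private final Function<String, String> ownerOf;      // stand-in for the metadata lookup
    private final BiFunction<String, String, List<String>> remoteFetch; // (host, key) -> values

    LastValuesRouter(String self,
                     Map<String, List<String>> localStore,
                     Function<String, String> ownerOf,
                     BiFunction<String, String, List<String>> remoteFetch) {
        this.self = self;
        this.localStore = localStore;
        this.ownerOf = ownerOf;
        this.remoteFetch = remoteFetch;
    }

    /** Returns the stored values for `key`, wherever the key lives. */
    List<String> lastValues(String key) {
        String owner = ownerOf.apply(key);
        if (self.equals(owner)) {
            // The key is hosted here: query the local store directly.
            return localStore.getOrDefault(key, List.of());
        }
        // The key is hosted elsewhere: forward the original request.
        return remoteFetch.apply(owner, key);
    }
}
```

Keeping the metadata lookup and the remote call behind plain functions makes the routing rule easy to test without a running cluster; the real service would plug in the Streams metadata and an HTTP client instead.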

I am a recovering entrepreneur. I dabble in everything data-related and am fluent in Scala, Spark, Kafka, Cassandra and Flink.