Joining Multiple Types Of Data With KSQL

Robin Moffatt has an example where he enriches streaming CSV data with information stored in MySQL:

This is a continuous query that executes in the background until explicitly terminated by the user. In effect, these are stream processing applications, and all we need to create them is SQL! Here all we’ve done is an enrichment (joining two sets of data), but we could easily add predicates to the data (simply include a WHERE clause), or even aggregations.

You can see which queries are running with the SHOW QUERIES; statement. All queries will pause if the KSQL server stops, and restart automagically when the KSQL server starts again.

The DESCRIBE EXTENDED command can be used to see information about the derived stream such as the one created above. As well as simply the columns involved, we can see information about the underlying topic, and run-time stats such as the number of messages processed and the timestamp of the most recent one.

It’s pretty easy to do; click through to see just how easy.

Related Posts

Machine Learning and Delta Lake

Brenner Heintz and Denny Lee walk us through solving data engineering problems with Delta Lake: As a result, companies tend to have a lot of raw, unstructured data that they’ve collected from various sources sitting stagnant in data lakes. Without a way to reliably combine historical data with real-time streaming data, and add structure to […]

Read More

Cloudera Stream Processing

Dinesh Chandrasekhar announces the new iteration of Cloudera’s streaming data processor: Cloudera Stream Processing (CSP) is a product within the Cloudera DataFlow platform that packs Kafka along with some key streaming components that empower enterprises to handle some of the most complex and sophisticated streaming use cases. CSP provides advanced messaging, real-time processing and analytics on […]

Read More

Categories

March 2018
MTWTFSS
« Feb Apr »
 1234
567891011
12131415161718
19202122232425
262728293031