Analyzing Twitter Data With Storm In HDInsight

Nischal S shows how to configure an HDInsight cluster to process tweets, followed by loading them into a Power BI dashboard:

When we need to process streams of real-time data, Storm is a great contender. Examples of streaming data are the number of consumer clicks and navigations on a website, IIS or user logs, IoT data, and social network information. In all these scenarios, we use real-time data processing. Apache Storm can process real-time unbounded streams of data.

The term “unbounded” defines streams of data with no start or end. Here, the processing of data is continuous and in real-time. Twitter is a good example. Twitter data is continuous, has no start or end time, and is provided in real-time by millions of Twitter users around the world.

Storm wouldn’t rank in my top three technologies for doing this, but it certainly does the job.

Related Posts

Databricks Runtime 5.2 Released

Nakul Jamadagni announces Databricks Runtime 5.2: Delta Time TravelTime Travel, released as an Experimental feature, adds the ability to query a snapshot of a table using a timestamp string or a version, using SQL syntax as well as DataFrameReader options for timestamp expressions.Sample codeSELECT count() FROM events TIMESTAMP AS OF timestamp_expressionSELECT count() FROM events VERSION AS OF version Time travel looks a bit like temporal tables in SQL Server.

Read More

Kafka And The Differing Aims Of Data Professionals

Kai Waehner argues that there is an impedence mismatch between data engineers, data scientists, and ML production engineers: Data scientists love Python, period. Therefore, the majority of machine learning/deep learning frameworks focus on Python APIs. Both the stablest and most cutting edge APIs, as well as the majority of examples and tutorials use Python APIs. […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031