Press "Enter" to skip to content

Category: Hadoop

HDInsight + Power BI + Spark

Reza Rad has a nice walkthrough on integrating several powerful technologies:

Power BI can connect to many data sources as you know, and Spark on Azure HDInsight is one of them. In area of working with Big Data applications you would probably hear names such as Hadoop, HDInsight, Spark, Storm, Data Lake and many other names. Spark and Hadoop are both frameworks to work with big data, they have some differences though. In this post I’ll show you how you can use Power BI (either Power BI Desktop or Power BI website) to connect to a sample of Spark that we built on an Azure HDInsight service. by completing this section you will be able to create simple spark on Azure HDInsight, and run few Python scripts from Jupyter on it to load a sample table into Spark, and finally use Power BI to connect to Spark server, load, and visualize the data.

If you’re totally unfamiliar with Spark but interested in data processing, now’s a good time to start digging into the topic.

Comments closed

Why Data Lakes?

James Serra explains why you might want to use a data lake:

To refresh, a data lake is a landing zone, usually in Hadoop, for disparate sources of data in their native format.  Data is not structured or governed on its way into the data lake.  This eliminates the upfront costs of data ingestion, especially transformation.  Once data is in the lake, the data is available to everyone.  You don’t need a priority understanding of how data is related when it is ingested, rather, it relies on the end-user to define those relationships as they consume it.  Data governorship happens on the way out instead of on the way in.  This makes a data lake very efficient in processing huge volumes of data.  Another benefit is the data lake allows for data exploration and discovery, to find out if data is useful or to create a one-time report.

I’m still working on a “data swamp” metaphor, in which people toss their used mattresses and we expect to get something valuable if only we dredge a little more.  Nevertheless, read James’s article; data lakes are going to move from novel to normal over the next few years.

Comments closed