
Day: October 15, 2019

The Benefits of Delta Lake

Kaushik Nath explains what a Delta Lake is and why it is beneficial:

Data lakes have generated a large amount of publicity as the new storage technology for our big data era. Because something new is always better, right? 

All this hype around data lakes has ignored their inherent drawbacks and limitations. Well, I’m not here to create a debate by saying that no one should ever use data lakes. But I am saying that companies should enter into the data lake investment with eyes wide open. Otherwise, it might lead to some serious complications.

Delta Lake is a concept intended to mitigate some of the general issues with data lakes, the ones that tend to turn them into data swamps.


Family History with SQL Graph

Mala Mahadevan takes us through family histories in a graph database:

I have been working a lot on SQL Graph-related queries and applications of the graph data concept, to the extent possible within SQL Server’s graph capabilities. Genealogy, or querying family trees, is an important graph data application. A lot of us may not have work-related applications that involve genealogy. But conceptually, this can apply to many similar tree/hierarchy type structures. I was looking into some data to play with in this regard. Some time ago, we were discussing novels by famed novelist James Michener. My friend Buck Woody made a tweet-remark that it would need a graph database to keep track of the characters and relationships in some of Michener’s novels. I am a big fan of Michener’s novels, and the most recent one I have read is ‘Hawaii’. It is based on the history and evolution of the Hawaiian islands, and has a rather complex network of characters, with many ethnicities and several interwoven relationships. I decided to use the characters in Hawaii as my test data to understand how to query genealogical data stored in graph database format.

Read on to see Mala’s table and a procedure to retrieve this data.
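
To get a feel for the kind of question a family-tree graph answers, here is a minimal Python sketch of an ancestor lookup. It is not Mala's procedure and does not use SQL Server's graph syntax; the `PARENTS` mapping and the placeholder names in it are hypothetical and not taken from the novel.

```python
from collections import deque

# Hypothetical "child -> parents" edges; the names are placeholders only.
PARENTS = {
    "child": ["mother", "father"],
    "mother": ["grandmother"],
    "father": [],
    "grandmother": [],
}


def ancestors(person, parents):
    """Walk the graph breadth-first and return {ancestor: generations_back}."""
    seen = {}
    queue = deque([(person, 0)])
    while queue:
        current, depth = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:  # guard against revisiting shared ancestors
                seen[parent] = depth + 1
                queue.append((parent, depth + 1))
    return seen


result = ancestors("child", PARENTS)
print(result)
```

In a graph database, the same traversal would be expressed as a query over node and edge tables rather than an explicit queue.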


Mapping Usernames within the Power BI Gateway

Jeff Pries takes us through a difficult scenario:

With some data sources, such as Analysis Services, you want to pass the username of the person running the report back to the server executing the query (such as in a Row Level Security configuration). Adam Saxton of Guy in a Cube does a great job of explaining how this works at a high level in the video linked here.

In the video, Adam mentions that if our Power BI login does not match a UPN in our local Active Directory, then the lookup will fail…which is a problem if you just don’t have the option of having your Power BI login match a UPN in your local Active Directory and don’t want to manage static user mappings over time. The following will cover a method of allowing the lookup to use a different Active Directory property, such as “mail” to perform the matching.

Read on for the steps.


PySpark DataFrame Joining

Monika Rathor shows the various ways you can join DataFrames with PySpark:

PySpark provides multiple ways to combine dataframes, e.g. join, merge, union, and the SQL interface. In this article, we will take a look at how the PySpark join function is similar to a SQL join, where two or more tables or dataframes can be combined based on conditions.

One join type you don’t directly get in SQL Server is the left anti join. We can build something quite similar with NOT EXISTS, though.
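
As a rough illustration of the NOT EXISTS approach, here is a small sketch using Python's built-in `sqlite3` module so it runs without a Spark cluster or a SQL Server instance. The `customers` and `orders` tables and their contents are made up for the example. In PySpark, the equivalent would be passing `"left_anti"` as the join type, as noted in the comment.

```python
import sqlite3

# Roughly equivalent PySpark call (not run here):
#   customers.join(orders, customers.id == orders.customer_id, "left_anti")


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
        """
    )
    return conn


def left_anti_join(conn):
    """Return customers that have no matching row in orders."""
    query = """
        SELECT c.id, c.name
        FROM customers AS c
        WHERE NOT EXISTS (
            SELECT 1 FROM orders AS o WHERE o.customer_id = c.id
        )
        ORDER BY c.id
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
rows = left_anti_join(conn)
print(rows)  # only the customer with no orders
```

Like a left anti join, the NOT EXISTS form returns each left-side row at most once and only includes columns from the left table.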


Preventing Lost Code in Azure Data Studio

Dave Bland shows how you can keep from losing code when you close Azure Data Studio:

After working on a query for a long time, we want to make sure that we save the changes we have made.  I have lost hours of work over the years because I didn’t save the changes.  Azure Data Studio has a few features that can help prevent this from happening.

Read on to learn how. With SSMS, this had historically been the domain of third-party plugins, but it’s built into VS Code and ADS.


Power BI Desktop Data Profiling

Angela Henry walks through some changes in Power BI:

The Data Profiling feature was first added to public preview just under a year ago in November 2018.  Then it went GA in May 2019 and just 5 months later, they’ve added more goodness.  That’s one of the great things about Power BI, the release cadence.  If you don’t like something or want more features, just wait a few months (or five in this case).

One of the big things that was lacking with the Data Profiling feature was the text length statistics.  This is a huge deal for me.  It’s one of the things that I’ve encountered most frequently: incorrectly sized string columns in data warehouses.  Well, the wait is over; text lengths are now available.  Unfortunately, it’s not intuitive how to get them.

Sounds like there’s still room for further improvement, but this is a positive step.
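
For a sense of why text length statistics matter when sizing string columns, here is a minimal Python sketch that computes them for one column of values. It is not how Power BI calculates its profile; the `text_length_stats` helper and the sample values are hypothetical.

```python
def text_length_stats(values):
    """Return min, max, and average length of the non-null strings in a column."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        return None  # nothing to profile
    return {
        "min": min(lengths),
        "max": max(lengths),
        "avg": sum(lengths) / len(lengths),
    }


# Hypothetical column contents, including a null.
sample = ["WA", "Oregon", None, "California"]
stats = text_length_stats(sample)
print(stats)
```

Comparing the maximum observed length against the declared column size is a quick way to spot columns that are far larger than the data needs.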
