Press "Enter" to skip to content

Author: Kevin Feasel

Comparing Azure SQL DB and RDS on Price and Performance

Joey D’Antoni summarizes a recent publication:

Yesterday, GigaOm published a benchmark of Azure SQL Database as compared to Amazon’s RDS service. It’s an interesting test case that tries to compare the performance of these platform as a service database offerings. One of the many challenges of this kind of a study is that the product offerings are not exactly analogous. However, GigaOm was able to build a test case where they used similar sized offerings as shown in the image below.

Read on for the summary and Joey’s thoughts.

Comments closed

Query Performance With and Without Stored Procedures

Bert Wagner takes on the idea that stored procedures are faster (or slower) than ad hoc SQL:

A few months ago I was presenting for a user group when someone asked the following question:

Does a query embedded in a stored procedure execute faster than that same query submitted to SQL Server as a stand alone statement?

The room was pretty evenly split on the answer: some thought the stored procedures will always perform faster while others thought it wouldn’t really matter.

Bert explains the answer. For me, performance isn’t even on the radar for how I try to convince people to use procedures. Instead, explain it to developers as an interface: developers program against a contract, where they can send in a specified set of inputs (procedure parameters), get back a specified output (the shape of the result set), and not have to care about the fine details. Programming to interfaces is extremely common in business development, and this is just one more interface.

Comments closed

The Benefits of Delta Lake

Kaushik Nath explains what a Delta Lake is and why it is beneficial:

Data lakes have generated a large amount of publicity as the new storage technology for our big data era. Because something new is always better, right? 

All this hype around data lakes has ignored their inherent drawbacks and limitations. Well, I’m Not Here to create a debate by saying that no one should ever use data lakes. But I am saying that companies should enter into the data lake investment with eyes wide open. Otherwise it might lead to some serious complications.

Delta Lake is a concept intended to mitigate some of the issues with data lakes in general, turning them into data swamps.

Comments closed

Family History with SQL Graph

Mala Mahadevan takes us through family histories in a graph database:

I have been working a lot of SQL Graph related queries and applications of the graph data concept to the extent possible within SQL Server’s graph capabilities. Genealogy, or querying family trees is an important graph data application. A lot of us may not have work related applications that are genealogy related, necessarily. But conceptually, this can apply to many similar tree/hierarchy type structures. I was looking into some data to play with in this regard. Sometime ago – we were discussing novels by famed novelist James Michener. My friend Buck Woody made a tweet-remark that it would need a graph database to keep track of the characters and relationships in some of Michener’s novels. I am a big fan of Michener’s novels, and the most recent one I have read is ‘Hawaii’. It is based on history and evolution of the Hawaiian islands, and has a rather complex network of characters, with many ethnicities and several interwoven relationships. I decided to use the characters in Hawaii as my test data to understand how to query geneological data, stored in graph database format.

Read on to see Mala’s table and a procedure to retrieve this data.

Comments closed

Mapping Usernames within the Power BI Gateway

Jeff Pries takes us through a difficult scenario:

With some data sources, such as Analysis Services, you want to pass the username of the person running the report back to the server executing the query (such as in a Row Level Security configuration). Adam Saxton of Guy in a Cube does a great job of explaining how this works at a high level in the video linked here.

In the video, Adam mentions that if our Power BI login does not match a UPN in our local Active Directory, then the lookup will fail…which is a problem if you just don’t have the option of having your Power BI login match a UPN in your local Active Directory and don’t want to manage static user mappings over time. The following will cover a method of allowing the lookup to use a different Active Directory property, such as “mail” to perform the matching.

Read on for the steps.

Comments closed

PySpark DataFrame Joining

Monika Rathor shows the various ways you can join DataFrames with PySpark:

PySpark provides multiple ways to combine dataframes i.e. join, merge, union, SQL interface, etc. In this article, we will take a look at how the PySpark join function is similar to SQL join, where two or more tables or dataframes can be combined based on conditions. 

One join type you don’t directly get in SQL Server is the left anti join. We can build something quite similar with NOT EXISTS, though.

Comments closed

Preventing Lost Code in Azure Data Studio

Dave Bland shows how you can keep from losing code when you close Azure Data Studio:

After working on a query for a long time, we want to make sure that we save the changes we have made.  I have lost hours of work over the years because I didn’t save the changes.  Azure Data Studio has a few features that can help prevent this from happening.

Read on to learn how. With SSMS, this had historically been the domain of third-party plugins, but it’s built into VS Code and ADS.

Comments closed

Power BI Desktop Data Profiling

Angela Henry walks through some changes in Power BI:

The Data Profiling feature was first added to public preview just under a year ago in November 2018.  Then it went GA in May 2019 and just 5 months later, they’ve added more goodness.  That’s one of the great things about Power BI, the release cadence.  If you don’t like something or want more features, just wait a few months (or five in this case).

One of the big things that was lacking with the Data Profiling feature was the text length statistics.  This is a huge deal for me.  It’s one of the things that I’ve encounter most frequently, incorrectly sized string columns in data warehouses.  Well, the wait is over, text lengths are now available.  Unfortunately, it’s not intuitive on how to get them.

Sounds like there’s still room for further improvement, but this is a positive step.

Comments closed

Financial Time Series Analysis in Databricks

Ricardo Portilla shares a demo of financial time series analysis in Databricks:

We’ve shown a merging technique above, so now let’s focus on a standard aggregation, namely Volume-Weighted Average Price (VWAP), which is the average price weighted by volume. This metric is an indicator of the trend and value of the security throughout the day.  The vwap function within our wrapper class (in the attached notebook) shows where the VWAP falls above or below the trading price of the security. In particular, we can now identify the window during which the VWAP (in orange) falls below the trade price, showing that the stock is overbought.

Click through for the article, as well as a notebook you can try out.

Comments closed

Limiting Index Sizes in Cosmos DB

Hasan Savran explains why you might want to exclude columns from Cosmos DB indexes:

If everything is indexed already; Why do we want to exclude some of indexes? Indexes are saved on disk, you pay for the storage in Azure. If you keep indexing everything, your index file gets larger and you pay more for storage.

     Also; write operations to index file takes longer if index file is larger. By keeping only what you need in index file will improve the latency of write operations. If you will need to change your indexing policies, Rebuilding indexes will take less time.

This behavior is quite different from the way SQL Server behaves, where indexing is more of an opt-in philosophy.

Comments closed