Press "Enter" to skip to content

Curated SQL Posts

Iterative Solutions To The Closest Match Problem

Itzik Ben-Gan has a follow-up article looking at row-by-row solutions to the closest match problem:

Last month, I covered a puzzle involving matching each row from one table with the closest match from another table. I got this puzzle from Karen Ly, a Jr. Fixed Income Analyst at RBC. I covered two main relational solutions that combined the APPLY operator with TOP-based subqueries. Solution 1 always had quadratic scaling. Solution 2 did quite well when provided with good supporting indexes, but without those indexes also had quadratic scaling. In this article I cover iterative solutions, which, despite being generally frowned upon by SQL pros, do provide much better scaling in our case even without optimal indexing.

Itzik has three separate solutions here, including one using the CLR.
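
To give a flavor of what an iterative solution can look like, here is a minimal T-SQL sketch of the ordered-merge idea: two cursors walk both tables in value order, so each table gets a single forward pass. The table and column names (dbo.T1, dbo.T2, keycol, val) are assumptions based on the puzzle's setup, and this is an illustration of the general technique rather than Itzik's published code.

SET NOCOUNT ON;

DECLARE @k1 INT, @v1 INT,           -- current row from T1
        @k2 INT, @v2 INT,           -- current row from T2
        @prevk2 INT, @prevv2 INT,   -- last T2 row with val <= current T1 val
        @fs1 INT, @fs2 INT;         -- saved fetch status per cursor

DECLARE @Result TABLE (keycol1 INT NOT NULL, keycol2 INT NULL);

DECLARE c1 CURSOR LOCAL FAST_FORWARD FOR
    SELECT keycol, val FROM dbo.T1 ORDER BY val, keycol;
DECLARE c2 CURSOR LOCAL FAST_FORWARD FOR
    SELECT keycol, val FROM dbo.T2 ORDER BY val, keycol;

OPEN c1; OPEN c2;
FETCH NEXT FROM c1 INTO @k1, @v1; SET @fs1 = @@FETCH_STATUS;
FETCH NEXT FROM c2 INTO @k2, @v2; SET @fs2 = @@FETCH_STATUS;

WHILE @fs1 = 0
BEGIN
    -- Advance through T2 while its value is still at or below T1's value.
    WHILE @fs2 = 0 AND @v2 <= @v1
    BEGIN
        SELECT @prevk2 = @k2, @prevv2 = @v2;
        FETCH NEXT FROM c2 INTO @k2, @v2; SET @fs2 = @@FETCH_STATUS;
    END;

    -- The closest match is either the last T2 value at or below @v1 or the
    -- first T2 value above it, whichever is nearer.
    INSERT INTO @Result (keycol1, keycol2)
    SELECT @k1,
           CASE WHEN @prevv2 IS NULL THEN @k2
                WHEN @fs2 <> 0 THEN @prevk2
                WHEN @v1 - @prevv2 <= @v2 - @v1 THEN @prevk2
                ELSE @k2
           END;

    FETCH NEXT FROM c1 INTO @k1, @v1; SET @fs1 = @@FETCH_STATUS;
END;

CLOSE c1; DEALLOCATE c1;
CLOSE c2; DEALLOCATE c2;

SELECT keycol1, keycol2 FROM @Result;

Because neither cursor ever moves backward, the work after the two ordered scans is linear in the number of rows, which is the scaling advantage the article demonstrates over the quadratic relational solutions.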


Azure Data Studio, January Release

Alan Yu announces the January release of Azure Data Studio:

In previous versions of Azure Data Studio, when a user ran large queries, no results would appear in the results grid until the query could show all of the results. This was not a great experience for our users, so we investigated how to improve it. In the latest build of Azure Data Studio, users can now see results streamed into the results grid. This makes for a better experience, since users can see results more quickly and interact with their data instead of sitting in a waiting state.

There are several enhancements this month, including Azure Active Directory support.


Naming Conventions In SQL Server

Phil Factor explains naming requirements for SQL Server and gives suggestions for conventions to follow:

There are no generally accepted standards for naming SQL objects. Although ISO/IEC 11179 has been referred to as a standard for naming, it actually only sets a standard for defining naming conventions. There is a sample standard in the ‘Naming principles’ document (ISO/IEC 11179-5), but this is merely an example of how a standard should be defined. However, it is quite close to general good practice in programming.

When naming a table, it is a good idea to use a collective name or ‘object class term’ for the entity if one exists (such as Employee, Cost, Tree, component, member, audience, staff or faculty) but use the singular rather than the plural form where possible. For the sake of maintenance, use a consistent naming convention that is informative but brief. It helps greatly to start with a dictionary of the correct nouns and verbs associated with the application domain and use that. If it proves inadequate, then the team can build on it. If a data model has been created as part of the design phase, this dictionary should be an end-product of this work.

As Phil notes at the end, consistency is the most important virtue here. It’s hard to work with a database where you have tables named Employees, employee_dates, and tblFiredEmployee.
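
To make the advice concrete, here is a small hypothetical example of one consistent convention (singular object class term, PascalCase, no prefixes); the names are illustrative only, not taken from Phil's article:

CREATE TABLE dbo.Employee
(
    EmployeeID      INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Employee PRIMARY KEY,
    FirstName       NVARCHAR(50) NOT NULL,
    LastName        NVARCHAR(50) NOT NULL,
    HireDate        DATE NOT NULL,
    TerminationDate DATE NULL   -- one table, not a separate tblFiredEmployee
);

Which convention you pick matters less than applying it everywhere: every table singular (or every table plural), every key named the same way, every date column following the same pattern.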


When A Database In An AG Has Different Query Store Settings

Erin Stellato explains how we can have a discrepancy in Query Store settings between the primary and a secondary in an Availability Group:

Last week there was a question on #sqlhelp on Twitter about the status of Query Store for a database in an Availability Group. I’ve written about Query Store and Availability Groups before, so if you’re not familiar with QS behavior in an AG, check out that post first. But this question was, I think, specific to the values that show on a read-only replica and how the query store settings differed between a primary and secondary. Let’s set it up and take a look.

Click through to learn why this may be and why you shouldn’t panic.
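
One way to see the discrepancy for yourself is to query the Query Store catalog view in the database on each replica and compare the output:

-- Run in the database on the primary, then on the read-only secondary.
SELECT actual_state_desc,
       desired_state_desc,
       readonly_reason
FROM sys.database_query_store_options;

On a readable secondary you should expect a read-only actual state even though the desired state carried over from the primary says otherwise, which is exactly the sort of mismatch the question was about.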


The Forgotten Infrastructure Below Azure BI Architecture Diagrams

Meagan Longoria reminds us that there are several products which Azure BI projects need but which we tend to forget when building architectural diagrams:

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

Meagan has several of these, so check it out.


Power BI Embedding Techniques

Marc Lelijveld takes us through the different ways we can embed Power BI dashboards:

At the end of 2018, Microsoft announced a great new feature which allows you to securely embed Power BI content in client applications or sites while keeping security in place. It works almost the same as the publish to web feature, except that users have to log in within the embedded frame before they see the content.
Not only is the security to access the report now in place, but all other security features still work as well. This means Row-Level Security remains in effect when you are using secure embedding. A lot of people online are really enthusiastic about the new feature, simply because it gives you the ability to embed your content on your website, web portal, or wherever you want, without risking security issues.

Click through for discussion on the four techniques.


AzureR Packages In CRAN

David Smith points out that the Azure packages for R are now in CRAN:

The suite of AzureR packages for interfacing with Azure services from R is now available on CRAN. If you missed the earlier announcements, this means you can now use the install.packages function in R to install these packages, rather than having to install from the GitHub repositories. Updated versions of these packages will also be posted to CRAN, so you can get the latest versions simply by running update.packages.

Read on for a summary of those packages.


Speeding Up The First Responder Power BI Interface

Kellyn Pot’vin-Gorman hits the Go Faster button:

The gist of this kit is a database repository, populated by sp_BlitzFirst, that collects monitoring, alerting, and performance metric data. Once you’ve set this up, you can use a Power BI desktop dashboard as an interface for all that data.
Now this is an awesome way to introduce more DBAs to Power BI, and it’s a great way to get more out of your metrics data. The challenge is that it’s a lot of data to be performing complete refreshes on, and the natural life of a database like this is growth. The static database I was sent by Tracy, once I connected my PBIX to the local db sources, took upwards of an hour to refresh. Keep in mind, I have 16G of memory, 32G of swap, and have upped my data load options in Power BI quite high.

Kellyn walks through the things she did to improve performance as a starting point, so check it out and be aware that there’s even more that can be done.
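
If you have not set up the collection side yet, the pattern is to schedule sp_BlitzFirst to write to output tables, roughly like the call below. The database name here is a made-up placeholder, and you should verify the parameter names against your installed version of the First Responder Kit:

EXEC dbo.sp_BlitzFirst
    @OutputDatabaseName = N'DBAtools',   -- placeholder repository database
    @OutputSchemaName   = N'dbo',
    @OutputTableName    = N'BlitzFirst';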


Solving Naive Bayes By Hand

I have a post that requires math and is meaner toward the Buffalo Bills than I normally am:

Trust the Process
There are three steps to the process of solving the simplest of Naive Bayes algorithms. They are:
1. Find the probability of winning a game (that is, our prior probability).
2. Find the probability of winning given each input variable: whether Josh Allen starts the game, whether the team is home or away, whether the team scores 14 points, and who the top receiver was.
3. Plug in values from our new data into the formula to obtain the posterior probability.

This is an algorithm you want to solve by hand first—it’s just that easy. Then, once you understand it, let a computer do the work for larger data sets. Also, Super Bowl 2020 because I’m the kind of overly optimistic fool required of Bills fans. Just gonna leave this link here.
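
In symbols, the posterior you compute in step 3 is proportional to the prior times the product of the per-variable conditionals:

P(\text{win} \mid x_1, \dots, x_n) \propto P(\text{win}) \prod_{i=1}^{n} P(x_i \mid \text{win})

As a worked illustration with made-up numbers (the real ones come from the game data in the post): if P(win) = 0.5, P(Allen starts | win) = 0.8, and P(home | win) = 0.6, the unnormalized score for a home game Allen starts is 0.5 × 0.8 × 0.6 = 0.24. Compute the same product under "loss" and divide each score by their sum to get the two posterior probabilities.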


Data Transformation Tools In The Azure Space

James Serra gives us an overview of the major tools you would use for ETL and ELT in Azure:

If you are building a big data solution in the cloud, you will likely be landing most of the source data into a data lake. And much of this data will need to be transformed (i.e., cleaned and joined together – the “T” in ETL). Since the data lake is just storage (i.e., Azure Data Lake Storage Gen2 or Azure Blob Storage), you need to pick a product that will be the compute and will do the transformation of the data. There is good news and bad news when it comes to which product to use. The good news is there are a lot of products to choose from. The bad news is there are a lot of products to choose from :-). I’ll try to help your decision-making by talking briefly about most of the Azure choices and the best use cases for each when it comes to transforming data (although some of these products also do the Extract and Load part).

The only surprise is the non-mention of Azure Data Lake Analytics, and there is a good conversation in the comments section explaining why.
