Press "Enter" to skip to content

Curated SQL Posts

Iterating over JSON and XML Data in SQL Server

Steve Stedman explains how you can iterate through XML and JSON data using the APPLY operator:

The results are what we are looking for in this specific example, but where they break down is when there are more employees represented in the XML, for each employee we need to add another UNION to bring the results together. That is not very iterative and since the title of this post includes the word iterating, we need to focus on how to do that.

Now we introduce the CROSS APPLY functionality that can be used like a JOIN to take a value from one result set (table) and apply it to a function that gets called once for each row. You can reference my JOIN TYPES poster for using CROSS APPLY

Click through for the full set of examples.

Comments closed

Moving a Semantic Model from SSAS to Power BI

Teo Lachev explains why it might make sense to move a semantic model from SSAS over to Power BI:

We decided to move the semantic model to Power BI so that Power BI owns the data. Besides potentially improving the report load time, this architecture has also other important advantages (to learn more, read my “Power BI Large Datasets: The Good, the Bad, and the Ugly” post). If you’re not on Power BI Premium, that “movement” might not easy if you have opted to use Visual Studio or Tabular Editor for development. That’s because Power BI Pro doesn’t expose the XMLA endpoint, so your only option is to migrate the model to Power BI Desktop. But migrating an SSAS Tabular project to Power BI Desktop is not officially supported and there is no automatic migration path.

Click through for the full explanation of why, as well as notes on the process.

Comments closed

Dropping Temporary Tables at the Beginning of a Procedure

Richard Swinbank walks through a technique to drop temp tables as a procedure begins:

Local temporary tables – tables with names that begin with a single # character – are dropped automatically by SQL Server when they are no longer in scope. So why drop them explicitly at all?

I’m writing this from the perspective of a data engineer. ETL processes or their components are often implemented as T-SQL stored procedures (SPs) which might move significant amounts of data around and take a while to run. I’m not talking about the sort of SP that needs to run in 30ms to guarantee application responsiveness!

Sometimes it’s useful to store intermediate results in temporary tables, sometimes it’s necessary, and sometimes you find yourself maintaining code built by someone who just really loves them. Adding explicit DROPs to code like this can make development and maintenance a bit smoother.

My concern with doing this is that it prevents temporary object caching. Granted, Richard explicitly calls out that these are long-running procedures (and probably one of a particular procedure running at a time), so that does reduce the concern accordingly.

Comments closed

Deploying ADF via Azure DevOps

Kamil Nowinski has part two on a series about releasing Azure Data Factory code:

Struggling with #ADF deployment? adf_publish branch doesn’t suit your purposes? Don’t have skills with PowerShell? I have good news for you. There is a new tool in the market. It’s a task for Azure DevOps Release Pipeline to deploy whole ADF from code (JSON files) to ADF instance in Azure. Behind the scenes, it runs the PowerShell module which does all job for you.
Sounds unbelievable? But it’s real! Check it out for yourself.

Click through for the video.

Comments closed

Performing a Power BI Risk Assessment

David Eldersveld has a set of questions to ask when building Power BI dashboards:

As with any important endeavor, businesses should be in a position to document and have a plan to mitigate any possible risks associated with deploying software. It is no different with Microsoft Power BI.

Note that the purpose of this post is simply an encouragement to evaluate risk exposure and consider how you would overcome potential future issues that could arise from any cause–it is not a caution against using external tools created by the wider Power BI community.

Read the whole thing.

Comments closed

When NULL Makes Sense

Chris Johnson shares some rules of thumb on when allowing a column to be nullable makes sense:

Now I want to talk a bit about when we should be using NULLs and when we shouldn’t, inspired again by some things I’ve seen in legacy code a few days ago. And when I say using, I mean when should we allow values to be NULL, in tables, and parameters, and anything else.

The first thing to remember is what does a NULL represent. It’s very simple, NULL means “I don’t know”. If you keep that in mind, and you’re practising some good development design where everything has a point, you should be able to tell if it’s possible to be unsure about any particular value for a row.

One of the tricky parts about NULL values and legacy code is that if I need to add a new column and there is not a good default, either I make the column nullable (regardless of whether it should be) or I take downtime by blocking table access until my change is in place. As a result, quite often, I simply need to make something nullable because I can’t afford to block the table that long.

Comments closed

FlinkSQL in Cloudera Streaming Analytics

Marton Balassi announces support for FlinkSQL in Cloudera Streaming Analytics:

Our 1.2.0.0 release of Cloudera Streaming Analytics Powered by Apache Flink brings a wide range of new functionality, including support for lineage and metadata tracking via Apache Atlas, support for connecting to Apache Kudu and the first iteration of the much-awaited FlinkSQL API.

Flink’s SQL interface democratizes stream processing, as it caters to a much larger community than the currently widely used Java and Scala APIs focusing on the Data Engineering crowd. Generalizing SQL to stream processing and streaming analytics use cases poses a set of challenges: we have to tackle expressing infinite streams and timeliness of records. 

All is happening as Feasel’s Law foretold.

Comments closed

TF-IDF using Spark .NET

Ed Elliott shows how you can use the Spark .NET library to perform machine learning in Apache Spark:

Native spark has two API’s for creating your ML applications. The historical one is Spark.MLLib and the newer API is Spark.ML. A little bit like how there was the old RDD API which the DataFrame API superseded, Spark.ML supersedes Spark.MLLib.

At the end of last year, .NET for Apache Spark had no support for either the Spark.ML or Spark.MLLib API’s so I started implementing Spark.ML. In a similar way that .NET for Apache Spark supports the DataFrame API and not the RDD API, I started with Spark.ML and I believe that having the full Spark ML API will be enough for anyone.

It’s awesome that Ed is helping to move Spark .NET forward in this way.

Comments closed

Licensing for SQL Server Reporting Services

Denny Cherry explains licensing scenarios for SQL Server Reporting Services:

When you license SQL Server (of which the SQL Server Reporting Services engine is a part of) you license what is called the OSE or Operating System Environment. This is basically the OS that has SQL Server installed on it. Now, this can be the virtualization host (VMware or Hyper-V) or it can be the Windows Server (SSRS isn’t available on Linux, so we don’t have to deal with that, but if SSRS was available on Linux the rules would be the same as Windows). You can install SQL Server (or SSRS) as many times inside that OSE as you want to, but you can’t install SQL Server (or SSRS) on any other machines.

SSRS licensing isn’t too difficult to understand, relatively speaking.

Comments closed

Derived Table Nesting and Performance

Itzik Ben-Gan digs into some of the performance considerations around nested derived tables:

Unnesting/substitution of table expressions is a process of taking a query that involves nesting of table expressions, and as if substituting it with a query where the nested logic is eliminated. I should stress that in practice, there’s no actual process in which SQL Server converts the original query string with the nested logic to a new query string without the nesting. What actually happens is that the query parsing process produces an initial tree of logical operators closely reflecting the original query. Then, SQL Server applies transformations to this query tree, eliminating some of the unnecessary steps, collapsing multiple steps into fewer steps, and moving operators around. In its transformations, as long as certain conditions are met, SQL Server can shift things around across what were originally table expression boundaries—sometimes effectively as if eliminating the nested units. All of this in attempt to find an optimal plan.

In this article I cover both cases where such unnesting takes place, as well as unnesting inhibitors. That is, when you use certain query elements it prevent SQL Server from being able to move logical operators in the query tree, forcing it to process the operators based on the boundaries of the table expressions used in the original query.

That’s on my list for a second reading.

Comments closed