Month: March 2019

Performance Testing Simple Scalar UDF Functions

Wayne Sheffield tries out a simple scalar UDF in SQL Server 2019 to see how it performs:

I recently published a post detailing the new Scalar UDF Inlining feature in SQL 2019 here. That post introduced the new feature in a way that I used to compare performance to the other function types, continuing the performance evaluation of functions that I had previously posted here and here. In the Scalar UDF Inlining post, I used a function to strip all non-numeral values from a string, and to return the result. This used the FOR XML output option.

In thinking about how scalar functions are commonly used, I’ve decided to revisit this feature with a simpler function. I will still compare it to all the other types of functions to see how Scalar UDF Inlining compares to the others.

Wayne’s results are music to the product team’s ears, I’m sure.

SqlServer Module Now with Invoke-Sqlcmd

Aaron Nelson alerts us to a new preview of the SqlServer Powershell module:

If you’re still on an earlier version of PSCore and are unable to install PSCore 6.2 right now, you can still download the preview of the SqlServer module to get the latest fixes and new features. You just won’t be able to use the Invoke-Sqlcmd cmdlet.

Another quick thing to note is that this is like a v.0.0.1 of Invoke-SqlCmd on PSCore; it does not have all the bells & whistles of the version of Invoke-Sqlcmd for [full blown] Windows PowerShell. Obviously, more features will be added over time, but the basic functionality was ready for customers to start “kicking the tires”.

Read on for more notes and the link to check this all out.

Comparing STRING_AGG Performance Against FOR XML PATH

Jeffry Schwartz compares the performance of STRING_AGG in SQL Server 2017 versus the tried and true method of FOR XML PATH for string concatenation:

Two different T-SQL methods were used for testing.  The FOR XML PATH solution incorporates the reverse and stuff functions along with FOR XML PATH.  The String_agg solution is much simpler, but is only available on SQL Server 2017 currently.

The FOR XML PATH method has been documented thoroughly over the years in many articles, so these explanations will not be repeated here.  However, two points are worth mentioning.  The first one is that the @strSeparator (in this case ‘ or  ’) string will reside at the end of the constructed string (in this case @strFilter).  Since the last occurrence of @strSeparator must be removed to prevent a syntax error, characters must be dropped from the end of the string.  Although it may seem obvious that four characters (the length of @strSeparator) should be removed, actually only three need be removed because the end of the constructed string is “r” not “ “.  Therefore, the length argument of the left function specification is the length of @strFilter minus three.
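The trailing-separator arithmetic in that excerpt is easy to get wrong, so here is a minimal Python sketch of it rather than the article's T-SQL. It assumes the separator is the four-character string " or " and takes the source's statement that the built string ends in "r" as given, which the sketch models by stripping the trailing space. The function names and sample predicates are hypothetical. The join-based helper is only a rough analogue of what STRING_AGG does, since it never emits a trailing separator in the first place.

```python
SEPARATOR = " or "  # assumed: four characters, as the excerpt states


def build_with_trailing_separator(predicates):
    """Append the separator after every item, the way the concatenation loop does."""
    built = "".join(p + SEPARATOR for p in predicates)
    # Assumption taken from the excerpt: the constructed string ends in "r",
    # not " ", so the final space is not there by the time we trim.
    return built.rstrip(" ")


def drop_trailing_separator(built):
    """Remove the three remaining separator characters (" or") from the end."""
    return built[: len(built) - 3]


def build_with_join(predicates):
    """Rough analogue of STRING_AGG: the separator only goes between items."""
    return SEPARATOR.join(predicates)


if __name__ == "__main__":
    sample = ["a = 1", "b = 2", "c = 3"]  # hypothetical filter predicates
    built = build_with_trailing_separator(sample)
    print(repr(built))
    print(repr(drop_trailing_separator(built)))
    print(repr(build_with_join(sample)))
```

Both routes should produce the same filter string; the second simply has no off-by-one to reason about.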

I like STRING_AGG a lot, primarily because I can remember it and I can explain it to people relatively new to SQL Server. Read on to see how they both compare in terms of performance.

Showing Totals on Power BI Stacked Column Charts

Reza Rad shows us how to add a totals figure to Power BI stacked column charts:

As you can see, there are data labels for each subcategory (that is, gender and education), but no data label showing the total for each education category. For example, we want to know the total sales in the High School category. Now that you know the problem, let’s see a way to fix it.

Read on for Reza’s solution to the problem. In general, if people might care about the total, do them a favor and show the total.

Accidentally Building a Population Graph

Neil Saunders shares an example of a newspaper headline which ultimately just shows us population sizes:

Some poking around in the NSW Transport Open Data portal reveals how many people enter every Sydney train station on a “typical” day in 2016, 2017 and 2018. We could manipulate those numbers in various ways to estimate total, unique passengers for FY 2017-18 but I’m going to argue that the value as-is serves as a proxy variable for “station busyness”.

When working with spatial data, it’s important to differentiate an effect you see because it’s actually unique or interesting from an effect you see because that’s where all of the people are.

Aspect-Based Sentiment Analysis

Federico Pascual explains aspect-based sentiment analysis and then shows how to implement it with MonkeyLearn:

Imagine you have a large dataset of customer feedback from different sources such as NPS, satisfaction surveys, social media, and online reviews. Some positive, some negative and others that contain mixed feelings. You’d use sentiment analysis to classify the polarity of each text, right? After all, it’s already proven to be a highly efficient tool.

But, what if you wanted to pick customer feedback apart, hone in on the details, get down to the nitty-gritty of each review for a more accurate analysis of your customers’ opinions?

Cue aspect-based sentiment analysis (ABSA). A text analysis technique that breaks down text into aspects (attributes or components of a product or service) and allocates each one a sentiment level. This technique can help businesses become customer-centric, which means putting their customers at the heart of everything they do. It’s about listening to their customers, understanding their voice, analyzing their feedback and learning more about customer experiences, as well as their expectations for products or services.
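To make the idea concrete, here is a toy Python sketch of splitting a review into aspects and giving each one a sentiment. It is a keyword-lookup stand-in, not MonkeyLearn's API and not the machine-learning approach the article demonstrates. The aspect keywords, the sentiment word lists, and the function name are all hypothetical.

```python
import re

# Hypothetical toy lexicons; a real system would learn these from labeled data.
ASPECT_KEYWORDS = {
    "price": {"price", "cost"},
    "support": {"support", "service"},
}
POSITIVE_WORDS = {"great", "fast", "friendly", "helpful"}
NEGATIVE_WORDS = {"slow", "rude", "expensive", "terrible"}


def aspect_sentiments(text):
    """Return {aspect: 'positive' | 'negative' | 'neutral'} for one review."""
    results = {}
    # Treat each sentence, and each side of a "but", as its own clause.
    for clause in re.split(r"[.;!?]|\bbut\b", text.lower()):
        words = set(re.findall(r"[a-z']+", clause))
        score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        # Every aspect mentioned in the clause receives the clause's label.
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if words & keywords:
                results[aspect] = label
    return results


if __name__ == "__main__":
    review = "The support team was friendly, but the price is too expensive."
    print(aspect_sentiments(review))
```

A single polarity score would call that sample review "mixed"; the per-aspect view is what separates the praise for support from the complaint about price.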

Click through for the demo.

Azure Data Studio and T-SQL Notebooks

Constantine Kokkinos takes us through the preview of T-SQL notebooks in Azure Data Studio:

I have been waiting for word about the new Notebook functionality in Azure Data Studio, and when I heard it was available in the insider build, I jumped in to take a look.

Jupyter Notebook is a web application that allows you to host programming languages, run code (often with different programming languages), return results, annotate your data, and importantly, share the source controlled results with your colleagues.

This is an exciting addition; SQL is a great language to combine with notebooks given the exploratory nature of the language. I’m going to wait until it’s officially out before diving too far into it, though.

Querying Cosmos DB From SQL Server

Jovan Popovic shows how you can use the Cosmos DB ODBC driver to perform OPENROWSET queries against Cosmos DB:

Now you need to install ODBC Driver for CosmosDB on the computer where you have SQL Server installed. I’m using Microsoft Azure Cosmos DB ODBC 64-bit.msi for 64-bit Windows – 64-bit versions of Windows 8.1 or later, Windows 8, Windows 7, Windows Server 2012 R2, Windows Server 2012, and Windows Server 2008 R2.

Once you install this driver, you should setup ODBC source in system DSN and test the connection:

If you’re running SQL Server 2019, you can follow Jovan’s first two steps and then create an external data source and table with PolyBase to get to the same results.

Updating SQL Server on Linux Docker Images

Max Trinidad shows us how you can make a change to the default SQL Server container and save it for your own purposes:

With the “docker commit …” command, you’ll provide both the image name (all lowercase) and a TAG name (uppercase allowed). You can be creative in having a naming convention for your image repositories.
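As a small illustration of that naming rule, the Python sketch below assembles the argument list for `docker commit CONTAINER REPOSITORY:TAG`, lowercasing the image name and leaving the tag's case alone. The helper name, container name, image name, and tag are all hypothetical, and the sketch only builds the arguments; it does not call Docker.

```python
def docker_commit_args(container, image_name, tag):
    """Build the argv for `docker commit CONTAINER REPOSITORY:TAG`.

    The repository (image) name is forced to lowercase; the tag is passed
    through unchanged because tags may contain uppercase characters.
    """
    repository = image_name.lower()
    return ["docker", "commit", container, f"{repository}:{tag}"]


if __name__ == "__main__":
    # Hypothetical container, image, and tag names.
    args = docker_commit_args("sql2019", "MySqlImage", "CTP2.3-Tools")
    print(" ".join(args))
```

On a machine with Docker installed, the resulting list could be handed to `subprocess.run(args, check=True)`.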

It’s very important to save images after doing the commit. I found out that having an active container would be useless without an image.  As far as I know, I haven’t found a way to rebuild an image from an existing container if the image was previously removed.

Max has a full demo, including installing various tools and programs as well as tips on how to minimize the pain.

Finding Three-Part or Four-Part Names in SQL Server

Louis Davidson shows how we can find three-part or four-part naming in T-SQL code:

In order to make this work, one of the considerations is to eliminate cross database dependencies, as you can’t reference objects that don’t exist in views, and even in stored procedures, which offer delayed resolution of objects, you can’t test the code without the database it is referencing.

In addition, and somewhat more important to the process, is dealing with three part names that reference the name of the database your object is in. During the comparison process, the database can be created with a name that is different from your target database to compare to (referred to as a shadow database.) So if you are in database X and have references to X.schema.table, but the database is generated as X_Shadow, the X. is now a cross database reference rather than the local reference you are desiring.

Four part names to linked servers are a different sort of nightmare, but one that is (hopefully) exceedingly rare. The queries presented will help with this as well.

Louis has a few scripts to help you find these. If your code is in source control already, you could also build a regular expression to search through it.
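For the source control route, a regular expression along these lines might be a starting point. This is a rough Python sketch and not one of Louis's scripts: the pattern and the function name are assumptions, it treats an identifier as either a bracketed name or a run of word characters, and it makes no attempt to skip comments, string literals, decimal-looking text such as 1.2.3, or the `database..table` shorthand.

```python
import re

# One identifier part: [bracketed name] or a run of word characters.
PART = r"(?:\[[^\]]+\]|\w+)"

# Three or four dot-separated parts, not starting in the middle of a longer name.
MULTIPART = re.compile(rf"(?<![\w\].]){PART}(?:\.{PART}){{2,3}}")


def find_multipart_names(sql):
    """Return (name, part_count) for each three-part or four-part name found."""
    hits = []
    for match in MULTIPART.finditer(sql):
        name = match.group(0)
        part_count = len(re.findall(PART, name))
        hits.append((name, part_count))
    return hits


if __name__ == "__main__":
    # Hypothetical query text with one three-part and one four-part reference.
    sample = (
        "SELECT o.Id FROM X.dbo.Orders AS o "
        "JOIN [Srv1].[Y].[dbo].[Items] AS i ON i.OrderId = o.Id "
        "JOIN dbo.Local AS l ON l.Id = o.Id"
    )
    for name, parts in find_multipart_names(sample):
        print(parts, name)
```

Two-part references such as `dbo.Local` and column references such as `o.Id` are left alone, so the output is limited to names that cross a database or server boundary (or that name the current database explicitly).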

