Press "Enter" to skip to content

Curated SQL Posts

TRY_CAST and TRY_PARSE

Joe Obbish shows the difference between two functions:

There’s a lot of guidance out there that states that TRY_CAST is a faster, more modern version of TRY_PARSE and that TRY_PARSE should only be used if you need to set the optional culture parameter. However, the two functions can return different results in some cases, even without the culture parameter.

That guidance is blatantly wrong. TRY_CAST() and TRY_PARSE() both came out in SQL Server 2012. TRY_PARSE() uses .NET to perform parsing, which is going to have some edge case differences, especially around cultures and localization. TRY_CAST() is CAST() in an error-safe wrapper. If anything, TRY_CAST() is the “old” version and TRY_PARSE() the “new” version, with scare quotes in place because they both came out at the same time.

Both of them are useful, though I do agree with Joe’s advice of avoiding TRY_PARSE(), at least for larger datasets. If you’re parsing a single date or a small table of dates, TRY_PARSE() does an excellent job because TRY_PARSE('13/01/2019' AS DATE USING 'fr-fr') is not something you can easily do with TRY_CAST() in a US locale.

Comments closed

SSIS Framework File Community Edition

Andy Leonard has an announcement:

The very first data integration / data engineering framework I ever wrote was for Data Transformation Services, or DTS. The DTS framework had one job: manage connections. I don’t recall all the details, but I remember DTS included a task that allowed packages to retrieve settings from INI files. INI files are key-value files, so I simply added entries with identical keys and different values – values that matched connection strings for each lifecycle tier – and placed each version of the INI file in the same location on every server in the lifecycle.

The next framework I wrote was for SSIS. I stored metadata in tables – including connections metadata – and created a concept I called an SSIS Application. An SSIS application is, according to my definition, a “collection of SSIS packages that execute in a pre-determined order.”

The SSIS Framework File Community Edition is very similar to this first framework, except for the connections management.

Click through to learn more about the SSIS Framework File Community Edition and check it out.

Comments closed

Capturing SQL Server Audit Events with Azure Monitor

Bruno Gabrielli connects Azure Montor to SQL Server Audit:

Today I am going to cover an interesting aspect on how to capture security audit events from both Azure and non-Azure SQL Server machines. Most of you probably know that SQL Server is capable of auditing security related information, such as access to a given database, record creation or deletion, configuration change and so on) according to the Audit configuration applied to a given instance or database.

In this post, we will not dig into SQL Server Audit configuration or capability. We will rather explore the steps and configurations necessary to collect data using Azure Monitor.

Read on for the process. You will need the appropriate agent for this, but that agent doesn’t necessitate that your machine be in Azure.

Comments closed

Testing Failover Group and TCP Connectivity with Managed Instances

Niko Neugebauer has a pair of connectivity tests for us. First up is failover group connectivity:

When you set up a failover group between primary and secondary SQL Managed Instances in two different regions, each instance is isolated using an independent virtual network. Replication traffic needs to be allowed between these VNets.

To allow this kind of traffic, one of the prerequisites is:

– “You need to set up your Network Security Groups (NSG) such that ports 5022 and the range 11000-11999 are open inbound and outbound for connections from the subnet of the other managed instance. This is to allow replication traffic between the instances.”

Click through for a SQL Agent job script which helps with the test. Meanwhile, you can also test TCP connectivity from a managed instance:

In this post we shall focus on helping you determining the TCP connectivity from SQL Managed Instance against a given endpoint and port of your choice.

If you are interested in other posts on how-to discover different aspects of SQL MI – please visit the  http://aka.ms/sqlmi-howto, which serves as a placeholder for the series.

There are scenarios where it would be nice to be able to test if a SQL Managed Instance can reach some “external” endpoints, like Azure Storage as an example.

Check out both posts.

Comments closed

Remapping Database Columns in Python

John Mount performs mapping en masse:

The tricky part is: data science application scale easily has hundreds of string valued variables, each having hundreds of thousands of tracked values. The possibility of a large number of variable values or level renders the CASE/WHEN solution undesirable- as the query size is proportional to the number variables and values. The JOIN solutions build a query size proportional to the number of variables (again undesirable, but tolerable). However, super deeply nested queries are just not what relational databases expect. And a sequence of updates isn’t easy to support as a single query or view.

As an example of remapping, John shows translating “a” in a column to 1, “b” to 2, “d” to 3, etc.—that is, perhaps mapping each unique string to a unique number.

Comments closed

Training a Model in the Azure ML Designer

I continue a series on low-code machine learning in Azure ML:

Machine learning is a lot like an action film from the 1980s: we see early on that there’s a problem, we train in a cool montage with upbeat rock music, and then we come back to the problem and defeat it with car chases and bazookas and quippy one-liners. Well, maybe that simile got away from me a little bit, but I think I’ll stick with it.

What we’ll do in this post is cover the process of training a simple model using the Azure ML designer. I won’t deviate too far from the “classic” Azure ML script, which involves using the Designer to train a model and then deploy an endpoint for consumption. And away we go!

Sometimes, when a model is running, I say to it, “I have to remind you Sully, this is my weak arm!”

Comments closed

Handling Categorical Data in R

The RSquared Academy blog has a two-parter on handling categorical data in R. Part 1 elaborates on kinds of categorical data and introduces a case study:

While we can rank the categories, we cannot assign a value to them. For example, in satisfaction ranking, we cannot say that like is twice as positive as dislike i.e. we are unable to say how much they differ from each other. While the order or rank of data is meaningful, the difference between two pieces of data cannot be measured/determined or are meaningless. Ordinal data provide information about relative comparisons, but not the magnitude of the differences.

Part 2 shows off ways to work with categorical data in tables:

In this section, we will explore the above ways of summarizing categorical data. We will also spend some time learning about tables as you will be using them extensively while working with categorical data. R has many packages for tabulating data and we list and explore all of them in the R scripts shared in the GitHub repository.

Click through for both guides. H/T R-Bloggers.

Comments closed

Date Math in Powershell

Steve Jones adds 12 years in Powershell:

I saw a fun post on Twitter recently asking days until retirement. I wrote this code:

DECLARE @YearsToRetire INT = 11;
SELECT DATEDIFF (DAY, GETDATE (), (DATEADD (YEAR, @YearsToRetire, GETDATE ())));

I thought that wasn’t bad, but then I wondered, how would I do this in PowerShell? I knew there had to be a way, so I googled and ran into this article.

Normally I need to take off my shoes to add that many years.

Comments closed

Page Compression Success Rates

Paul Randal has a script for us:

Yesterday I was chatting with Jonathan about some of the internals of the page compression algorithms for a client discussion. When you enable page compression, rows are first just row compressed (storing values with empty bytes stripped out) until a page fills up. The Storage Engine then attempts page compression for the page. 

Click through to see what that entails and how you can see what percentage of pages successfully compress at the page level.

Comments closed