Press "Enter" to skip to content

Curated SQL Posts

SSIS Framework File Community Edition

Andy Leonard has an announcement:

The very first data integration / data engineering framework I ever wrote was for Data Transformation Services, or DTS. The DTS framework had one job: manage connections. I don’t recall all the details, but I remember DTS included a task that allowed packages to retrieve settings from INI files. INI files are key-value files, so I simply added entries with identical keys and different values – values that matched connection strings for each lifecycle tier – and placed each version of the INI file in the same location on every server in the lifecycle.

The next framework I wrote was for SSIS. I stored metadata in tables – including connections metadata – and created a concept I called an SSIS Application. An SSIS application is, according to my definition, a “collection of SSIS packages that execute in a pre-determined order.”

The SSIS Framework File Community Edition is very similar to this first framework, except for the connections management.

Click through to learn more about the SSIS Framework File Community Edition and check it out.

Comments closed

Capturing SQL Server Audit Events with Azure Monitor

Bruno Gabrielli connects Azure Montor to SQL Server Audit:

Today I am going to cover an interesting aspect on how to capture security audit events from both Azure and non-Azure SQL Server machines. Most of you probably know that SQL Server is capable of auditing security related information, such as access to a given database, record creation or deletion, configuration change and so on) according to the Audit configuration applied to a given instance or database.

In this post, we will not dig into SQL Server Audit configuration or capability. We will rather explore the steps and configurations necessary to collect data using Azure Monitor.

Read on for the process. You will need the appropriate agent for this, but that agent doesn’t necessitate that your machine be in Azure.

Comments closed

Testing Failover Group and TCP Connectivity with Managed Instances

Niko Neugebauer has a pair of connectivity tests for us. First up is failover group connectivity:

When you set up a failover group between primary and secondary SQL Managed Instances in two different regions, each instance is isolated using an independent virtual network. Replication traffic needs to be allowed between these VNets.

To allow this kind of traffic, one of the prerequisites is:

– “You need to set up your Network Security Groups (NSG) such that ports 5022 and the range 11000-11999 are open inbound and outbound for connections from the subnet of the other managed instance. This is to allow replication traffic between the instances.”

Click through for a SQL Agent job script which helps with the test. Meanwhile, you can also test TCP connectivity from a managed instance:

In this post we shall focus on helping you determining the TCP connectivity from SQL Managed Instance against a given endpoint and port of your choice.

If you are interested in other posts on how-to discover different aspects of SQL MI – please visit the  http://aka.ms/sqlmi-howto, which serves as a placeholder for the series.

There are scenarios where it would be nice to be able to test if a SQL Managed Instance can reach some “external” endpoints, like Azure Storage as an example.

Check out both posts.

Comments closed

Remapping Database Columns in Python

John Mount performs mapping en masse:

The tricky part is: data science application scale easily has hundreds of string valued variables, each having hundreds of thousands of tracked values. The possibility of a large number of variable values or level renders the CASE/WHEN solution undesirable- as the query size is proportional to the number variables and values. The JOIN solutions build a query size proportional to the number of variables (again undesirable, but tolerable). However, super deeply nested queries are just not what relational databases expect. And a sequence of updates isn’t easy to support as a single query or view.

As an example of remapping, John shows translating “a” in a column to 1, “b” to 2, “d” to 3, etc.—that is, perhaps mapping each unique string to a unique number.

Comments closed

Training a Model in the Azure ML Designer

I continue a series on low-code machine learning in Azure ML:

Machine learning is a lot like an action film from the 1980s: we see early on that there’s a problem, we train in a cool montage with upbeat rock music, and then we come back to the problem and defeat it with car chases and bazookas and quippy one-liners. Well, maybe that simile got away from me a little bit, but I think I’ll stick with it.

What we’ll do in this post is cover the process of training a simple model using the Azure ML designer. I won’t deviate too far from the “classic” Azure ML script, which involves using the Designer to train a model and then deploy an endpoint for consumption. And away we go!

Sometimes, when a model is running, I say to it, “I have to remind you Sully, this is my weak arm!”

Comments closed

Handling Categorical Data in R

The RSquared Academy blog has a two-parter on handling categorical data in R. Part 1 elaborates on kinds of categorical data and introduces a case study:

While we can rank the categories, we cannot assign a value to them. For example, in satisfaction ranking, we cannot say that like is twice as positive as dislike i.e. we are unable to say how much they differ from each other. While the order or rank of data is meaningful, the difference between two pieces of data cannot be measured/determined or are meaningless. Ordinal data provide information about relative comparisons, but not the magnitude of the differences.

Part 2 shows off ways to work with categorical data in tables:

In this section, we will explore the above ways of summarizing categorical data. We will also spend some time learning about tables as you will be using them extensively while working with categorical data. R has many packages for tabulating data and we list and explore all of them in the R scripts shared in the GitHub repository.

Click through for both guides. H/T R-Bloggers.

Comments closed

Date Math in Powershell

Steve Jones adds 12 years in Powershell:

I saw a fun post on Twitter recently asking days until retirement. I wrote this code:

DECLARE @YearsToRetire INT = 11;
SELECT DATEDIFF (DAY, GETDATE (), (DATEADD (YEAR, @YearsToRetire, GETDATE ())));

I thought that wasn’t bad, but then I wondered, how would I do this in PowerShell? I knew there had to be a way, so I googled and ran into this article.

Normally I need to take off my shoes to add that many years.

Comments closed

Page Compression Success Rates

Paul Randal has a script for us:

Yesterday I was chatting with Jonathan about some of the internals of the page compression algorithms for a client discussion. When you enable page compression, rows are first just row compressed (storing values with empty bytes stripped out) until a page fills up. The Storage Engine then attempts page compression for the page. 

Click through to see what that entails and how you can see what percentage of pages successfully compress at the page level.

Comments closed

The Evolution of Cloud Architecture

Ben Brauer has a two-parter looking at how architecture is changing. Part 1 looks at containers and machine learning:

Let’s start describe containers at a high level.  A container is a packaging and distribution mechanism that abstracts and resolves many of the installer issues that result from ‘unique’ environments.  We’ve all heard developers exclaim “well, it works on my machine,” after pushing an application to a new environment only to realize its broken.  Containers strive to address this problem by creating a hard boundary between the infrastructure and the software stack used by an application. External dependencies are not necessarily added to the container, but all your internal dependencies (frameworks, runtimes, etc.) are there.  This makes the deployment of the application to a new environment significantly more predictable as the compute environment is consistent as its part of the container.

Part two looks at serverless compute and low-code/no-code development:

Low-code (or no-code) development for applications is not a new concept. It strives to democratize development in a similar way as decades ago Visual Basic expanded the number of developers from thousands of C++ developers to hundreds of thousands of developers creating Windows-based solutions. Low-code takes this concept to non-technical professionals. Although this notion is great for productivity and usability, the maintenance and performance of these apps can be daunting to say the least. Now non-technical application authors need to learn about application management, documentation and, application deployment.  Without a clear understanding of these considerations, the environment can quickly become chaotic.  The good news is that platforms and tools have come a long way since Visual Basic. For example, Microsoft’s Power Apps platform provides many of the platform services needed to maintain a healthy application lifecycle and governance paradigm.

These are good concepts to know about, regardless of your particular cloud platform.

Comments closed