Kevin Feasel – Page 879

In one of my most popular posts So, what is AI really? I showed that Artificial Intelligence (AI) basically boils down to autonomously learned rules, i.e. conditional statements or simply, conditionals.
In this post, I create the simplest possible classifier, called ZeroR, to show that even this classifier can achieve surprisingly high values for accuracy (i.e. the ratio of correctly predicted instances)… and why this is not necessarily a good thing, so read on!

The nuanced answer here is that with classifiers, accuracy is not in itself a great measure in the case of class imbalance. The more balanced your classes are, the more likely it is that a model with high accuracy is a good model. That’s where other measures such as specificity and sensitivity, positive & negative predictive value, etc. come into play.

Comments closed

Performance and T-SQL’s CHOOSE

Published 2020-04-28 by Kevin Feasel

Grant Frichey answers a question:

Questions absolutely drive my blog content and I really liked this one: how does the T-SQL CHOOSE command affect performance.
On the face of it, I honestly don’t think it will affect performance at all, depending on where and how you use it. However, the answer is always best supplied by testing.

Grant’s post ended up being much more interesting than I had anticipated—my anticipated answer was “It’s a CASE statement so it behaves like a CASE statement.” But there is some nuance that I’ve left out.

Comments closed

Accessing Managed Instances from SSMS

Published 2020-04-28 by Kevin Feasel

James Serra shows us what we need to do in order to reach an Azure SQL Managed Instance from SQL Server Management Studio:

It used to be that the only way to use SQL Server Management Studio (SSMS) against Azure SQL Database Managed Instance (SQLMI) was to create a VM on the same VNET as SQLMI and use SSMS on that VM. That VM was usually called a jumpbox (see instructions here).
But about a year ago Microsoft added a way to use SSMS without using a VNET (announcement) by allowing you to enable a public endpoint for your SQLMI. This made it easy for me to access a SQLMI database on my laptop.

That change enables what James shows us.

Comments closed

Case-Insensitive Power Query Merges

Published 2020-04-28 by Kevin Feasel

Ed Hansberry takes us through a pain point in Power Query:

Power Query is notorious for being case sensitive. Even its language is case sensitive. Often though you get data from users where they are using different cases for the same data. Some never use the shift key, and others CAPSLAP everything.

Click through for your two options, plus a bonus option which might work.

Comments closed

Azure SQL Database Tiers

Published 2020-04-28 by Kevin Feasel

Tim Radney enumerates the tiers available to us with Azure SQL Database:

When Azure SQL Database first launched, there was a single pricing option known as “DTUs” or Database Transaction Units. (Andy Mallon, @AMtwo, explains DTUs in “What the heck is a DTU?“) The DTU model provides three tiers of service, basic, standard, and premium. The basic tier provides up to 5 DTUs with standard storage. The standard tier supports from 10 up to 3000 DTUs with standard storage and the premium tier supports 125 up to 4000 DTUs with premium storage, which is orders of magnitude faster than standard storage.

But there have been several additions since then and Tim lays it out for us.

Comments closed

The VALUES Operator in T-SQL

Published 2020-04-28 by Kevin Feasel

Chris Johnson shows off using the VALUES operator to construct records:

Values blocks are a really useful little bit of code in SQL Server. Basically, they are a block of defined values that you can use pretty much like any other data set.

It’s a nice way quickly to build a derived table. As a bonus, I’ll tie in an older post of mine around using APPLY with VALUES to unpivot data sets.

Comments closed

Scheduling SSIS Packages in Azure

Published 2020-04-28 by Kevin Feasel

Magi Naumova takes us through the process of running SSIS in Azure Data Factory, including the scheduling of jobs to run our SSIS packages:

The main purpose of these tools is to force the Lift and Shift approach of migrating and running existing SSIS Packages in Azure. I wouldn’t say that this is the most effective approach of transferring the ETL to Azure, but it could be a good start on a road of a Modern Azure Datawarehouse Architecture. If you have already deployed SSIS packages in Azure SSIS Catalog, then SSMS 18 helps you to put them on schedule very quickly.
Running SSIS Packages in Azure requires provisioning of SSIS Runtime Engine, an Azure Data Factory instance and a SQL Database which hosts the SSIS catalog. Scheduling SSIS Packages in Azure requires creating a data flow pipeline in ADF which has a trigger defined for scheduled execution. While describing all those concepts is far above the scope of this chapter, a short description would be useful.

Read on for a good amount of detail and a demo which walks through the process.

Comments closed

Splitting and Concatenating Strings with Python

Published 2020-04-27 by Kevin Feasel

Rajendra Gupta walks us through splitting and concatenating strings with Python:

In Python, we do not have a character data type. It uses Unicode characters for the string. It also considers a single character as a string. Sometimes, we need to split a string based on the separator defined. It is similar to a text to columns feature in Microsoft Excel.

Click through for a number of examples.

Comments closed

Useful R Packages for Data Scientists

Published 2020-04-27 by Kevin Feasel

Tomaz Kastrun has a nice collection of useful R packages for data scientists:

Among thousand of R packages available on CRAN (with all the mirror sites) or Github and any developer’s repository.
Many useful functions are available in many different R packages, many of the same functionalities also in different packages, so it all boils down to user preferences and work, that one decides to use particular package. From the perspective of a statistician and data scientist, I will cover the essential and major packages in sections. And by no means, this is not a definite list, and only a personal preference.

Click through for Tomaz’s recommendations.

Comments closed

Workload Isolation in Azure Synapse Analytics

Published 2020-04-27 by Kevin Feasel

Niko Neugebauer explains how resource governance works with Azure Synapse Analytics SQL Pools:

Carrying on with the Azure Synapse series on the workload identification, classification and isolation started with
Query Identification in Azure SQL DW (Synapse Analytics), in this post I wanted to focus on the workload groups and the workload isolation (aka Resource Governance).
Before advancing and looking into Azure Synapse Analytics “Resource Governor” (my own naming, my fault – and yeah, I shall keep it naming properly), we need to look at the resource classes in Azure Synapse Analytics.
But even before that et me start with WTH – Where is the Heck of Resource Governance in Azure SQL Database ? (Don’t throw at me those Managed Instances, which is a SQL Server with Availability Group running in tuned VM in the background – I want & need the Azure SQL Database to have the proper Resource Governance.

Click through for an explanation plus demonstration.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel

The Siren Song of High Accuracy

Performance and T-SQL’s CHOOSE

Accessing Managed Instances from SSMS

Case-Insensitive Power Query Merges

Azure SQL Database Tiers

The VALUES Operator in T-SQL

Scheduling SSIS Packages in Azure

Splitting and Concatenating Strings with Python

Useful R Packages for Data Scientists

Workload Isolation in Azure Synapse Analytics