Curated SQL – Page 1136 – A Fine Slice Of SQL Server

Continuing The Advent Of Code In T-SQL

Published 2018-12-06 by Kevin Feasel

Wayne Sheffield has a few more posts in the Advent of Code series. His latest edition:

In Day 5, we find ourselves working with the polymers of the new Santa suit. A polymer (the input file), consists of units, represented by upper and lower case letters. Adjacent units of the same letter, but of different polarity (case), cancel each other out. This may lead to other units that can then cancel each other out. The goal is to reduce the polymer to as small as possible, and report back the reduced size.

Tasks:

Perform a case-sensitive search/replace for each letter of the alphabet. The search is for a pair of the same letter, where one is upper case, and the other is lower case.

Recursively perform this operation until the string can no longer be reduced.

In my opinion, the key part to this is that the operation needs to be performed recursively. I can think of only two ways to recursively perform an operation in SQL Server:

A recursive common table expression (cte).

Using a WHILE loop.

I don’t like using either of these mechanisms in SQL Server – they both perform operations in a “Row-By-Agonizing-Row” method, instead of a more set-based approach. However, set-based recursion usually performs extremely poorly. So, I’m going to use a while loop.

The recursion requirement does limit things a bit; otherwise I could see putting something together with the LEAD() window function.

Comments closed

Cross-Availability Group Login Management

Published 2018-12-06 by Kevin Feasel

David Fowler walks us through a problem about orphaned users and Availability Groups:

Now, I’m pretty sure that most of us will have been in the position where, after a fail-over we get inundated with calls, emails, Skype messages and carrier pigeon drops letting us know that so and so can no longer access the database.

When you look into it, you either find that the login never existed in the first place, so you create it or that it was there but the database user has become orphaned from it (happens when the login SID doesn’t match the SID of the database user, Adrian wrote about orphaned users in Dude where’s my access?).

You remap the orphaned user and everything is good again… that is until the next time you failover and once again you’ll be hit with the same orphaned user problem.

Click through for the explanation and a permanent fix for this issue.

Comments closed

Connecting Power BI To Dockerized SQL Server

Published 2018-12-06 by Kevin Feasel

Chris Taylor shows us how to build a SQL Server on Linux Docker container and use it to supply data to a Power BI dashboard:

I (and many others) have done a series of docker blog posts over the last couple of years but they’ve all tended to evolve around spinning up a SQL Server 2017+ container for testing or demo purposes. This is only really the start, think of the bigger picture here, once you have your database environment the world is your oyster.

This blog post will show how we can use SQL Server 2019 CTP2.1 running on Linux (Ubuntu) in a docker container as our data source for a Power BI environment in next to no time!

These steps show a very manual process for completing this setup, if it is something you are looking to do frequently then I suggest creating a Dockerfile and/or yml file and use docker-compose. This way you can have all your setup in one file and it will be a single statement to get your SQL Server 2019 environment up and running.

Read on for the demo.

Comments closed

Working With Missing Values In R

Published 2018-12-05 by Kevin Feasel

Anisa Dhana has a few examples of ways we can work with data containing missing values in R:

Imputation is a complex process that requires a good knowledge of your data. For example, it is crucial to know whether the missing is at random or not before you impute the data. I have read a nice tutorial which visualize the missing data and help to understand the type of missing, and another post showing how to impute the data with MICE package.

In this short post, I will focus on management of the missing data using the tidyverse package. Specifically, I will show how to manage missings in the long data format (i.e., more than one observation for id).

Anisa shows a few different techniques, depending upon what you need to do with the data. I’d caution about using mean in the second example and instead typically prefer median, as replacing missing values with the median won’t alter the distribution in the way that it can with mean.

Comments closed

Configuring Kafka Streams For Least Privilege

Published 2018-12-05 by Kevin Feasel

Gwen Shapira explains how we can assign minimal rights to Kafka Streams and KSQL users:

The principle of least privilege dictates that each user and application will have the minimal privileges required to do their job. When applied to Apache Kafka^® and its Streams API, it usually means that each team and application will have read and write access only to a selected few relevant topics.

Organizations need to balance developer velocity and security, which means that each organization will likely have their own requirements and best practices for access control.

There are two simple patterns you can use to easily configure the right privileges for any Kafka Streams application—one provides tighter security, and the other is for more agile organizations. First, we’ll start with a bit of background on why configuring proper privileges for Kafka Streams applications was challenging in the past.

Read the whole thing; “granting everybody all rights” generally isn’t a good idea, no matter what your data platform of choice may be.

Comments closed

Working With Key-Value Pairs In Spark

Published 2018-12-05 by Kevin Feasel

Teena Vashist shows us a few of the functions available with Spark for working with key-value pairs:

1. Creating Key/Value Pair RDD:
The pair RDD arranges the data of a row into two parts. The first part is the Key and the second part is the Value. In the below example, I used a parallelize method to create a RDD, and then I used the length method to create a Pair RDD. The key is the length of the each word and the value is the word itself.

scala> val rdd = sc.parallelize(List("hello","world","good","morning"))

rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val pairRdd = rdd.map(a => (a.length,a))

pairRdd: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[1] at map at <console>:26

scala> pairRdd.collect().foreach(println)

(5,hello)

(5,world)

(4,good)

(7,morning)

Click through for more operations. Spark is a bit less KV-centric than classic MapReduce jobs, but there are still plenty of places where you want to use them.

Comments closed

Getting Maintenance Plan Information From Powershell

Published 2018-12-05 by Kevin Feasel

Shane O’Neill gives us the low-down on what we need to do in order to retrieve maintenance plan information from SQL Server using Powershell:

It’s surprisingly difficult to get this information in SQL Server. In fact I was quite stuck trying to figure out how to get this information when I realized that the good people over at Brent Ozar Unlimited already do some checking on this for their sp_Blitz tool.

A quick look at the above code showed me that dbo.sysssispackages was what I was looking for. Combine this with:

1. Some hack-y SQL for the frequency in human readable format, and

2. Some even more hack-y SQL to join on the SQL Agent Job name

And we had pretty much the XML for the Maintenance Plan and the SQL Agent Job Schedule that they were against.

Shane has made the code available as well, so check it out if you have any maintenance plans you’re trying to understand and maybe even get away from.

Comments closed

Gaining Business Understanding Through Paying Attention

Published 2018-12-05 by Kevin Feasel

Laura Ellis lays down some good tips for understanding business problems:

I know this sounds somewhat silly. But, when thinking through the steps that I take to solve a business problem, I realized that I do employ a strategy. The backbone of that strategy is based on the principals of solving a word problem. Yes, that’s right. Does anyone else remember staring at those first complex word problems as a kid and not quite knowing where to start? I do! However, when my teacher provided us with strategies to break down the problem into less intimidating, actionable steps, everything became rather doable. The steps: circle the question, highlight the important information and cross out unnecessary information. Do these steps and all of a sudden the problem is simplified and much less scary. What a relief! By employing the same basic strategy, we too can feel that sense of calm when working on a business problem.

It sounds blase but paying attention to what people are saying (or writing) versus hearing a few words and assuming the rest.

Comments closed

SQL Managed Instance Business Critical Tier Now Available

Published 2018-12-05 by Kevin Feasel

Jovan Popovic announces Azure SQL Managed Instance Business Critical tier has reached GA:

We are happy to announce General availability of Business Critical tier in Azure SQL Managed Instance – architectural model built for high-performance and IO demanding databases.

After 5 months of public preview period Azure SQL Managed Instance Business Critical Service tier is generally available.

Azure SQL Managed Instance Business Critical tier is built for high performance databases and applications that require low IO latency of 1-2ms in average with up to 100K IOPS that can be achieved using fast local SSD that this tier uses to place database files.

Click through to see what Business Critical tier in particular has to offer.

Comments closed

The Power Of Dual Storage Mode For Power BI Aggregations

Published 2018-12-05 by Kevin Feasel

Reza Rad continues a series on Power BI aggregations by explaining how using the Dual storage mode can make queries faster if you use both Import and DirectQuery sources:

This is not what we actually expect to see. The whole purpose of Sales Agg table is to speed up the process from DirectQuery mode, but we are still querying the DimDate from the database. So, what is the solution? Do we change the storage mode of DimDate to Import? If we do that, then what about the connection between DimDate and FactInternetSales? We want that connection to work as DirectQuery of course.

Now that you learned about the challenge, is a good time to talk about the third storage mode; Dual.

Read on for an example-filled tutorial.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts