Press "Enter" to skip to content

Curated SQL Posts

R Services Internals

Niels Berglund has an excellent series on R Services internals.  Here’s the latest post:

This post is the ninth post about Microsoft SQL Server R Services, and the eighth post that drills down into the internals of how it works.

So far in this series, we have been looking at what happens in SQL Server as well as in the launchpad service when we execute sp_execute_external_script, and we still have no real “clue” as to where the R engine comes into play.

Well, hopefully that will change (at least a little bit) with this post, as here we will look at what happens when we leave the launchpad service.

This series is like candy to me.  It’s the best write-up I’ve seen so far about what’s really happening when you run SQL Server R Services.

PowerShell Parameter Validation

Adam Bertram shows how to write safer code by validating parameters:

Simply adding [Parameter()] to the parameter block makes this function “advanced.” Once we have this in place, we can then add in some parameter validation code. In our case, we need to ensure that the file provided with FilePath actually exists before proceeding. To do this, we’ll use the ValidateScript attribute which allows us to run any code we want. As long as it returns $true, it will allow the function to run.

function Set-File {
    param(
        [Parameter()]
        # Reject any value that is not the path of an existing file
        [ValidateScript({ Test-Path -Path $_ -PathType Leaf })]
        [string]$FilePath
    )
}

Notice that I used the pipeline variable $_ in this example. It represents whatever value is passed to the parameter. If the user provides a file path that does not exist, they will be notified.
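Calling the function with a bad path then fails during parameter binding, before the function body ever runs. A quick, hypothetical call to see it:

# Throws a validation error; the (empty) function body never executes
Set-File -FilePath 'C:\no\such\file.txt'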

Read the whole thing.

Multiple Data Sets In External Scripts

Tomaz Kastrun shows a workaround to the “one data set” limit in sp_execute_external_script:

Some of the arguments of the procedure sp_execute_external_script are enumerated. This is true of the input data set: as the name of the argument @input_data_1 suggests, one can easily (and quite understandably) think there might also be an @input_data_2 argument, and so on. Unfortunately, this is not the case. The external procedure can hold only one T-SQL data set, inserted through this parameter.

There are many reasons for that; one would be the cost of sending several data sets to the external process and back. So, in effect, this forces the user to rethink and pre-prepare the data set (meaning, do all the data munging beforehand) prior to sending it into the external procedure.

But there are workarounds for passing an additional query (or queries) to sp_execute_external_script. I am not advocating this, and I strongly disagree with such usage, but here it is.

It does feel like a hinky solution, but sometimes you just need to get two data sets in.
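Tomaz demonstrates his own technique in the post. For a flavor of the general idea, here is a minimal sketch (not his exact code) that smuggles a second, small data set in as a delimited string through @params and rebuilds it inside R:

DECLARE @second NVARCHAR(MAX) = N'10,20,30';  -- second "data set", flattened to a string

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
# rebuild the second data set from the delimited string
d2 <- as.integer(strsplit(second, ",")[[1]])
# combine it with the real input data set
OutputDataSet <- data.frame(InputDataSet, SecondSetTotal = sum(d2))',
    @input_data_1 = N'SELECT name FROM sys.databases',
    @params = N'@second NVARCHAR(MAX)',
    @second = @second;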

Using Watchtower To Refresh SQL Server Containers

Andrew Pruski explains how to use Watchtower to keep your SQL Server Docker containers up to date:

The databases that I store in my container image are updated on a weekly basis, and currently the process to update our containers is manual. Once the updated image has been created, the existing running containers are dropped and new ones created from the updated image.

But what if we could automatically refresh our containers with the updated image? If we could do that then the only process that’s manual is updating the image. We would no longer have to worry about any containers running SQL instances with databases that are out of date.

Luckily, there’s a way to do this, and it’s accessible via an image on the Docker Hub called Watchtower. What Watchtower does is monitor the Docker Hub; if there’s an update to an image, it will automatically refresh all running containers that are on the same host.
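The moving parts are small: Watchtower itself runs as a container with access to the Docker socket, so it can restart its neighbors. A minimal sketch of starting it (the image name here, v2tec/watchtower, is the one the project published at the time, so treat this as a sketch rather than his exact steps):

docker run -d --name watchtower \
  -v /var/run/docker.sock:/var/run/docker.sock \
  v2tec/watchtower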

Read on for a step-by-step solution.

SQLOS Task Origins

Ewald Cress explains how SQLOS tasks come into being:

The above system tasks wear their purpose on their sleeves, because the function pointer in the SOS_Task::Param is, well, to the point. The tasks that run user queries are more abstract, because the I/O completion port listener can’t be bothered with understanding much beyond the rudiments of reading network packets – it certainly can’t be mucking about with fluent TDS skills, SQL parsing and compilation, or permission checks.

So what it does is enqueue a task that speaks TDS well, point it to a bunch of bytes, and send it on its way. Here is an example of such a dispatch, which shows the “input” side of the WorkDispatcher.

Read the whole thing.

Cycling The SSMS Clipboard Ring

Bert Wagner shows how you can use the clipboard ring in SQL Server Management Studio:

Copying and pasting is a feature that’s available in nearly every text editor (“nearly” — anyone remember the days before iOS had a clipboard?).

However, SQL Server Management Studio goes above and beyond the regular copy and paste feature set — it has a clipboard ring.

What’s a clipboard ring you ask?

The clipboard ring lets you cycle through the last 20 things you copied onto your clipboard when you go to paste in SSMS.

Read on to learn how to use the clipboard ring (in short: paste with Ctrl+Shift+V instead of Ctrl+V, and keep pressing it to cycle). When I remember that it exists, it's quite useful.

Top Jobs Report

Kenneth Fisher only looks at the best SQL Agent jobs:

But sometimes what you want is an overall history. Maybe you want to see your longest-running jobs? Or the most frequent jobs? Or jobs that are failing? There is a great pre-built report just for that.

Now, be warned: the report and the history view both pull from the system views, and those only hold so much history.

With that warning stated, this is a good report if you just want to get some idea of what’s happening on your server.
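The report lives in SSMS under the SQL Server Agent node. If you would rather pull a similar answer straight from msdb, a rough sketch against the documented job history tables looks like this (run_duration is stored as an HHMMSS integer, hence the decoding):

SELECT TOP (10)
    j.name,
    COUNT(*) AS runs,
    AVG(h.run_duration / 10000 * 3600    -- hours
      + h.run_duration / 100 % 100 * 60  -- minutes
      + h.run_duration % 100) AS avg_duration_in_seconds
FROM msdb.dbo.sysjobhistory AS h
    INNER JOIN msdb.dbo.sysjobs AS j
        ON j.job_id = h.job_id
WHERE h.step_id = 0  -- step 0 is the job outcome row
GROUP BY j.name
ORDER BY avg_duration_in_seconds DESC;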

Prioritizing Rows In A Union

Daniel Hutmacher shows several ways to combine data from multiple sources, prioritized by source:

For the sake of simplicity, I’ll assume that our example data has a clustered index on the “primary key” that we want to use to determine which rows have already been included in the set – in our example, the primary key is (FirstName, LastName). The data I’m using is a fictional example, but here’s the gist:

  • #Employees has about 33 000 rows.
  • #Customers has about 44 000 rows.
  • #Passengers has about 500 000 rows.

The data is constructed in a way that these queries should return 530 000 rows, so we’re looking at some overlap but far from totally overlapping rows.

Example #492,806 that T-SQL is not a true fourth-generation language: how you write the query can greatly matter for performance.
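Daniel compares several patterns and their execution plans. The simplest statement of the idea is a NOT EXISTS chain along these lines (a sketch; column lists assumed from the excerpt):

-- Employees win over customers, who win over passengers;
-- each later source only contributes keys not seen earlier.
SELECT FirstName, LastName
FROM #Employees

UNION ALL

SELECT c.FirstName, c.LastName
FROM #Customers AS c
WHERE NOT EXISTS (
    SELECT 1 FROM #Employees AS e
    WHERE e.FirstName = c.FirstName AND e.LastName = c.LastName)

UNION ALL

SELECT p.FirstName, p.LastName
FROM #Passengers AS p
WHERE NOT EXISTS (
    SELECT 1 FROM #Employees AS e
    WHERE e.FirstName = p.FirstName AND e.LastName = p.LastName)
  AND NOT EXISTS (
    SELECT 1 FROM #Customers AS c
    WHERE c.FirstName = p.FirstName AND c.LastName = p.LastName);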

Kafka Streams Basics

Anuj Saxena walks through Kafka Streams and provides a quick example:

The features provided by Kafka Streams:

  • Highly scalable, elastic, distributed, and fault-tolerant application.

  • Stateful and stateless processing.

  • Event-time processing with windowing, joins, and aggregations.

  • We can use the most common transformation operations already defined in the Kafka Streams DSL, or the lower-level Processor API, which allows us to define and connect custom processors.

  • Low barrier to entry, which means it does not take much configuration and setup to run a small scale trial of stream processing; the rest depends on your use case.

  • No separate cluster requirements for processing (integrated with Kafka).

  • Employs one-record-at-a-time processing to achieve millisecond processing latency, and supports event-time-based windowing operations even when records arrive late.

  • Supports Kafka Connect to connect to different applications and databases.

Read on for more details as well as a sample script to get started.

Random Forests In R

Anish Sing Walia explains the basics of random forests and provides sample code in R:

Random forests are similar to a famous ensemble technique called bagging, but have a different tweak. In random forests, the idea is to decorrelate the several trees which are generated on the different bootstrapped samples from the training data, and then we simply reduce the variance in the trees by averaging them. Averaging the trees helps us to reduce the variance, and also improves the performance of decision trees on the test set and eventually avoids overfitting.

The idea is to build lots of trees in such a way as to make the correlation between the trees smaller.
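That averaging claim can be made precise with a standard decomposition: if each of B trees has variance \sigma^2 and the trees are identically distributed with pairwise correlation \rho, then

\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2.

Growing more trees drives the second term toward zero, so the variance floor is set by the correlation; sampling a random subset of features at each split is exactly what pushes that correlation down.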

Random forests frequently give a good answer to classification problems, enough so that they make a nice starting point.
