Curated SQL – Page 1617 – A Fine Slice Of SQL Server

Collecting Wait Stats

Published 2016-08-26 by Kevin Feasel

Kendra Little on collecting wait stats as part of a baseline:

I do love wait stats!

If you listened to the performance tuning methodology I outlined in an earlier episode, you saw how important I think wait stats are for troubleshooting performance.

If you missed that episode, it’s called Lost in Performance Tuning. (I’ve got an outline of the discussion in the blog post, as always.)

I agree with Kendra’s advice that buying a vendor tool is the right choice here, whenever it’s possible. It’s fairly likely that you’ll spend more money creating (and maintaining) your own scaled-down version of a vendor tool than biting the bullet and paying for a packaged product.

Comments closed

Dynamic Index Generation

Published 2016-08-26 by Kevin Feasel

Brent Ozar generates 999 indexes:

The CHARACTER_MAXIMUM_LENGTH <> -1 OR IS NULL stuff is because I don’t want to try to index NVARCHAR(MAX) fields, but I do want to index NVARCHAR(50) fields.

The ORDER BY is because I wanted to get a wide variety of leading fields. If I just ordered by c1.COLUMN_NAME, I wouldn’t get all of the first fields covered in the first 999 indexes. (And I’m not even using a large table.)

Sometimes I think I’ve worked on systems which used this script to build indexes. But then I read the index names: “dta.” And it all makes sense.

Comments closed

Training Data With Azure ML

Published 2016-08-26 by Kevin Feasel

Koos van Strien discusses training data sets and cross-validating results:

When choosing a train and testset, you’ll implicitly introduce a new bias: it could be that the model you just trained predicts well for this particular testset, when trained for this particular trainset. To reduce this bias, you could “cross-validate” your results.

Cross-validation (often abbreviated as just “cv”) splits the dataset into n folds. Each fold is used once as a testset, using all other folds together as a training set. So in our pizza example with 100 records, with 5 folds we will have 5 test runs:

This isn’t Azure ML-specific, and is good reading.

Comments closed

Last Item In Each Group

Published 2016-08-25 by Kevin Feasel

Reza Rad shows how to get the last item in each group using Power Query:

Scenario that I want to solve as an example is this:

FactInternetSales has sales transaction information for each customer, by each product, each order date and some other information. We want to have a grouped table by customer, which has the number of sales transaction by each customer, total sales amount for that customer, the first and the last sales amount for that customer. First and last defined by the first and last order date for the transaction.

In T-SQL, this sounds like the job of window functions. In Power BI, we write M.

Comments closed

Using The OUTPUT Clause

Published 2016-08-25 by Kevin Feasel

Steve Jones looks at the OUTPUT clause:

I often see people struggling to use triggers for auditing, or having issues with building them to handle multi row updates. However, there’s another choice: the OUTPUT clause.

Not many people use this clause, but it’s a great way to access the virtual inserted and deleted tables in your code.

My favorite post on the topic is still Adam Machanic’s.

Comments closed

Confirming Checkpoints

Published 2016-08-25 by Kevin Feasel

Arun Sirpal shows how to log when checkpointing runs:

Via Configuration manager I enabled trace flags 3502 and 3605 – both needed to get the checkpoint information and write it to the error log.

I then shutdown the machine, on start-up I looked into the error log.

1

EXEC XP_READERRORLOG

Notice the ‘s’ in front of the spid<number>? Well that means the checkpoint was done via the automatic process; if you do a manual checkpoint it won’t see this letter.

I did not know that the “s” indicated that this was an automated process.

Comments closed

Generating Absurd Numbers Of Columns

Published 2016-08-25 by Kevin Feasel

Brent Ozar wants to generate SmallInt.Max columns:

Alright, so we’ve learned that I can’t return more than 65,535 columns, AND I can only use 4,096 elements in my SELECT. I can think of several workarounds there – 65,535 / 4096 = about 16, which means I could create a few tables or CTEs and do SELECT *’s from them, thereby returning all 65,535 columns with less than 4,096 things in my SELECT. But for now, we’ll just start with 4,096 things in my SELECT:

If you think you need 65K columns returned, I refer you to Swart’s Ten Percent Rule.

Comments closed

Auditing Within Power BI

Published 2016-08-25 by Kevin Feasel

Adam Saxton has a video on how to use Power BI Auditing:

In this video, I look at the Power BI Auditing feature that was made available a few weeks ago. I show how to turn it on and how to search. This can be helpful with understanding who is doing what within your organization.

You can read more about Power BI Auditing by checking out the official docs.

Auditing Power BI in your organization

Adding the ability to audit data access is important enough within regulated environments that this was probably a deal-killer until a few weeks ago.

Comments closed

Migrating Data To SQL Server Using Data Factory

Published 2016-08-25 by Kevin Feasel

Ginger Grant moves data from Azure Blob Storage into Azure SQL Database using Data Factory:

There are instances where data resides in Azure Blob Storage and the data is needed in a SQL database. For example, if one ran a Machine Learning experiment in Data Factory, the results would be stored in Azure Blob storage, and for analysis purposes, it may make a lot more sense to move the data to SQL database. Moving data around in Data Factory, means writing JSON. In this example we will be using an Azure SQL DB, but it is not essential that the data be stored in Azure. An on-premises SQL Server could also be used, as long as a gateway was added for the connection, the other steps would be the same. There are five different Data Factory elements required to move data from an Azure blob to a database: a pipeline for the data, a data set containing the definition for the blob, a linked service for the blob, a data set containing a definition for the SQL Data, and a linked service to connect to the SQL database.

There’s a lot of JSON ahead.

Comments closed

Calculating DTU

Published 2016-08-25 by Kevin Feasel

John Sterrett gives us a measure for calculating DTUs in Azure SQL Database:

The whole query is below. Right now, let’s just focus on the secret sauce. The secret sauce is how DTU percentage gets calculated. In a nutshell, the maximum of CPU, Data IO, Log Write Percent determine your DTU percentage. What does this mean to you? Your max consumer limits you. So, you can be using 1% of your IO but still be slowed down because CPU could be your max consumer resource.

That’s a rather interesting finding. I think the next step (which may be so context-dependent that it’s not possible to generalize) might be to figure out what various workloads do to the metrics and if there’s a way to predict with some reasonable accuracy the expected DTU load given an anticipated change in workload, rather than seeing the value spike and reacting to it later.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts

I do love wait stats!