Curated SQL – Page 1386 – A Fine Slice Of SQL Server

There have already been many posts/articles/books written about the subject of how CALCULATE and FILTER works, so I’m not going to repeat all that information here. Noteworthy resources (by “the Italians” of course):

Filter Arguments in CALCULATE

How CALCULATE works in DAX

From SQL to DAX: Filtering Data

In this blog post I’d rather discuss a performance issue I had to tackle at a client. There were quite a lot of measures of the following format:

1	CALCULATE(measureX,FILTER(tableY,columnZ = “expression”))

Click through for a couple iterations of this.

Comments closed

Truncation Versus Deletion

Published 2017-09-08 by Kevin Feasel

Richie Lee contrasts two methods of getting rid of data:

I’ve been using TRUNCATE TABLE to clear out some temporary tables in a database. It’s a very simple statement to run, but I never really knew why it was so much quicker than a delete statement. So let’s look at some facts:

The TRUNCATE TABLE statement is a DDL operation, whilst DELETE is a DML operation.
TRUNCATE Table is useful for emptying temporary tables, but leaving the structure for more data. To remove the table definition in addition to its data, use the DROP TABLE statement.

Read on for more details and a couple scripts to test out Richie’s statements.

Comments closed

I/O Latency And Performance Tuning

Published 2017-09-08 by Kevin Feasel

Andy Galbraith is starting a new toolbox series. His first post is an introduction and a look at drive latency:

You look at the numbers again, and now you find that disk latency, which had previously been fine, is now completely in the tank during the business day, showing that I/O delays are through the roof.

What happened?

This demonstrates the concept of shifting bottleneck – while CPU use was through the roof, the engine so bogged down that it couldn’t generate that much I/O, but once the CPU issue was resolved queries started moving through more quickly until the next choke point was met at the I/O limit. Odds are once you resolve the I/O situation, you would find a new bottleneck.

How do you ever defeat a bad guy that constantly moves around and frequently changes form?

Click through for some pointers on disk latency and trying to figure out when it becomes a problem.

Comments closed

Hurricane Tracking With Power BI

Published 2017-09-08 by Kevin Feasel

Chris Albrektson has updated his hurricane tracker to watch Irma:

Here’s a (SQL) to the Power BI Hurricane tracker that I did last year for Hurricane Matthew. It’s not 100% perfect but it gets the job done. It should update every couple of hours, enjoy. Stay safe my fellow Floridians.

Click here to see the live report

Check it out.

Comments closed

Rights And Roles In SQL Server

Published 2017-09-08 by Kevin Feasel

Slava Murygin walks us through rights assignment with roles:

Problem description:
1. Need to create a group/user “User1”, which has to have only CRUD (Create-Read-Update-Delete) permissions for data in schema called “Schema1”.
2. Need to create a group/user “User2”, which has to have similar permissions as “User1” and have to be able create Views/Procedures/Functions in schema called “Schema2”.
3. The group/user “User1” has to have Select/Execute permissions for all newly created objects in “Schema2”.

Solution: Create a special database role for group/user “User2”.

Read on for sample scripts, including some tests to ensure we don’t over-grant rights.

Comments closed

Messing With The SQL Agent Job System

Published 2017-09-08 by Kevin Feasel

Adrian Buckman shows what happens if you start fiddling with SQL Agent tables:

Some time ago I came across a strange issue where I found a number of duplicated SQL Agent jobs, the odd thing is SQL will not allow you to have more than one agent job with the same name – they need to be unique.

This got me scratching my head a little at first, so I started out with some basic checks of the msdb tables.

This is example #5008 of just how poor the SQL Agent database design is. Example #1 is the absurd date-time notation.

Comments closed

How The New York Times Uses Apache Kafka

Published 2017-09-07 by Kevin Feasel

Boerge Svingen gives us an architectural overview of how the New York Times uses Apache Kafka to link different services together:

These are all sources of what we call published content. This is content that has been written, edited, and that is considered ready for public consumption.

On the other side we have a wide range of services and applications that need access to this published content — there are search engines, personalization services, feed generators, as well as all the different front-end applications, like the website and the native apps. Whenever an asset is published, it should be made available to all these systems with very low latency — this is news, after all — and without data loss.

This article describes a new approach we developed to solving this problem, based on a log-based architecture powered by Apache Kafka^TM. We call it the Publishing Pipeline. The focus of the article will be on back-end systems. Specifically, we will cover how Kafka is used for storing all the articles ever published by The New York Times, and how Kafka and the Streams API is used to feed published content in real-time to the various applications and systems that make it available to our readers. The new architecture is summarized in the diagram below, and we will deep-dive into the architecture in the remainder of this article.

This is a nice write-up of a real-world use case for Kafka.

Comments closed

Nested Resampling In R

Published 2017-09-07 by Kevin Feasel

Max Kuhn describes how nested resampling works:

A common method for tuning models is grid search where a candidate set of tuning parameters is created. The full set of models for every combination of the tuning parameter grid and the resamples is created. Each time, the assessment data are used to measure performance and the average value is determined for each tuning parameter.

The potential problem is, once we pick the tuning parameter associated with the best performance, this value is usually quoted as the performance of the model. There is serious potential for optimization bias since we uses the same data to tune the model and quote performance. This can result in an optimistic estimate of performance.

Nested resampling does an additional layer of resampling that separates the tuning activities from the process used to estimate the efficacy of the model. An outer resampling scheme is used and, for every split in the outer resample, another full set of resampling splits are created on the original analysis set. For example, if 10-fold cross-validation is used on the outside and 5-fold cross-validation on the inside, a total of 500 models will be fit. The parameter tuning will be conducted 10 times and the best parameters are determined from the average of the 5 assessment sets.

Definitely worth the read. H/T R-Bloggers

Comments closed

Sentiment Analysis In Power BI

Published 2017-09-07 by Kevin Feasel

Chris Webb has a new Power BI custom data connector:

I’m pleased to announce that I’ve published my first Power BI custom data connector on GitHub here:

https://github.com/cwebbbi/PowerBITextAnalytics

Basically, it acts as a wrapper for the Microsoft Cognitive Services Text Analytics API and makes it extremely easy to do language detection, sentiment analysis and to extract key phrases from text when you are loading data into Power BI.

Read the whole thing, as Chris has a great demo of it.

Comments closed

Integrating Active Directory: Local And Azure

Published 2017-09-07 by Kevin Feasel

Shannon Lowder sets up an on-prem Active Directory domain and links it to Azure Active Directory:

You’ll need to plan out your domain before you begin. In my case, I already had my network configured to use 192.168.254.x. My Fiber router serves as my default gateway as well as my DHCP server and primary DNS server for my local network. My wireless access points, primary workstation, and printer are already set up for static IP addresses. I have already set aside a subnet of addresses for static servers. I also already own a domain name (toyboxcreations.net). Having all this set up before trying to install my domain controller help by saving time.

Shannon glosses over the local AD part, but once that’s set up, shows how to tie it in with Azure Active Directory.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts