March 2017 – Page 2 – Curated SQL

NOLOCK On CTEs

Published 2017-03-30 by Kevin Feasel

Erik Darling shows how the NOLOCK hint works with common table expressions:

So, for all you NOLOCKers out there, you can now save yourselves oodles of time by only using the hint in outer references to your CTEs and Views.
Congratulations, I suppose.
(Please stop using NOLOCK.)

Agreed, whenever possible.

Converting To Local Time In M

Published 2017-03-30 by Kevin Feasel

Chris Webb shows how to convert a datetime from UTC to your local time zone using M:

Here’s a brief explanation of what the query does:

First, it reads the times from the Excel table and sets the Time column to the datetime data type.
It then creates a new column called UTC, takes the values in the Time column, and converts them to datetimezone values, using the DateTime.AddZone() function to add a time zone offset of 0 hours, making them UTC times.
Finally, it creates a column called Local and converts the UTC times to my PC’s local time zone using the DateTimeZone.ToLocal() function.
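
The same three steps show up in most datetime libraries. As a rough Python analogue (this is not the M query from the post, and the sample timestamp is made up), a minimal sketch might look like this:

```python
from datetime import datetime, timezone

# Step 1: parse the raw text into a naive datetime (no zone attached yet).
# The sample value is hypothetical.
naive = datetime.strptime("2017-03-30 14:00:00", "%Y-%m-%d %H:%M:%S")

# Step 2: attach a zero-hour offset, i.e. declare the value to be UTC.
# This plays the role of DateTime.AddZone() with an offset of 0.
utc_time = naive.replace(tzinfo=timezone.utc)

# Step 3: convert to the machine's local zone, the counterpart of
# DateTimeZone.ToLocal(). The printed value depends on where this runs.
local_time = utc_time.astimezone()

print(utc_time.isoformat())
print(local_time.isoformat())
```

Both values describe the same instant; only the offset used to display it differs.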

This approach has some limitations: you can’t convert to an arbitrary time zone while still retaining Daylight Saving Time awareness.

Attribute Slicer

Published 2017-03-30 by Kevin Feasel

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Attribute Slicer Power BI Custom Visual. Using the Attribute Slicer you have the ability to filter your entire report while also being able to visibly see a measure value associated with each attribute.

Click through for the video as well as more details. This looks like a very interesting way of integrating a slicer with some important metric, like maybe including dollar amounts per sales region and then filtering by specific regions to show more detailed analyses.

Tuning Kafka And Spark Data Pipelines

Published 2017-03-29 by Kevin Feasel

Larry Murdock explains the tuning options available to Kafka and Spark Streams:

Kafka is not the Ferrari of messaging middleware, rather it is the salt flats rocket car. It is fast, but don’t expect to find an AUX jack for your iPhone. Everything is stripped down for speed.
Compared to other messaging middleware, the core is simpler and handles fewer features. It is a transaction log and its job is to take the message you sent asynchronously and write it to disk as soon as possible, returning an acknowledgement once it is committed via an optional callback. You can force a degree of synchronicity by chaining a get to the send call, but that is kind of cheating Kafka’s intention. It does not send it on to a receiver. It only does pub-sub. It does not handle back pressure for you.
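
To make the callback-versus-blocking distinction concrete, here is a toy sketch in Python. It does not use a Kafka client at all: `ToyLog` is a hypothetical in-memory stand-in for a broker, and the message names are invented. It only illustrates the shape of an asynchronous send that acknowledges through a callback, and of forcing synchronicity by waiting on the returned future.

```python
from concurrent.futures import ThreadPoolExecutor

# A toy stand-in for a broker: appends each message to an in-memory log
# and returns the offset it was written at. Not the Kafka client API.
class ToyLog:
    def __init__(self):
        self._entries = []
        # A single writer thread keeps messages in send order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _append(self, message):
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the committed message

    def send(self, message, callback=None):
        # Returns immediately; the write happens on the worker thread.
        future = self._pool.submit(self._append, message)
        if callback is not None:
            future.add_done_callback(lambda f: callback(f.result()))
        return future

    def close(self):
        self._pool.shutdown(wait=True)

log = ToyLog()
acked = []

# Asynchronous: fire the send and let the callback record the acknowledgement.
log.send("page-view", callback=acked.append)

# Forced synchronicity: chain a blocking wait onto the send call.
offset = log.send("checkout").result(timeout=5)

log.close()
print(offset, acked)
```

The blocking form is simpler to reason about, but every send then waits for its own acknowledgement, which gives up the throughput the asynchronous path is designed for.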

I like this as a high-level overview of the different options available. Definitely gets a More Research Is Required tag, but this post helps you figure out where to go next.

Concurrency In Scala

Published 2017-03-29 by Kevin Feasel

Matthew Rathbone shows different concurrency options available in Scala:

Scala is a functional programming language that aims to avoid side effects by encouraging you to use immutable variables (called ‘values’) and immutable data structures.
So by default in Scala when you build a list, array, string, or other object, that object is immutable and cannot be changed or updated.
This might seem unrelated, but think about a thread which has been given a list of strings to process, perhaps each string is a website that needs crawling.
In the Java model, this list might be updated by other threads at the same time (adding / removing websites), so you need to make sure you either have a thread-safe list, or you safeguard access to it with the synchronized keyword or a Mutex.
By default in Scala this list is immutable, so you can be sure that the list cannot be modified by other threads, because it cannot be modified at all.
While this does force you to program in different ways to work around the immutability, it does have the tremendous effect of simplifying thread-safety concerns. The value of this cannot be overstated: it’s a huge burden to worry about thread safety all the time, but in Scala much of that burden goes away.
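
The idea carries over to other languages. As a small sketch in Python rather than Scala (the site names and the `crawl` function are placeholders), a tuple can play the part of the immutable list handed to several worker threads:

```python
from concurrent.futures import ThreadPoolExecutor

# A tuple is Python's immutable sequence, standing in for a Scala List.
# The site names are placeholders.
sites = ("example.com", "example.org", "example.net")

def crawl(site):
    # Stand-in for real work: each worker only reads from the shared data.
    return f"crawled {site}"

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(crawl, sites))

# "Adding" a site builds a new tuple; the one the workers saw is untouched.
more_sites = sites + ("example.edu",)

try:
    sites[0] = "changed.example"  # in-place mutation is rejected
except TypeError as err:
    mutation_error = str(err)

print(results)
```

No lock is needed around `sites` because no thread can change it; any update produces a new value instead.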

Read the whole thing if you’re looking at writing Spark applications in Scala. If you’re thinking about functional programming in .NET languages, F# is there for you.

Linear Support Vector Machines

Published 2017-03-29 by Kevin Feasel

Ananda Das explains how linear Support Vector Machines work in classifying spam messages:

Linear SVM assumes that the two classes are linearly separable; that is, a hyperplane can separate the two classes so that the data points from the two classes do not get mixed up. Of course, this is not an ideal assumption, and we will discuss later how linear SVM handles the case of non-linear separability. But for a reader with some experience, here I pose a question: linear SVM creates a discriminant function, but so does LDA. Yet the two are different classifiers. Why? (Hint: LDA is based on Bayes’ theorem, while linear SVM is based on the concept of margin. In the case of LDA, one has to make an assumption about the distribution of the data per class. Newcomers can ignore the question; we will discuss this point in detail in some other post.)
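
For a feel of what a linear discriminant function and a margin look like in practice, here is a minimal Python sketch. The weights, bias, and two-feature inputs are invented for illustration and are not taken from the post; training the model is left out entirely.

```python
import math

# Hypothetical, already-trained parameters for a two-feature spam classifier.
w = [2.0, -1.0]
b = -1.0

def decision(x):
    # Signed score w.x + b; its sign says which side of the hyperplane x is on.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    return "spam" if decision(x) >= 0 else "ham"

# For a hard-margin linear SVM scaled so the closest points score +1 or -1,
# the gap between the two classes is 2 / ||w||.
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))

print(classify([2.0, 1.0]), classify([0.0, 1.0]), round(margin_width, 3))
```

Training a linear SVM amounts to choosing `w` and `b` so that this gap is as wide as possible while every training point stays on its correct side.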

This is a pretty math-heavy post, so get your coffee first. h/t R-Bloggers.

SQL Client Aliases

Published 2017-03-29 by Kevin Feasel

Andrew Pruski explains how to use a lesser-known feature in SQL Server, client aliases:

One of the problems that we ran into when moving to using containers was how to get the applications to connect. Let me explain the situation.
The applications in our production environment use DNS CNAME aliases that reference the production SQL instance’s IP address. In our old QA environment, the applications and SQL instance lived on the same virtual server, so the DNS aliases were overridden by hosts file entries that would point to 127.0.0.1.
This caused us a problem when moving to containers, as the containers were on a separate server listening on a custom TCP port. Port numbers cannot be specified in DNS aliases or hosts file entries, and we couldn’t update the application’s connection string (one of the prerequisites of the project), so we were pretty stuck until we realised that we could use SQL client aliases.
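
The reason an alias helps where DNS cannot is that it maps a server name to both a host and a port. The toy Python model below only illustrates that lookup; the alias name, host, and port are hypothetical, and a real client alias is configured on the client machine rather than in application code.

```python
# Toy model of client-side alias resolution. The alias, host, and port
# are hypothetical values chosen for illustration.
ALIASES = {
    "sql-prod": ("container-host.example", 15789),
}

def resolve(server_name, default_port=1433):
    # An alias hit supplies both host and port; a miss falls through to the
    # name as written, on SQL Server's default TCP port.
    return ALIASES.get(server_name, (server_name, default_port))

host, port = resolve("sql-prod")
print(host, port)
```

The application keeps asking for `sql-prod`, and only the client-side mapping decides where that name actually leads.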

This is definitely a place that you’d want to document changes thoroughly, as my experience is that relatively few DBAs would even think of looking there.

Database File Sizes In PowerShell

Published 2017-03-29 by Kevin Feasel

Rob Sewell has a nice post on checking database file sizes using dbatools in PowerShell:

As always, PowerShell uses the permissions of the account running the session to connect to the SQL Server unless you provide a separate credential for SQL Authentication. If you need to connect with a different Windows account, hold down Shift, right-click the PowerShell icon, and click Run as different user.
Let’s get the information for a single database. The command has dynamic parameters which populate the database names to save you time and keystrokes.

It’s a great post, save for the donut chart… Anyhow, this is recommended reading.

Azure SQL Database Premium RS

Published 2017-03-29 by Kevin Feasel

Arun Sirpal describes a new pricing tier for Azure SQL Database:

I am not sure what Microsoft classifies as IO intensive. Personally, I have not seen any sort of IOPS figures for what we could expect from each service tier, and it’s not like I can just run DiskSpd and find out. Maybe the underlying storage for Premium RS databases is more geared toward complex analytical queries. Unfortunately, I do not have the funds in my Azure account to start testing Premium vs. Premium RS (I would love to).
Also, and just as important, Premium RS databases run with fewer redundant copies than Premium or Standard databases, so if you get a service failure you may need to recover your database from a backup with up to a 5-minute lag. If you can tolerate 5 minutes of data loss and you are happy with a reduced number of redundant copies of your database, then this is a serious option for you because the price is very different.

It’s a lot less expensive (just under 1/3 the cost of Premium in Arun’s example), so it could be worth checking out.

Splitting A Small Database

Published 2017-03-29 by Kevin Feasel

Brent Ozar explains why he recommended a client break out a small database:

Listen, I can explain. Really.
We had a client with a 5GB database, and they wanted it to be highly available. The data powered their web site, and that site needed to be up and running in short order even if they lost the server – or an entire data center – or a region of servers.
The first challenge: they didn’t want to pay a lot for this ~~muffler~~ database. They didn’t have a full-time DBA, and they only had licensing for a small SQL Server Standard Edition.

Read on for the full explanation. Given the constraints and expectations, it makes sense, and this is a good example of figuring out how expected future growth can change the bottom line for a DBA.
