Month: October 2017

SSIS 2017 Scale-Out

Published 2017-10-30 by Kevin Feasel

Wolfgang Strasser has started a series on the new scale-out functionality in SQL Server Integration Services 2017. First, his introduction:

In the past, SSIS package executions were only able to run on the server that hosted the Integration Services server itself. With the rising number and requirements of more and more package executions sometimes the resources on the server ran short. Addressing this resource shortage custom scale out functionality was implemented that allowed package executions to be transfered to other “worker” machines in order to distribute execution load. With SQL Server 2017, this functionality is built into an shipped with SSIS 2017.

Before I am diving deeper into SSIS Scale Out I would like to discuss some basic vocabulary in the field of scalability.

Then, he describes the scale-out architecture:

The master is managing the available workers and all the work that is requested for execution in the scale out topoloy.

The master manages a list of (active) workers
The master gets the instructions from clients
The master knows the current state of work (queued jobs, running jobs, finished jobs, ..)

If you’re familiar with other distributed computing systems, this follows a similar path.

Comments closed

Finding Objects Relating To A Schema

Published 2017-10-30 by Kevin Feasel

Jason Brimhall has a script to help you find which objects are tied to a particular schema:

I have run into this very issue where there are far too many objects in the schema to be able to drop one by one. Add to the problem that I am looking to do this via script. Due to the need to drop the schema and the (albeit self imposed) requirement of doing it via script, I came up with the following that will cover most cases that I have encountered.

Click through for the script.

Comments closed

Persisting Computed Columns

Published 2017-10-30 by Kevin Feasel

Greg Low describes persisted computed columns:

Each time the value from that column is queried, the calculation is performed so the result can be returned. This makes sense when the value is changing regularly and the value is queried infrequently.

However, according to my completely subjective statistics, most computed columns are queried much more than they are ever changed. So why work the value out each and every time?

One really nice thing about persisted computed columns is that you can then build non-clustered indexes using these columns. It’s a great way of pre-computing work that you need to do often but which would violate rules of database normalization.

Comments closed

Finding Blocking In SQL Server

Published 2017-10-30 by Kevin Feasel

Amy Herold has a script to help you find which query is blocking your important query:

It might look complicated but it is actually very simple – query sys.sysprocesses with a cross apply using the sql_handle to get the text of the query, and then an outer apply with the same query again but you are joining to the blocking spid so you can get the text for the query that is doing the blocking. Beyond that, you can filter on various columns and refine your output

Andy Mallon goes one step further and searches for the lead blocker:

When blocking goes bad, it can go really bad. Sometimes it’s because someone (usually, that someone is me) forgets to commit a transaction before going to lunch, and those open locks cause a bunch of blocking. Sometimes a data load runs at a strange time, or an unusual amount of data gets loaded, or a query gets a bad plan and starts running long, or… you get the idea. There are a bunch of reasons this can come up.

The hardest part is that sometimes big blocking chains build up. The session I forgot to commit blocks 5 session. Each of those block 5 sessions. Each of those block 5 sessions… Eventually, I have 8000 sessions waiting on me, and I’m off eating a kale & farro salad. Oops.

The moral of the story is, don’t eat kale and farro salads; that sounds like rabbit food.

Comments closed

Line Continuation In T-SQL

Published 2017-10-30 by Kevin Feasel

Solomon Rutzky shows how line continuation works with SQL Server:

While it is not widely used (at least I have never seen anyone besides myself use it), T-SQL does actually have a line-continuation character: \ (backslash). Placing a backslash at the end of a line within a string literal (or constant as the MSDN documentation refers to it) or binary string will ignore the newline after the backslash. For example:
PRINT N'Same
Line';
displays the following in the “Messages” tab:

Same
Line

But, add in the backslash (well, a space and then a backslash so that it looks right):
PRINT N'Same \
Line';
and now the following is displayed in the “Messages” tab:

Same Line

Read on for more details.

Comments closed

Columnstore Indexes And ML Services

Published 2017-10-30 by Kevin Feasel

Niko Neugebauer picks up on some changes that SQL Server 2017 Machine Learning Services can use with respect to columnstore indexes:

I expect not just a couple of rows to be sent over for the Machine Learning Services, but huge tables with million of rows, that also contain hundreds of columns, because this kind of tables are the basis for the Data Science and Machine Learning processes.
While of course we are focusing here on rather small part of the total process (just the IO between SQL Server relational Engine and the Machine Learning Services), where the analytical process itself can take hours, but the IO can still make a good difference in some cases.
I love this improvement, which is very under-the-hood, but it will help a couple of people to make a decision of migrating to the freshly released SQL Server 2017 instead of the SQL Server 2016.

I haven’t quite taken advantage of this yet (just moved to 2017 but still in 130 compatibility mode) but fingers crossed that I’ll see those improvements.

Comments closed

Automating Azure Data Lake Storage ACLs

Published 2017-10-27 by Kevin Feasel

Shannon Lowder shows how to automate Azure Data Lake Storage access control lists:

Now that you have these, you can use a for each loop to set your permissions.
foreach ($ACL in $ACLs) {
   write-host "Grant $useremail " $ACL[1] " access to " $ACL[0];
    Set-AzureRmDataLakeStoreItemAclEntry -AccountName $adls -Path $ACL[0] -AceType User -Id $(Get-AzureRmADUser -Mail $useremail ).Id -Permissions $ACL[1]
    Set-AzureRmDataLakeStoreItemAclEntry -AccountName $adls -Path $ACL[0] -AceType User -Id $(Get-AzureRmADUser -Mail $useremail ).Id -Permissions $ACL[1] -Default
}
Now, for each permission, we’ll set the ACL and the default. Why set both? Well, when folders are created under each of the target folders, you want to cascade those permissions down from parent to child, right? Well, that’s what the Default ACL controls. If you skip the second Set-AzureRMDataLakeStoreItemAclEntry, then new folders would not inherit the permissions of the containing folder and your users would be unable to access their files properly.

Read the whole thing. Shannon also has one of the very few valid use cases for 3D pie charts.

Comments closed

Dealing With Word Tensors

Published 2017-10-27 by Kevin Feasel

Chris Moody continues his series on natural language processing:

Counting and tensor decompositions are elegant and straightforward techniques. But these methods are grossly underepresented in business contexts. In this post we factorized an example made up of word skipgrams occurring within documents to arrive at word and document vectors simultaneously. This kind of analysis is effective, simple, and yields powerful concepts.

Look to your own data, and before throwing black-box deep learning machines at them, try out tensor factorizations!

He has a set of animated GIFs to help with learning, though I do wish they were about 30% slower so you can take a moment to read each section before it jumps to the next bit.

Comments closed

Cosmos DB Cheat Sheet

Published 2017-10-27 by Kevin Feasel

Melody Zacharias shows us a cheat sheet for Cosmos DB:

The Cosmos DB by Microsoft is their globally distributed, horizontally scalable, multi-model database service that is available through Azure. Released in 2014, it is the ideal DB for globally distributed applications. Formerly called DocumentDB Cosmos it now supports querying documents using SQL as a JSON query language. As a Schema-free platform, it provides automatic indexing of JSON documents without requiring an explicit schema or creation of secondary indexes. For those of use not well versed in JSON, this query cheat-sheet, has come to our rescue. It outlines common queries to retrieve information from 2 JSON documents.

Microsoft has put together this cheat-sheet to help you write your queries faster. This quick reference is a single page PDF that you can print, or keep in a handy computer file. This is version 4, so it just keeps getting better!

Click through for the link to the cheat sheet.

Comments closed

Phantom Reads

Published 2017-10-27 by Kevin Feasel

Arun Sirpal sees not-quite-there-yet transactions:

With Halloween around the corner what better topic to discuss than phantom reads. A phantom read occurs when rows have been inserted after a read operation and becomes visible in a follow-up read operation within the same transaction.

I will show you what this looks like with an example. Please note the code will not work for you because I dropped the foreign key constraint within the person.person table just so I could show you the example.

Arun mentions that the serializable isolation level prevents phantom reads. So do the repeatable read and snapshot isolation levels.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31