
Author: Kevin Feasel

Reporting Services Versus Power BI Report Server

John White compares and contrasts SQL Server Reporting Services versus Power BI Report Server:

Power BI Report Server (PBIRS) was first introduced in May 2017. Based on SQL Server Reporting Services (SSRS), it brings the ability to work with Power BI reports completely on premises in addition to all the other capabilities of SSRS. Given this, it would be reasonable to conclude that PBIRS was the next version of, or a replacement for SSRS, but that is not the case. I have heard people state that SSRS is “going away”, but this is simply not the case. SSRS is still a core part of the Microsoft BI stack. So, what are the differences between the two platforms? The differences boil down to features, licensing, and update cadence.

If you’re in the BI/report writing space, you will want to read the whole thing.

Automatic Retry With Optimistic Concurrency

Vladimir Khorikov explains an anti-pattern when dealing with a model using optimistic concurrency (for example, memory-optimized tables):

Alright, back to the original question. So, how do you combine optimistic locking and automatic retry? In other words, when the application gets an error from the database saying that the versions of a Product don't match, how do you retry the same operation again?

The short answer is: nohow. You don’t want to do that because it defeats the very purpose of having an optimistic lock in the first place.

Remember that the locking mechanism is a way to ensure that all changes are taken into consideration when changing a record in the database. In other words, someone should review the new version of the record and make an informed decision as to whether they still want to submit the update. And that should be the same client who originated the initial request; you can't make that decision for them.

Plenty of systems do this sort of data merging automatically, but I get Vladimir’s point:  if someone else pulled the rug out from under you, it might change your decision on what that data should look like.
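
To make the idea concrete, here is a minimal T-SQL sketch of optimistic concurrency using a rowversion column (the table, column, and procedure names are hypothetical, and this is a general illustration rather than Vladimir's code): the UPDATE only succeeds if the row is unchanged since the client read it, and a conflict is reported back to the caller instead of being retried automatically.

    CREATE OR ALTER PROCEDURE dbo.UpdateProductPrice
        @ProductId      int,
        @NewPrice       money,
        @OriginalRowVer binary(8)  -- rowversion value the client saw when it read the row
    AS
    BEGIN
        SET NOCOUNT ON;

        UPDATE dbo.Product
        SET    Price = @NewPrice
        WHERE  ProductId = @ProductId
          AND  RowVer = @OriginalRowVer;  -- no match means someone else changed the row

        -- Surface the conflict to the caller rather than retrying silently.
        IF @@ROWCOUNT = 0
            THROW 50001, N'The product was changed by another user; review the new version and resubmit.', 1;
    END;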

Max And Min Partition Values

Ken Kaufman explains a major performance problem when trying to get maximum (or minimum) values from a partitioned table:

Now that I've rambled a bit, you want to know why, when using a partitioned table, grabbing the min and max of the primary key takes sooooo long, and how do you fix it.  Theoretically, you would expect SQL to perform the following steps in grabbing the Max Id:

  1. Grab the Max Id from each partition using a seek
  2. Hold the results in temp storage
  3. Get the Max Id from the temp storage, and return that result.

However, SQL doesn't do that; it actually scans each partition and finds the max id after it has examined every record in each partition.  This is very inefficient and could kill a query that depends on this value, as well as impact a busy server low on physical resources.  So what we need to do is manually write the code to perform the steps that SQL Server should actually be doing.

Read on for one workaround Ken uses to deal with this inefficiency.
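
For reference, a common shape for this kind of workaround (not necessarily Ken's exact code; the table and partition function names here are hypothetical) is to enumerate the partitions from sys.partitions and let $PARTITION restrict each aggregate to a single partition, where a seek is possible:

    SELECT MAX(mx.MaxOrderId) AS MaxOrderId
    FROM sys.partitions AS p
    CROSS APPLY
    (
        -- Per-partition MAX; the $PARTITION filter limits the work to one partition
        SELECT MAX(o.OrderId) AS MaxOrderId
        FROM dbo.Orders AS o
        WHERE $PARTITION.pfOrders(o.OrderId) = p.partition_number
    ) AS mx
    WHERE p.object_id = OBJECT_ID(N'dbo.Orders')
      AND p.index_id IN (0, 1);  -- one row per partition: heap or clustered index only

The MIN case is symmetric; swap MAX for MIN in the inner query.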

Schema Comparison With Visual Studio

Arun Sirpal shows off the Schema Compare functionality within Visual Studio:

This is a very common requirement, one which can be satisfied by various tools. Personally, I like using Visual Studio 2017 Community Edition, and I thought I would do a quick overview of it.

First thing: you can find the download from this link: https://www.visualstudio.com/downloads/. Once it is installed (making sure that you select SQL Server Data Tools), go find Visual Studio 2017 and you will be presented with your start screen.

Click through for the process.  This tool is nice for one-off jobs, like when you want to synchronize production down to source control or see the differences between two environments.  But if you're doing these comparisons a lot, I think you're better off scripting it out using SMO and PowerShell.

Generating Fake Company Names

Daniel Hutmacher has a great way of generating fake company names:

This query is actually a lot simpler than it first appears. Here’s how it breaks down:

  • Pick 100 words at random (table “a”)

  • For each word in “a”, if possible, pick a single random word (“b”) that doesn’t start or end with the same three letters as the “a” word.

  • For each word in “b”, if possible, pick a single random word (“c”) that doesn’t start or end with the same three letters as the “a” nor the “b” word.

  • The UPPER(), LEFT() and SUBSTRING() stuff is just to turn the names into title case.

  • As before, the ORDER BY NEWID() randomizes the order in which the TOP (1) row is returned.

My favorite name when running this was Disaster Votes, followed closely by Fail Users Vendor and Terminated Enterprise.  Apparently my SQL Server instance has a very negative impression of my made-up companies' leadership skills.
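
If you want to play with the core trick yourself, here is a stripped-down sketch assuming a hypothetical dbo.Words table (Daniel's actual query adds the prefix/suffix checks and a third word): TOP with ORDER BY NEWID() does the random sampling, and CROSS APPLY pairs each word with a random partner.

    SELECT  UPPER(LEFT(a.Word, 1)) + LOWER(SUBSTRING(a.Word, 2, 100)) + N' ' +
            UPPER(LEFT(b.Word, 1)) + LOWER(SUBSTRING(b.Word, 2, 100)) AS CompanyName
    FROM
    (
        SELECT TOP (100) w.Word
        FROM dbo.Words AS w
        ORDER BY NEWID()                          -- 100 words at random
    ) AS a
    CROSS APPLY
    (
        SELECT TOP (1) w.Word
        FROM dbo.Words AS w
        WHERE LEFT(w.Word, 3) <> LEFT(a.Word, 3)  -- crude stand-in for the "no shared prefix" rule
        ORDER BY NEWID()                          -- one random partner per outer word
    ) AS b;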

Polar Charts In Power BI With R

Leila Etaati shows how to build a polar chart in Power BI using an R component:

I just add a layer to the above formula, “coord_polar()”; this function has also been used for creating pie charts. It takes a “theta” argument; in the example below I set theta to the y axis, so we get the charts below.

Normally I don’t much like this type of polar chart, though I’m a big fan of radar charts, which follow a similar concept.

Keep That Data Raw

Archana Madhavan argues that you should retain your raw data:

When your pipeline already has to read every line of your data, it’s tempting to make it perform some fancy transformations. But you should steer clear of these add-ons so that you:

  • Avoid flawed calculations. If you have thousands of machines running your pipeline in real-time, sure, it’s easy to collect your data — but not so easy to tell if those machines are performing the right calculations.

  • Won’t limit yourself to the aggregates you decided on in the past. If you’re performing actions on your data as it streams by, you only get one shot. If you change your mind about what you want to calculate, you can only get those new stats going forward — your old data is already set in stone.

  • Won’t break the pipeline. If you start doing fancy stuff on the pipeline, you’re eventually going to break it. So you may have a great idea for a new calculation, but if you implement it, you’re putting the hundreds of other calculations used by your coworkers in jeopardy. When a pipeline breaks down, you may never get that data.

The problem is that even if the cost of storage is much cheaper than before, there’s a fairly long tail before you get into potential revenue generation.  I like the idea, but selling it is hard when you generate a huge amount of data.

Long-Term Storage In Kafka

Jay Kreps shows us that you can use Kafka as a primary data store:

The short answer is that it's not insane; people do this all the time, and Kafka was actually designed for this type of usage. But first, why might you want to do this? There are actually a number of use cases; here are a few:

  1. You may be building an application using event sourcing and need a store for the log of changes. Theoretically you could use any system to store this log, but Kafka directly solves a lot of the problems of an immutable log and “materialized views” computed off of that. The New York Times does this for all their article data as the heart of their CMS.

  2. You may have an in-memory cache in each instance of your application that is fed by updates from Kafka. A very simple way of building this is to make the Kafka topic log compacted, and have the app simply start fresh at offset zero whenever it restarts to populate its cache.

  3. Stream processing jobs do computation off a stream of data coming via Kafka. When the logic of the stream processing code changes, you often want to recompute your results. A very simple way to do this is just to reset the offset for the program to zero to recompute the results with the new code. This sometimes goes by the somewhat grandiose name of The Kappa Architecture.

  4. Kafka is often used to capture and distribute a stream of database updates (this is often called Change Data Capture or CDC). Applications that consume this data in steady state just need the newest changes; however, new applications need to start with a full dump or snapshot of the data. Performing a full dump of a large production database is often a very delicate and time-consuming operation, though. Enabling log compaction on the topic containing the stream of changes allows consumers of this data to simply reload by resetting to offset zero.

This is a great article, especially the part about how Kafka is not the data storage system; there are reasons you’d want data in other formats as well (like relational databases, which are great for random access queries).

tSQLt And VS Database Tests

Gavin Campbell combines tSQLt along with a Visual Studio database test project:

There are a few ways of getting the tSQLt objects deployed to where they are needed for testing. The way I use most often is basically this one, whereby we create a .dacpac of just the tSQLt objects (or use one we made earlier!) and create a second database project with a Database Reference of type “Same Database” to the project we are trying to test, plus a reference to our tSQLt .dacpac. The .dacpac file needs to be somewhere in our source folders, as it will be added by path. We also need a reference to the master database, as this is required to build the tSQLt project.

I see the two tools as serving completely different purposes:  tSQLt is a decent unit test framework, whereas a Visual Studio database project is a decent integration test framework.  It's interesting that Gavin was able to combine them here, but aside from having a common test runner, my inclination would be to keep them separated.
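
For anyone who has not seen tSQLt before, a minimal unit test looks something like the sketch below (the test class, table, and scalar function dbo.GetOrderTotal are made up for illustration): tSQLt.FakeTable isolates the table under test and tSQLt.AssertEquals performs the check.

    EXEC tSQLt.NewTestClass @ClassName = N'OrderTests';
    GO
    CREATE PROCEDURE OrderTests.[test GetOrderTotal sums line amounts]
    AS
    BEGIN
        -- Arrange: swap the real table for an empty fake and load known rows
        EXEC tSQLt.FakeTable @TableName = N'dbo.OrderLines';
        INSERT INTO dbo.OrderLines (OrderId, Amount) VALUES (1, 10.00), (1, 15.50);

        -- Act
        DECLARE @Total decimal(10,2) = dbo.GetOrderTotal(1);

        -- Assert
        EXEC tSQLt.AssertEquals @Expected = 25.50, @Actual = @Total;
    END;
    GO
    EXEC tSQLt.Run @TestName = N'OrderTests';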

Long Live The DBA

Kellyn Pot’vin-Gorman notes that the “Gone will be the DBA” trend has hit Oracle as well:

Any DBA who specializes in optimization knows that hardware offers around 15% overall opportunity for improvement.  My favorite quote from Cary Millsap, “You can't hardware your way out of a software problem,” is quite fitting, too.  A hardware upgrade can offer a quick increase in performance, only for the problem to seemingly return after a period of time.  As we've discussed in previous posts, the natural life of a database is growth: growth in data, growth in processing, growth in users.  This growth requires more resources, and if the environment is not performing as optimally and efficiently as possible, more resources will always be required.

Someday I will write my “No, the DBA isn’t going anywhere” opus, but today is not that day.  Anyhow, this is a good post for anyone worried that automation will kill the DBA.
