2017-10-30 – Curated SQL

Position Differences And Convolutional Neural Networks

If you’re trying to recognize all images with the sun shape in them, how do you make sure that the model works even if the sun can be at any position in the image? It’s an interesting problem because there are really three stages of enlightenment in how you perceive it:

If you haven’t tried to program computers, it looks simple to solve because our eyes and brain have no problem dealing with the differences in positioning.
If you have tried to solve similar problems with traditional programming, your heart will probably sink because you’ll know both how hard dealing with input differences will be, and how tough it can be to explain to your clients why it’s so tricky.
As a certified Deep Learning Guru, you’ll sagely stroke your beard and smile, safe in the knowledge that your networks will take such trivial issues in their stride.

It’s a good read.
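To make the third stage a little more concrete, here is a minimal Python sketch (my own illustration, not code from the linked post). It slides one hand-written 3x3 "sun" filter over an image and takes the global maximum, which is roughly what a convolution layer followed by global max pooling does. The SUN pattern, the 8x8 image size, and every function name are assumptions made for the example; a real network learns its filters rather than having them typed in.

```python
# Toy illustration: one 3x3 filter slid over an image (cross-correlation),
# followed by a global max. The same weights are reused at every position,
# so the peak response does not depend on where the shape sits.

# Hypothetical "sun" pattern, used both as the shape and as the filter.
SUN = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]


def blank_image(size):
    """Return a size x size image filled with zeros."""
    return [[0] * size for _ in range(size)]


def place(image, pattern, top, left):
    """Return a copy of image with pattern stamped at (top, left)."""
    out = [row[:] for row in image]
    for r, pattern_row in enumerate(pattern):
        for c, value in enumerate(pattern_row):
            out[top + r][left + c] = value
    return out


def correlate(image, kernel):
    """Valid-mode 2D cross-correlation over a square image and kernel."""
    k = len(kernel)
    n = len(image)
    result = []
    for top in range(n - k + 1):
        row = []
        for left in range(n - k + 1):
            total = sum(
                image[top + r][left + c] * kernel[r][c]
                for r in range(k)
                for c in range(k)
            )
            row.append(total)
        result.append(row)
    return result


def global_max(feature_map):
    """Largest response anywhere in the feature map."""
    return max(max(row) for row in feature_map)


def peak_position(feature_map):
    """(row, column) of the first cell holding the largest response."""
    best = global_max(feature_map)
    for r, row in enumerate(feature_map):
        for c, value in enumerate(row):
            if value == best:
                return (r, c)


# The same shape in two different corners of an 8x8 image.
top_left = place(blank_image(8), SUN, 0, 0)
bottom_right = place(blank_image(8), SUN, 5, 5)

map_a = correlate(top_left, SUN)
map_b = correlate(bottom_right, SUN)

score_a = global_max(map_a)
score_b = global_max(map_b)
```

Both images produce the same peak score; only the location of the peak inside the feature map moves with the shape, and the global maximum throws that location away.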

Avro And Streaming Data

Published 2017-10-30 by Kevin Feasel

Pat Patterson shows how to get the advantages of the Avro file format while streaming individual records:

Avro is a very efficient way of storing data in files, since the schema is written just once, at the beginning of the file, followed by any number of records (contrast this with JSON or XML, where each data element is tagged with metadata). Similarly, Avro is well suited to connection-oriented protocols, where participants can exchange schema data at the start of a session and exchange serialized records from that point on. Avro works less well in a message-oriented scenario since producers and consumers are loosely coupled and may read or write any number of records at a time. To ensure that the consumer has the correct schema, it must either be exchanged “out of band” or accompany every message. Unfortunately, sending the schema with every message imposes significant overhead — in many cases, the schema is as big as the data or even bigger!

Read on to see how the Confluent Schema Registry can solve this problem.
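As a rough feel for the overhead argument, here is a small Python sketch comparing a message that carries its whole schema with one that carries only a schema ID. Everything in it is an assumption for illustration: the Reading schema is made up, the registry is a plain dictionary rather than a real Schema Registry client, and the payload is packed with struct as a stand-in for Avro's binary encoding. The 5-byte header (a zero magic byte plus a 4-byte ID) is modelled on my understanding of the Confluent wire format.

```python
import json
import struct

# Hypothetical schema for illustration; not taken from the article.
SCHEMA = {
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "sensor_id", "type": "int"},
        {"name": "temperature", "type": "float"},
    ],
}
SCHEMA_JSON = json.dumps(SCHEMA).encode("utf-8")

# Stand-in for a schema registry: maps a small integer ID to a schema.
registry = {1: SCHEMA_JSON}


def encode_record(sensor_id, temperature):
    """Stand-in for Avro binary encoding: a fixed 8-byte layout."""
    return struct.pack(">if", sensor_id, temperature)


def message_with_schema(sensor_id, temperature):
    """Schema travels with every message: length prefix, schema, payload."""
    header = struct.pack(">I", len(SCHEMA_JSON)) + SCHEMA_JSON
    return header + encode_record(sensor_id, temperature)


def message_with_id(schema_id, sensor_id, temperature):
    """Only a 5-byte header travels: magic byte 0 plus a 4-byte schema ID."""
    header = struct.pack(">bI", 0, schema_id)
    return header + encode_record(sensor_id, temperature)


def decode_with_id(message):
    """Look the schema up by ID, then unpack the payload."""
    _magic, schema_id = struct.unpack(">bI", message[:5])
    schema = json.loads(registry[schema_id])
    sensor_id, temperature = struct.unpack(">if", message[5:])
    return schema["name"], sensor_id, temperature


big = message_with_schema(7, 21.5)
small = message_with_id(1, 7, 21.5)
decoded = decode_with_id(small)
```

Even with a two-field schema, the embedded version is many times larger than the payload it describes, while the ID version adds a fixed five bytes and leaves the consumer to fetch the schema once.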

ggplot2 Basics

Published 2017-10-30 by Kevin Feasel

Bharani Akella has an introduction to ggplot2:

Plot10: Scatter-plot

library(ggplot2)
ggplot(data = mtcars, aes(x = mpg, y = hp, col = factor(cyl))) + geom_point()

mpg (miles/gallon) is assigned to the x-axis
hp (horsepower) is assigned to the y-axis
factor(cyl) (number of cylinders) determines the color
The geometry used is a scatter plot. We can create a scatter plot by using the geom_point() function.

He has a number of similar examples showing several variations on bar, line, and scatterplot charts.

SSIS 2017 Scale-Out

Published 2017-10-30 by Kevin Feasel

Wolfgang Strasser has started a series on the new scale-out functionality in SQL Server Integration Services 2017. First, his introduction:

In the past, SSIS package executions were only able to run on the server that hosted the Integration Services server itself. With the rising number and requirements of package executions, the resources on that server sometimes ran short. To address this shortage, custom scale-out functionality was implemented that allowed package executions to be transferred to other “worker” machines in order to distribute execution load. With SQL Server 2017, this functionality is built into and shipped with SSIS 2017.

Before I dive deeper into SSIS Scale Out, I would like to discuss some basic vocabulary in the field of scalability.

Then, he describes the scale-out architecture:

The master manages the available workers and all the work that is requested for execution in the scale out topology.

The master manages a list of (active) workers
The master gets the instructions from clients
The master knows the current state of work (queued jobs, running jobs, finished jobs, ...)

If you’re familiar with other distributed computing systems, this follows a similar path.
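For anyone who has not seen that pattern before, here is a toy Python sketch of the three responsibilities listed above. It is a generic master/worker model under my own assumptions, not the SSIS Scale Out API: the Master class, its method names, and the package file names are all hypothetical, and the real product may distribute work quite differently.

```python
from collections import deque


class Master:
    """Toy scale-out master; a generic sketch, not the SSIS Scale Out API."""

    def __init__(self):
        self.workers = {}       # worker name -> job currently assigned, or None
        self.queued = deque()   # jobs waiting for a free worker
        self.running = {}       # job -> worker executing it
        self.finished = []      # jobs that have completed

    def register_worker(self, name):
        """Add a worker to the list of active workers."""
        self.workers[name] = None
        self._dispatch()

    def submit(self, job):
        """Called by a client to request an execution."""
        self.queued.append(job)
        self._dispatch()

    def complete(self, job):
        """Called when a worker reports that a job is done."""
        worker = self.running.pop(job)
        self.workers[worker] = None
        self.finished.append(job)
        self._dispatch()

    def _dispatch(self):
        """Hand queued jobs to idle workers, oldest job first."""
        idle = [w for w, job in self.workers.items() if job is None]
        while self.queued and idle:
            worker = idle.pop(0)
            job = self.queued.popleft()
            self.workers[worker] = job
            self.running[job] = worker

    def state(self):
        """Snapshot of queued, running, and finished work."""
        return {
            "queued": list(self.queued),
            "running": dict(self.running),
            "finished": list(self.finished),
        }


master = Master()
master.register_worker("worker-1")
master.register_worker("worker-2")
for package in ["load_sales.dtsx", "load_customers.dtsx", "load_orders.dtsx"]:
    master.submit(package)

before = master.state()
master.complete("load_sales.dtsx")
after = master.state()
```

With two workers and three submitted packages, the third waits in the queue until the first one finishes, at which point the master hands it to the freed worker.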

Finding Objects Relating To A Schema

Published 2017-10-30 by Kevin Feasel

Jason Brimhall has a script to help you find which objects are tied to a particular schema:

I have run into this very issue where there are far too many objects in the schema to be able to drop one by one. Add to the problem that I am looking to do this via script. Due to the need to drop the schema and the (albeit self-imposed) requirement of doing it via script, I came up with the following that will cover most cases that I have encountered.

Click through for the script.

Persisting Computed Columns

Published 2017-10-30 by Kevin Feasel

Greg Low describes persisted computed columns:

Each time the value from that column is queried, the calculation is performed so the result can be returned. This makes sense when the value is changing regularly and the value is queried infrequently.

However, according to my completely subjective statistics, most computed columns are queried much more than they are ever changed. So why work the value out each and every time?

One really nice thing about persisted computed columns is that you can then build non-clustered indexes using these columns. It’s a great way of pre-computing work that you need to do often but which would violate rules of database normalization.

Finding Blocking In SQL Server

Published 2017-10-30 by Kevin Feasel

Amy Herold has a script to help you find which query is blocking your important query:

It might look complicated but it is actually very simple: query sys.sysprocesses with a cross apply using the sql_handle to get the text of the query, and then an outer apply with the same query again, this time joining to the blocking spid so you can get the text for the query that is doing the blocking. Beyond that, you can filter on various columns and refine your output.

Andy Mallon goes one step further and searches for the lead blocker:

When blocking goes bad, it can go really bad. Sometimes it’s because someone (usually, that someone is me) forgets to commit a transaction before going to lunch, and those open locks cause a bunch of blocking. Sometimes a data load runs at a strange time, or an unusual amount of data gets loaded, or a query gets a bad plan and starts running long, or… you get the idea. There are a bunch of reasons this can come up.

The hardest part is that sometimes big blocking chains build up. The session I forgot to commit blocks 5 sessions. Each of those blocks 5 sessions. Each of those blocks 5 sessions… Eventually, I have 8000 sessions waiting on me, and I’m off eating a kale & farro salad. Oops.

The moral of the story is, don’t eat kale and farro salads; that sounds like rabbit food.
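Salad aside, the lead-blocker idea is easy to sketch in Python. This is my own illustration, not Andy's script: it assumes you have already pulled a mapping of session ID to blocking session ID (with 0 meaning "not blocked", the way SQL Server reports it), and the function name and sample session IDs are made up.

```python
def lead_blockers(blocked_by):
    """Map each lead blocker to the number of sessions waiting behind it.

    blocked_by maps a session ID to the ID of the session blocking it,
    with 0 meaning the session is not blocked.
    """

    def root(session):
        # Walk up the chain until we reach a session nobody is blocking.
        # The seen set guards against looping forever on a cycle; in that
        # case the session returned is simply where the walk stopped.
        seen = set()
        while blocked_by.get(session, 0) != 0 and session not in seen:
            seen.add(session)
            session = blocked_by[session]
        return session

    counts = {}
    for session, blocker in blocked_by.items():
        if blocker != 0:
            lead = root(session)
            counts[lead] = counts.get(lead, 0) + 1
    return counts


# Hypothetical snapshot: session 51 forgot to commit and heads a chain,
# while session 60 blocks a single, unrelated session.
sample = {
    51: 0,
    52: 51,
    53: 51,
    54: 52,
    55: 52,
    56: 53,
    60: 0,
    61: 60,
}

result = lead_blockers(sample)
```

Sessions 54, 55, and 56 are blocked only indirectly, but they still count against session 51, which is exactly why hunting for the head of the chain beats killing whichever blocker you happen to see first.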

Line Continuation In T-SQL

Published 2017-10-30 by Kevin Feasel

Solomon Rutzky shows how line continuation works with SQL Server:

While it is not widely used (at least I have never seen anyone besides myself use it), T-SQL does actually have a line-continuation character: \ (backslash). Placing a backslash at the end of a line within a string literal (or constant as the MSDN documentation refers to it) or binary string will ignore the newline after the backslash. For example:

PRINT N'Same
Line';

displays the following in the “Messages” tab:

Same
Line

But add in the backslash (well, a space and then a backslash so that it looks right):

PRINT N'Same \
Line';

and now the following is displayed in the “Messages” tab:

Same Line

Read on for more details.

Columnstore Indexes And ML Services

Published 2017-10-30 by Kevin Feasel

Niko Neugebauer picks up on some changes that SQL Server 2017 Machine Learning Services can use with respect to columnstore indexes:

I expect not just a couple of rows to be sent over to the Machine Learning Services, but huge tables with millions of rows that also contain hundreds of columns, because these kinds of tables are the basis for Data Science and Machine Learning processes.
Of course, we are focusing here on a rather small part of the total process (just the IO between the SQL Server relational engine and the Machine Learning Services), and the analytical process itself can take hours, but the IO can still make a good difference in some cases.
I love this improvement, which is very under-the-hood, but it will help a couple of people decide to migrate to the freshly released SQL Server 2017 instead of SQL Server 2016.

I haven’t quite taken advantage of this yet (just moved to 2017 but still in 130 compatibility mode) but fingers crossed that I’ll see those improvements.

