Author: Kevin Feasel

Cross-Server Partition Elimination

Published 2018-01-30 by Kevin Feasel

Derik Hammer shows an example of “old-style” partitioning across servers:

SQL Server has a feature for partitioning tables and indexes. Partitioning can be implemented at many levels, however. You can create multiple tables for one logical data set, you can split the set into multiple databases, and you can even split it among different servers. Sharding is another term. It refers to partitioning data to horizontally scale out compute resources and storage.

There are different methods of handling sharding. Each of them needs a central point of connection to handle querying the data on the other shards. This is typically called the control node. The method I am going to discuss today is one where linked servers are used to connect the various shards.

This is useful for something like offloading old invoices which you rarely need to a separate server. Derik also shows that the optimizer can, if it knows your partitioning rules, avoid unnecessary cross-server hits.
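To make the idea of partition elimination concrete, here is a minimal sketch in Python rather than T-SQL. It assumes the partitioning rule is a set of non-overlapping invoice-date ranges, one per server. The server names, the date boundaries, and the `shards_for_range` helper are all hypothetical and are not taken from Derik's post.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Shard:
    """One server holding invoices in the half-open range [start, end)."""
    server: str  # hypothetical linked-server name
    start: date
    end: date


# Hypothetical layout: old invoices on an archive server, recent ones on the primary.
SHARDS = [
    Shard("ARCHIVE01", date(2000, 1, 1), date(2017, 1, 1)),
    Shard("PRIMARY01", date(2017, 1, 1), date(9999, 12, 31)),
]


def shards_for_range(query_start: date, query_end: date) -> list[str]:
    """Return only the servers whose range overlaps [query_start, query_end)."""
    return [
        s.server
        for s in SHARDS
        if s.start < query_end and query_start < s.end
    ]


# A query for June 2017 only needs the primary server.
print(shards_for_range(date(2017, 6, 1), date(2017, 7, 1)))
```

Because the control node knows each shard's range up front, a query whose predicate falls entirely inside one range never has to touch the other server.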

Comments closed

Subtle Changes In Application Behavior Across Kafka Streams Versions

Published 2018-01-29 by Kevin Feasel

Aleksandar Pejakovic shows some subtle but important changes to an application running Kafka Streams 0.11 versus 1.0:

One would expect that by changing the version, the previous behavior would remain the same. Well, it hasn’t. What has changed?

After each process method, a punctuate method is called. After punctuateInterval is scheduled, punctuate also occurs. This means the following:

In the first test scenario, each “Arrived: message_<offset>” message in the console is accompanied with “Punctuate call”. Unsurprisingly, we have one: “Processed: 1” message in output topic. After ten messages, we have another: “Punctuate call” and “Processed: 0” pair.

In the second scenario, we have nine: “Arrived: message_<offset>” and “Punctuate call” pairs on the console, followed with 9: “Processed: 1” in the output topic. After the pause and tenth message we have: “Arrived: message_<offset>” and 3 “Punctuate call”. In the output topic, we see “Processed: 1”, “Processed: 0”, and “Processed: 0”.

Read the whole thing. This sort of behavioral change can be hard to suss out when testing a streaming application.

Fun With ML Services And VARBINARY

Published 2018-01-29 by Kevin Feasel

I wrap up my ML Services mini-series by building out a process to predict sales for multiple products using different models:

I have my model as an input and want to spit it out at the end as well. But when I try that, I get an error:

Msg 39017, Level 16, State 3, Line 239
Input data query returns column #1 of type ‘varbinary(max)’ which is not supported by the runtime for ‘R’ script. Unsupported types are binary, varbinary, timestamp, datetime2, datetimeoffset, time, text, ntext, image, hierarchyid, xml, sql_variant and user-defined type.

So there goes that plan—I can output a VARBINARY(MAX) model, but I cannot input one.

Click through to see my workaround.

Goodbye, SQLPS

Published 2018-01-29 by Kevin Feasel

Max Trinidad notes that SSMS 17.4 gets rid of the SQLPS module:

After SSMS Version 17.4 was released back in December, the SQLPS module is no longer available. So, if you try to use “Start PowerShell” from any of the database objects, you’ll get a “No SQL Server cmdlets found…” popup message.

And good riddance. Even in 2008, the SQLPS method of dealing with PowerShell was obsolete, as PowerShell modules were supposed to be snap-ins rather than independent shells. The SQL Server PowerShell module is a major improvement in that regard.

R Or Python

Published 2018-01-29 by Kevin Feasel

Tomaz Kastrun shares his thoughts on the topic of R versus Python:

Imagine I ask you, would you prefer Apple iPhone over Samsung Galaxy, respectively? Or if I would ask you, would you prefer BMW over Audi, respectively? In all the cases, both phones or both cars will get the job done. So will Python or R, R or Python. So instead of asking which one I prefer, ask yourself, which one suits my environment better? If your background is more statistics and less programming, take R; if you are more into programming and less into statistics, take Python. In both cases you will have faster time to accomplish results with your preferred language. If you ask me, can I do gradient boosting or ANOVA or MDS in Python or in R, the answer will be yes, you can do both in any of the languages.

This graf hits the crux of my opinion on the topic, but as I’ve gone deeper into the topic over the past year, I think the correct answer is probably “both” for a mature organization and “pick the one which suits you better” for beginners.

More SSMS Tips And Tricks

Published 2018-01-29 by Kevin Feasel

Wayne Sheffield has another batch of SSMS tips and tricks for us. First, he provides some helpful hints with comments. Then comes a useful addition to SSMS 2016, comparing query plans:

Notice that various options have a colored non-equals icon. Here you can quickly see the various values that are different between the two execution plans.

At the bottom of the execution plans is a Showplan Analysis window. This window has color-coded keys for various sections of the plan:

He also shows how to import and export your SSMS configuration settings. This makes it easier to migrate to a different machine or keep your desktop and laptop looking the same.

Connection Pooling And Slow Leaks

Published 2018-01-29 by Kevin Feasel

Warren Estes explains how connection pools work and troubleshoots a connection pooling issue:

When an application connects to a database it takes resources to establish that connection. So rather than doing this over and over again a connection pool is established to handle this functionality and cache connections. There are several issues that can arise if either the pool is not created with the same connection string (fragmentation), or if the connections are simply not closed/disposed of properly.

In the case of fragmentation, each connection string associated with a connection is considered part of 1 connection pool. If you create 2 connection strings with different database names, maxpool values, timeouts, or security then you will in effect create different connection pools. This is much like how query plans get stored in the plan cache. Different white space, capital letters all create different plans.

You can get the .NET pool counts from:
Performance Monitor > .NET data provider for SQL Server > NumberOfActiveConnectionPools
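The fragmentation and leak behavior described above can be sketched with a toy model in Python. This is an illustration under simple assumptions, not how the .NET provider is implemented: pools are keyed by the exact connection string, and the `PoolManager` class and the connection strings are hypothetical.

```python
from collections import defaultdict


class PoolManager:
    """Toy model: one pool per distinct connection string (exact text match)."""

    def __init__(self):
        self._idle = defaultdict(list)  # connection string -> idle connections
        self.opened = 0                 # number of expensive "real" opens

    def acquire(self, conn_str):
        idle = self._idle[conn_str]     # creates the pool on first use
        if idle:
            return idle.pop()           # reuse a cached connection
        self.opened += 1                # nothing cached, so pay for a new open
        return {"conn_str": conn_str, "id": self.opened}

    def release(self, conn):
        # Closing/disposing returns the connection to its own pool.
        self._idle[conn["conn_str"]].append(conn)

    @property
    def pool_count(self):
        return len(self._idle)


mgr = PoolManager()
sales_100 = "Server=db1;Database=Sales;Max Pool Size=100"
sales_50 = "Server=db1;Database=Sales;Max Pool Size=50"

a = mgr.acquire(sales_100)
mgr.release(a)
b = mgr.acquire(sales_100)  # same string: reuses the cached connection
c = mgr.acquire(sales_50)   # different string: a second pool (fragmentation)
d = mgr.acquire(sales_100)  # b was never released, so this is a new open (leak)

print(mgr.pool_count, mgr.opened)
```

In this model, two strings that differ only in their max pool value end up in separate pools, and a connection that is never released forces a fresh open on the next request.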

Click through for more information.

Finding The Last Known Good CHECKDB Run

Published 2018-01-29 by Kevin Feasel

Amy Herold shows how to find the last known CHECKDB run for each database on a SQL Server instance:

Wednesday I walk into the office and immediately hear that CHECKDB is the source of issues on one of the servers and is the reason behind some errors that have been happening. While I don’t think this is the case (it might look like it on the surface but there is something else that is happening that is the actual cause) I also wanted to find out what CHECKDB was running at the time the errors occurred.

I needed information on when CHECKDB ran for each database. When you look for what you can run to find when CHECKDB was last run you find this blog post and also this blog post on grabbing this info. While these were very informative, they were for one database at a time. I need this for all the databases so I can try to not only find out when each one ran, but also use these time stamps to figure out the duration.

The big recommendation I’d make with regard to this is not to use sp_msforeachdb. Otherwise, click through for a good script.

SQL Server Backups On Azure VM

Published 2018-01-29 by Kevin Feasel

Rolf Tesmer shows us various options available for backing up SQL Server on Azure VMs:

Recently I had a requirement to collate and briefly compare some of the various methods to perform SQL Server backup for databases deployed onto Azure IaaS machines. The purpose was to provide a few options to cater for the different types (OLTP, DW, etc.) and sizes (small to big) of databases that could be deployed there.

Up front, I am NOT saying that these are the ONLY options to perform standard SQL backups! I am sure there are others – however – the below are both supported and well documented – which when it comes to something as critical as backups is pretty important.

So the purpose of this blog is to provide a quick and brief list of various SQL backup methods!

Read on for the options.

A Definition Of Functional Programming

Published 2018-01-26 by Kevin Feasel

Kevin Sookocheff contrasts functional programming with its imperative cousin:

Functional programming is a form of declarative programming that expresses a computation directly as pure functional transformation of data. A functional program can be viewed as a declarative program where computations are specified as pure functions.

I think that if you’re a set-based SQL developer, functional programming languages will make the most intuitive sense. They’re a bit harder to wrap your mind around if you’ve grown up as an imperative C-style developer, but are still worth the effort.
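As a small illustration of the contrast, here is a sketch in Python that computes the same order total two ways. The `orders` data and the function names are made up for this example and do not come from Kevin Sookocheff's article.

```python
from functools import reduce

# Hypothetical order lines: (product, quantity, unit price).
orders = [("widget", 3, 2.50), ("gadget", 1, 10.00), ("widget", 2, 2.50)]


def total_imperative(rows):
    """Imperative style: step through the rows and mutate an accumulator."""
    total = 0.0
    for _, qty, price in rows:
        total += qty * price
    return total


def line_total(row):
    """Pure function: the result depends only on the input row."""
    _, qty, price = row
    return qty * price


def total_functional(rows):
    """Functional style: map each row to a value, then fold the values."""
    return reduce(lambda acc, x: acc + x, map(line_total, rows), 0.0)


print(total_imperative(orders), total_functional(orders))
```

The functional version reads much like a set-based query: transform every row, then aggregate, with no variable being updated along the way.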
