Press "Enter" to skip to content

Author: Kevin Feasel

Everyone’s Data Is Dirty

Chirag Shivalker hits the highlights on dirty data:

It might sound a bit abrupt, but clean data is a myth. If your data is dirty, so is everyone else’s. Enterprises are more than dependent on data these days, and it is going to stay the same in coming years. They need to collect data in order to analyze it, which necessarily will not be 100% clean, pristine, or perfect in nature.

Nearly all companies face the challenge of dirty data in the form of a lot of duplicates, incorrect fields, and missing values. This happens due to omnichannel data influx, followed by hundreds, if not thousands, of employees wrestling and torturing that data to derive professional outcomes and insights. Don’t forget that even the best of the data has that tendency to decay in few weeks.

The saying goes that any analytics project is about 80% data cleansing and feature extraction.  I’d say that number’s probably closer to 90-95%, and dirty data is a big part of that.


Query Store Plan Forcing: You Can’t Always Get What You Want

Kendra Little shows an example where trying to force a Query Store plan results in an oddity:

This is not considered a “failure”

When I check the Query Store DMVs, force_failure_count is 0. The last_force_failure_reason_desc is NONE.

Query Store didn’t fail to apply the narrow plan. Instead, it’s just deciding not to give it to me, now that I’ve forced that plan.

Seems kinda like an adolescent, doesn’t it?

The answer remains a bit of a mystery, but read on to see how Kendra troubleshoots this.
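For reference, a minimal sketch of forcing a plan and then checking the columns Kendra looks at; the query_id and plan_id values are placeholders for whatever your own workload shows in sys.query_store_plan.

-- Force a plan (42 and 7 are placeholder IDs), then check whether Query Store
-- reports the forcing as failed for that plan.
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;

SELECT qsp.query_id,
       qsp.plan_id,
       qsp.is_forced_plan,
       qsp.force_failure_count,
       qsp.last_force_failure_reason_desc
FROM sys.query_store_plan AS qsp
WHERE qsp.query_id = 42;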


What To Do With A Database In Source Control

Ed Elliott with Database Source Control 102:

This post is for a specific type of person if you are:

  • New to source control
  • Are getting started on your path to the continuous delivery nirvana
  • Have been able to get your database into some sort of source control system
  • Have more than one person checking in code to the database source
  • You are unsure what to do next

Then this post is for you!

This is a nice post with some next-steps for when you have a database in source control but aren’t quite sure what to do next.


Go Download SQL Operations Studio

The SQL Server Team has announced SQL Operations Studio:

We are excited to announce that SQL Operations Studio is now available in preview. SQL Operations Studio is a free, light-weight tool for modern database development and operations for SQL Server on Windows, Linux and Docker, Azure SQL Database and Azure SQL Data Warehouse on Windows, Mac or Linux machines.

Download SQL Operations Studio to get started.

It’s not SSMS, but it is cross-platform.  And I think that over time, it will end up being better than SSMS.


Defining Result Sets With ML Services

Dave Mason covers a pain point in SQL Server Machine Learning Services:

The example above is so simple, defining the RESULT SETS poses no problems. But what if the format of the output isn’t known at design time? R (or Python) might take the input data set and add, remove, or change columns conditionally. Further, the input data set might not even be known at design time. How would you define the RESULT SETS at run time?

WITH RESULT SETS needs a MAKE_A_GUESS or FIGURE_IT_OUT option. If there’s some other type of “easy button” for this, I haven’t found it.

It would be nice if the service could read the data frame columns and use those by default.
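To make the pain point concrete, here is a rough sketch of the pattern in question; the R script and the column definition are made up, but it shows how WITH RESULT SETS has to be spelled out at design time even though the script could return a different shape at run time.

-- The output shape must be declared up front, before the external script runs.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(val = c(1, 2, 3))',
    @input_data_1 = N'SELECT 1 AS placeholder'
WITH RESULT SETS ((val INT));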


Using The GROUPING SETS Operator

Alfonso Hernandez goes into detail with what you can do with GROUPING SETS:

In T-SQL, you summarize data by using the GROUP BY clause within an aggregate query. This clause creates groupings which are defined by a set of expressions. One row per unique combination of the expressions in the GROUP BY clause is returned, and aggregate functions such as COUNT or SUM may be used on any columns in the query. However, if you want to group the data by multiple combinations of group by expressions, you may take one of two approaches. The first approach is to create one grouped query per combination of expressions and merge the results using the UNION ALL operator. The other approach is to use the GROUPING SETS operator along with the GROUP BY clause and define each grouping set within a single query.

In this article I’ll demonstrate how to achieve the same results using each method.

Mastering GROUPING SETS makes reporting queries in T-SQL so much more effective.
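As a quick sketch of the two approaches the article compares, using a hypothetical Sales table with Region, Product, and Amount columns:

-- Approach 1: one grouped query per combination, merged with UNION ALL.
SELECT Region, NULL AS Product, SUM(Amount) AS Total
FROM dbo.Sales
GROUP BY Region
UNION ALL
SELECT NULL, Product, SUM(Amount)
FROM dbo.Sales
GROUP BY Product;

-- Approach 2: the same result in a single query with GROUPING SETS.
SELECT Region, Product, SUM(Amount) AS Total
FROM dbo.Sales
GROUP BY GROUPING SETS ((Region), (Product));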


Thoughts On Dynamic Data Masking

Kellyn Pot’vin-Gorman talks about Dynamic Data Masking:

Anyone with the unmask privilege or DB_OWNER will be able to view the data.  As many development and testing environments grant higher privileges to the users and in SQL Server, it’s not rare for a developer to be the DB_OWNER, (I used to come across this all the time when recoveries were performed by the wrong OS user) this leaves this data still quite vulnerable.  I do like that if you were to take a backup and recover it with masking, the obfuscated data is what is recovered physically.  I’m more concerned about those odd environments where compliance hasn’t been put in place on owners of the database that would still view the originally masked data, but unmasked.

Performance isn’t impacted, (i.e. no referential integrity concerns or execution plans) as the optimizer  performs all steps against the real data, which leads me to wonder what happens with some of the newer monitoring tools that state they can display SQL and bind variable data without accessing the database directly.  Would they “sniff” the masked data or unmasked?  Would it matter who the OS User or roles in the database?

The important thing here is that DDM isn’t really a security product.  It’s a something-or-another product that might be useful to stop shoulder surfing but pretty much nothing else.
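For context, applying a mask and handing out the UNMASK permission looks roughly like this; the table, column, and user names are made up for illustration.

-- Mask an email column on a hypothetical table.
ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

-- Anyone granted UNMASK (and members of db_owner) still sees the real values,
-- which is the gap Kellyn calls out above.
GRANT UNMASK TO ReportingUser;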


Interpreting P-Value Histograms

David Robinson visualizes and interprets different p-value histograms:

So you’re a scientist or data analyst, and you have a little experience interpreting p-values from statistical tests. But then you come across a case where you have hundreds, thousands, or even millions of p-values. Perhaps you ran a statistical test on each gene in an organism, or on demographics within each of hundreds of counties. You might have heard about the dangers of multiple hypothesis testing before. What’s the first thing you do?

Make a histogram of your p-values. Do this before you perform multiple hypothesis test correction, false discovery rate control, or any other means of interpreting your many p-values. Unfortunately, for some reason, this basic and simple task rarely gets recommended (for instance, the Wikipedia page on the multiple comparisons problem never once mentions this approach). This graph lets you get an immediate sense of how your test behaved across all your hypotheses, and immediately diagnose some potential problems. Here, I’ll walk you through a basic example of interpreting a p-value histogram.

It’s a fun read and informative as well.


Microservices With Kafka Streams

Ben Stopford walks us through a microservices architecture built on top of Kafka:

So we can use the Kafka Streams API to piece together complex business systems as a collection of asynchronously executing, event-driven services. The differentiator here is the API itself, which is far richer than, say, the Kafka Producer or Consumer. It makes code more readable, provides reusable implementations of common patterns like joins, aggregates, and filters and wraps the whole ecosystem with a transparent level of correctness.

Systems built in this way, in the real world, come in a variety of guises. They can be fine grained and fast executing, completing in the context of an HTTP request, or complex and long-running, manipulating the stream of events that map a whole company’s business flow. This post focusses on the former, building up a real-world example of a simple order management system that executes within the context of a HTTP request, and is entirely built with Kafka Streams. Each service is a small function, with well-defined inputs and outputs. As we build this ecosystem up, we will encounter problems such as blending streams and tables, reading our own writes, and managing consistency in a distributed and autonomous environment.

This post stays high-level and covers a lot of ground.  I’m wishy-washy on the idea of microservices, but if you are going to do them, it’s better to do them right.


A Compendium Of R Errors

Sumendar Karupakala has a bunch of errors you might find in R, as well as their explanations and fixes:

# pull out the animals which are dogs
animaldata[animaldata$Animal.Type == "Dog"]  # throws an error: a single index selects columns, not rows

Error in `[.data.frame`(animaldata, animaldata$Animal.Type == "Dog"): undefined columns selected
Traceback:
1. animaldata[animaldata$Animal.Type == "Dog"]
2. `[.data.frame`(animaldata, animaldata$Animal.Type == "Dog")
3. stop("undefined columns selected")

# fixed: the comma inside the brackets makes the logical vector select rows
animaldata[animaldata$Animal.Type == "Dog", ]

Some of these are basic syntax errors; others are a bit nastier.
