Press "Enter" to skip to content

Author: Kevin Feasel

TempDB Encryption With TDE

Arun Sirpal points out an oddity in sys.databases:

If you query sys.databases, such as:

SELECT is_encrypted,name,user_access_desc FROM sys.databases WHERE database_id = 2 OR database_id = 7

It “might” throw you off. Would you not expect to see is_encrypted set to 1 for TempDB?

I thought I remembered earlier editions of SQL Server showing is_encrypted = 1 for tempdb, and I definitely remember 2016 showing 0 even when the database is encrypted.

Comments closed

Trivial Plans And Columnstore Indexes

Erik Darling warns us that trivial plans against clustered columnstore indexes could lead to row execution rather than batch execution:

Let’s look at one query with a few variations.

SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM   dbo.t1 AS t
WHERE  t.Id > 0
       AND t.Id < 3;

The plan for it is alright. It’s fairly straightforward and the query finishes in about 170ms.

We can see from the graphical execution plan that it’s been Simple Parameterized. SQL Server does this to make plan caching more efficient.

Check out the entire post.

Comments closed

Improving Solr Performance

Michael Sun has some tips to improve performance of Solr operations, focusing on memory tuning but including a few other tips as well:

For time series applications, it’s very common to have queries in the following pattern

q=*:*&fq=[NOW-3DAYS TO NOW]

However, this is not a good practice from memory perspective. Under the hood, Solr converts ‘NOW’ to a specific timestamp, which is the time when the query hits Solr. Therefore, two consecutive queries with the same field query fq=[NOW-3DAYS TO NOW] are considered different queries once ‘NOW’ is replaced by the two different timestamp. As a result, both of these queries would hit disk and can’t take advantage of caches.

In most of use cases, missing data of last minute is acceptable. Therefore, try to query in the following way if your business logic allows.

q=*:*&fq=[NOW/MIN-3DAYS TO NOW/MIN]

If you’re using Solr for full text search, this is rather useful information.

Comments closed

JSON In Powershell

Adam Bertram shows an easy way of dealing with JSON data inside Powershell:

As the world continually becomes “eaten by software,” more and more services are being replaced by software. IT pros have most likely seen this in the form of software-defined everything. One of the premier components of this focus on software and with the continuing adoption of DevOps is application programming interfaces (APIs). All of these services needs to talk together and must provide a way for programs and users to interact with them. This is where APIs come in handy. But, what does this have to do with PowerShell and JSON, you ask?

APIs, more specifically REST APIs, return data when queried. This data is typically in the JSON format. JSON is a way of structuring data that makes it easy for software to consume. When working with PowerShell, Microsoft has provided some helpful tools to work with JSON called the ConvertTo-Json and ConvertFrom-Json commands. These commands allow you to quickly work with REST APIs or any other service that returns or accepts JSON as an input.

Read on for more details on how to use these commands.

Comments closed

Fraud Detection With Python

Kevin Jacobs has a walkthrough of how to use Pandas and scikit-learn to perform fraud detection against a sample set of credit card transactions:

Apparently, the data consists of 28 variables (V1, …, V28), an “Amount” field a “Class” field and the “Time” field. We do not know the exact meanings of the variables (due to privacy concerns). The Class field takes values 0 (when the transaction is not fraudulent) and value 1 (when a transaction is fraudulent). The data is unbalanced: the number of non-fraudulent transactions (where Class equals 0) is way more than the number of fraudulent transactions (where Class equals 1). Furthermore, there is a Time field. Further inspection shows that these are integers, starting from 0.

There is a small trick for getting more information than only the raw records. We can use the following code:

print(df.describe())

This code will give a statistically summary of all the columns. It shows for example that the Amount field ranges between 0.00 and 25691.16. Thus, there are no negative transactions in the data.

The Kaggle competition data set is available, so you can follow along.

Comments closed

Forcing 0 Intercept Inflates R-squared In R

John Mount has an informative post on how you can trick yourself when running linear regression models in R and forcing the y intercept to be 0:

So far so good. Let’s now remove the “intercept term” by adding the “0+” from the fitting command.

m2 <- lm(y~0+x, data=d)
t(broom::glance(m2))
##                        [,1]
## r.squared      7.524811e-01
## adj.r.squared  7.474297e-01
## sigma          3.028515e-01
## statistic      1.489647e+02
## p.value        1.935559e-30
## df             2.000000e+00
## logLik        -2.143244e+01
## AIC            4.886488e+01
## BIC            5.668039e+01
## deviance       8.988464e+00
## df.residual    9.800000e+01
d$pred2 <- predict(m2, newdata = d)

Uh oh. That appeared to vastly improve the reported R-squared and the significance (“p.value“)!

Read on to learn why this happens and how you can prevent this from tricking you in the future.

Comments closed

Scripting Azure Resources With ARM Templates

Melissa Coates has a detailed explanation of how to script out the creation and configuration of services in Azure using Azure Resource Manager (ARM) templates:

In many cases, you can easily provision resources in the web-based Azure portal. If you’re never going to repeat the deployment process, then by all means use the interface in the Azure portal. It doesn’t always make sense to invest the time in automated deployments. However, ARM templates are really helpful if you’re interested in achieving repeatability, improving accuracy, achieving consistency between environments, and reducing manual effort.

Use ARM templates if you intend to:

  • Include the configuration of Azure resources in source control (“Infrastructure as Code”), and/or

  • Repeat the deployment process numerous times, and/or

  • Automate deployments, and/or

  • Employ continuous integration techniques, and/or

  • Utilize DevOps principles and practices, and/or

  • Repeatedly utilize testing infrastructure then de-provision it when finished

Melissa walks through an example of deploying a website with backing database, along with various configuration changes.

Comments closed

Linked Servers And The Kerberos Double-Hop Problem

Jana Sattainathan shows how to set up Kerberos pass-through when dealing with linked servers:

Let us say you have SQLServer1 and you want to setup a linked server to SQLServer2 using “pass-through authentication”, a double-hop happens as explain in the article below. Basically, the first hop is when the user authenticates to SQLServer1 and the second hop when that gets passed on from SQLServer1 to SQLServer2.

The below article is a must-read before you proceed:

The three nodes involved in the double-hop as illustrated in the example are

  1. Client – The client PC from which the user is initiating connection to SQLServer1

  2. Middle server – SQLServer1

  3. Second server – SQLServer2

Dealing with the double-hop problem is far trickier than it should be; if you’ve had to deal with this, I recommend Jana’s guide.

Comments closed

Good Indexes Make Good Queries

Thomas Rushton has an example of a positive indexing experience:

100k runs of the query in a ten minute interval? yeowch. Yeah, this should be optimised if possible. The primary wait type was CPU – indicating that the data was all in RAM, but the CPU was having to schlep through the entire table to find what it needed. Or to find that it didn’t need anything. Or something.

The interesting story buried in here is that none of the tooling Thomas used indicated that things could be improved with a better index, and yet there was a tremendous opportunity here based on the graphics at the end.

Comments closed

Leave ANSI PADDING On

Kenneth Fisher explains what the ANSI PADDING setting does:

ON is the default and is what you would expect. Trailing spaces are saved in VARCHAR and in CHAR additional spaces added to fill the entire space. When ANSI_PADDING is off then additional spaces are not saved .. unless the column is CHAR AND NOT NULL.

So there’s the first reason to not turn ANSI_PADDING off. Most people expect the ON results and the OFF results can be .. let’s just say confusing.

Click through for more details, including how painful it is to change the setting on a column after the fact.

Comments closed