Author: Kevin Feasel

Data Protection Principles

Published 2018-06-27 by Kevin Feasel

K. Brian Kelley gives us an overview of what database security entails:

We have to be sure we know what accesses our data. There isn’t a technical solution that can automatically give us the answer. We can’t run a PowerShell script and know immediately everything that hits our key financial database. Over time we can collect that information, but the key word is “time.” If I look today, and today is not quarter end, then I don’t see the quarter end processes. If we’re looking at our HR related databases, then we really don’t know everything unless we also take into account the annual enrollment period.

The only way to be able to follow the principle of least privilege correctly is to know who and what access our data. This also includes ad hoc access, like folks running reports through SQL Server Reporting Services (SSRS) or doing analysis through Microsoft Excel. Therefore, in order to improve our data protection, we have to understand what accesses that data.

Obviously, documentation is required. When we have documentation there’s always the problem with keeping that documentation updated. While there are tools available, this task ultimately falls to people. Realistically, this is a battle we will always have to fight. Taking time to update documentation means we take time from other efforts. However, if we want to be serious about data protection, we have to know what accesses that data in order to be able to protect it.

It’s interesting to contrast this with Alex Yates’s essay on the topic.


Using DAX To Calculate Dates Between Transactions

Published 2018-06-27 by Kevin Feasel

Matt Allington needs a haircut:

I got my haircut today (pretty spiffy one too, even if I do say so myself). While I was chatting I asked my hair dresser “on average, how often should I get my hair cut”? She told me (for men) around 4-6 weeks. Then I got thinking (as I do), I wonder if I could data-mine my credit card data using Power BI and find out how often I actually get my own hair cut. It turns out I was able to do this, and this article explains the hardest part of that task – finding the number of days between two transaction dates using DAX.

I’d probably end up doing this in SQL with the LAG function, but it’s good to know several ways to solve date difference problems.
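As a rough sketch of the LAG approach mentioned above (not Matt's DAX solution), the query below runs against an in-memory SQLite database from Python. The table name, column names, and sample dates are all made up for illustration, and window functions need SQLite 3.25 or later. In SQL Server the date arithmetic would typically use DATEDIFF rather than julianday.

```python
import sqlite3

# Hypothetical sample data: card transactions at a made-up merchant.
transactions = [
    ("2018-01-05", "Barber"),
    ("2018-02-16", "Barber"),
    ("2018-03-23", "Barber"),
    ("2018-05-04", "Barber"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card_transactions (txn_date TEXT, merchant TEXT)")
conn.executemany("INSERT INTO card_transactions VALUES (?, ?)", transactions)

# LAG() returns the previous row's value within the window ordering,
# so each transaction can be compared with the one before it.
query = """
SELECT
    txn_date,
    CAST(julianday(txn_date)
         - julianday(LAG(txn_date) OVER (PARTITION BY merchant ORDER BY txn_date))
         AS INTEGER) AS days_since_previous
FROM card_transactions
ORDER BY txn_date
"""
rows = conn.execute(query).fetchall()

# The first transaction has no predecessor, so its gap is NULL (None in Python).
gaps = [days for _, days in rows if days is not None]
average_gap = sum(gaps) / len(gaps)

for txn_date, days in rows:
    print(txn_date, days)
print("Average days between transactions:", round(average_gap, 1))
```

The first row comes back with no gap because LAG has nothing to look back to, so it is filtered out before averaging.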


Missing KB2919355 When Installing SQL Server

Published 2018-06-27 by Kevin Feasel

Ryan Allport explains how to install SQL Server 2016 on Windows Server 2012 R2 when you get the Rule “KB2919355 Installation” failed error message:

As you can see, the upgrade feature rules check failed around the KB2919355 installation. At this point, reading the error message, I assumed (I know, I know, it’s something we should never do as a DBA!) that the patch had been downloaded and applied during the latest round of Windows patching, and all that was required was a server reboot. I was wrong.

Upon running the upgrade again, I got the same error message. Hmm, annoying. So, after some Googling I was confident I knew what to do to resolve this; download and install the KB2919355 patch. So, I downloaded the patch from the official Microsoft website (KB2919355) and kicked off the installation.

There’s a bit more to it than “install the patch.”


Optimizing Replication Agent Parameters

Published 2018-06-27 by Kevin Feasel

Sourabh Agarwal shows us a couple of levers we have for improving replication performance:

When SQL Server replication is used on environments with high traffic OLTP systems, users often need to adjust the agent profile parameters to increase the throughput of the log reader and distribution agents to keep up with the workload. We recently performed a series of tests to measure the performance of log reader and distribution agents while changing some of the parameters for these agents. This blog summarizes the outcomes and conclusions from this testing.

Read on for the relevant parameters.


Converting Factors To Numbers In R

Published 2018-06-26 by Kevin Feasel

Sebastian Sauer shows us a pitfall of brute-force conversion of factors to integers:

Oh no! That’s not what we wanted! R has messed the thing up (?). The reason is that R sees the first factor level internally as the number 1. The second level as number two. What’s the first factor level in our case? Let’s see:
factor(tips$sex) %>% head()
#> [1] Female Male   Male   Male   Female Male  
#> Levels: Female Male
factor(tips$sex_r) %>% head()
#> [1] 1 0 0 0 1 0
#> Levels: 0 1
That’s confusing: “0” is the first level of sex_r – internally for R represented by “1”. The second level of sex_r is “1” – internally represented by “2”.

Fortunately, we get the easy answer at the end of the post.
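To make the pitfall concrete, here is a small Python sketch that mimics how a factor stores labels and 1-based integer codes. It is an illustration of the mechanism, not Sebastian's code: `make_factor` is a hypothetical helper, and it sorts labels as strings, which is a simplification of how R orders levels. The "safer" path mirrors the commonly used R idiom of converting through the labels, as in `as.numeric(as.character(f))`.

```python
def make_factor(values):
    """Mimic an R factor: sorted unique labels plus 1-based integer codes."""
    levels = sorted(set(str(v) for v in values))
    codes = [levels.index(str(v)) + 1 for v in values]
    return levels, codes


# Same first six values as sex_r in the excerpt above.
sex_r = [1, 0, 0, 0, 1, 0]
levels, codes = make_factor(sex_r)

# Brute-force conversion: reading the internal codes directly,
# which is what as.integer() on a factor does in R.
wrong = list(codes)

# Safer conversion: look up each label first, then parse the label as a number.
right = [int(levels[code - 1]) for code in codes]

print("levels:", levels)  # ['0', '1']
print("wrong: ", wrong)   # [2, 1, 1, 1, 2, 1]
print("right: ", right)   # [1, 0, 0, 0, 1, 0]
```

The internal codes are positions in the level list, so they only match the original numbers by coincidence.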


Neural Topic Models On Amazon SageMaker

Published 2018-06-26 by Kevin Feasel

David Ping, et al., show off topic modeling on Amazon SageMaker:

Topic Modeling is used to organize a corpus of documents into “topics” which is a grouping based on a statistical distribution of words within the documents themselves. Amazon Comprehend, our fully managed text analytics service, provides a pre-configured topic modeling API that is best suited for the most popular use cases like organizing customer feedback, support incidents or workgroup documents. Amazon Comprehend is the suggested topic modeling choice for customers as it removes a lot of the most routine steps associated with topic modeling like tokenization, training a model and adjusting parameters. Amazon SageMaker’s Neural Topic Model (NTM) caters to the use cases where a finer control of the training, optimization, and/or hosting of a topic model is required, such as training models on text corpus of particular writing style or domain, or hosting topic models as part of a web application. While Amazon SageMaker NTM provides a starting point of state-of-the-art topic modeling, customers have the flexibility to modify the network architecture as well as hyperparameters to accommodate the idiosyncrasies of their data sets as well as to tune the trade-off between a multitude of metrics such as document modeling accuracy, human interpretability and granularity of the learned topics, based on their applications. In addition, Amazon SageMaker NTM leverages the full power of the Amazon SageMaker platform: easily configurable training and hosting infrastructure, automatic hyperparameter optimization, and fully-managed hosting with auto-scaling.

They walk through the entire topic modeling process, so check it out.


AVG And Data Types

Published 2018-06-26 by Kevin Feasel

Kendra Little explains how the AVG() function works with a couple of different data types:

This week’s Quizletter featured a quiz on using the AVG() function in SQL Server.

I was inspired to write this quiz because I’ve been teaching TSQL School each week. When we covered aggregate functions, I remembered just how tricksy and non-average that AVG() can be.

Want to take the quiz before you read the answers?

Head over here and take the quiz first.

Or read this post and then take the quiz, if you prefer. I’m not gonna call it cheating, because it’s learning either way.

Kendra explains each of the answers, so I’d recommend taking the quiz first.
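Without giving away the quiz, one well-known wrinkle is that SQL Server's AVG() returns an integer when its input is an integer type, so the fractional part is lost. The Python sketch below only emulates that behavior; it is not taken from Kendra's post. Both function names are hypothetical, and the integer version assumes truncation toward zero, as T-SQL integer division does.

```python
from decimal import Decimal


def avg_int(values):
    """Emulate an integer-typed average (assumed to truncate toward zero)."""
    total = sum(values)
    quotient = abs(total) // len(values)
    return quotient if total >= 0 else -quotient


def avg_decimal(values):
    """Emulate averaging after casting the inputs to a decimal type."""
    return sum(Decimal(v) for v in values) / len(values)


scores = [1, 2]
print(avg_int(scores))      # 1
print(avg_decimal(scores))  # 1.5
```

In T-SQL, the usual way to keep the fraction is to cast the column to a decimal or float type inside the AVG() call.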


Estimating Columnstore Compression

Published 2018-06-26 by Kevin Feasel

Niko Neugebauer has a way of estimating disk size after creating a columnstore index on a table:

For anyone working with SQL Server since version 2005 (with Service Pack 2 to be precise) there is a very common task when thinking about or researching system optimisation – the usage of compression, and before advancing with this step the question that one usually receives is – “can you estimate how much improvement we shall get?”
For this purpose since SQL Server 2008, we have a very useful stored procedure called sp_estimate_data_compression_savings, which is capable of providing us with an estimate of how much storage we can save by enabling or moving to a more effective compression method (as in NONE | ROW | PAGE types).
Columnstore Indexes appeared in SQL Server 2012 (that is well over 6 years ago) and even though from time to time I would ask and suggest that the stored procedure sp_estimate_data_compression_savings start supporting Columnstore Indexes, until now there is no such support.

Until now – I am introducing a conjunction of my scripts in a comprehensive and reasonably capable stored procedure that is called “cstore_sp_estimate_columnstore_compression_savings” and that is a part of my free & open-sourced Columnstore Indexes Script Library, freely available on GitHub.

That’s a useful addition, especially when you’re trying to sell management on using clustered columnstore indexes.


Traditional Database Security Doesn’t Protect Data

Published 2018-06-26 by Kevin Feasel

Alex Yates has a controversial topic and some interesting thoughts:

Unfortunately, traditional database security has failed us.

Developers need access to the dev database to do their work. They need to be able to use appropriate test data to test their code. Traditional security features (logins, roles and users – even encryption technologies, dynamic data masking and row level security, etc) can be used to manage who has access to the data in production systems, but if a dev or test database already has the sensitive data these fundamental security features are worthless with regard to data protection. Even encrypted data only remains secure if the keys remain safe.

Sure, traditional security features protect the data in the production system – but not if it has already been copied to a less secure environment. And most people don’t track that with anywhere near as much rigor as they should.

To protect data effectively, we need to think much more consciously, not just about the production database, but also about all the other databases and backups that make up our database lifecycles, including dev and test systems and dev workstations. We need to know exactly where our security perimeter lies. Any copy of sensitive production data needs to live within the security perimeter and not outside it.

Read the whole article.


Biml 2018 Release Day

Published 2018-06-26 by Kevin Feasel

Andy Leonard has a bunch of new presents to unwrap:

Varigence keeps giving away cool stuff! Nowhere is Varigence’s commitment to community more evident than in the feature list for BimlExpress 2018. The previous version – BimlExpress 2017 – included the Preview Pane. BimlExpress 2018 includes the ability to Convert SSIS Packages to Biml:

How cool is that? And it’s in the free (FREE!) version!

As with BimlFlex and BimlStudio, there are too many cool features to list here. Head over to the BimlExpress 2018 feature page to learn more.

Converting existing packages to Biml was a great feature that I could never afford. It’s exceedingly nice of Scott Currie & crew to make that available in the free product.

