Press "Enter" to skip to content

June 2018

Bug When Importing Packages In BimlExpress 2018

Andy Leonard reports a bug as well as a temporary workaround for BimlExpress 2018:

I had no sooner published my blog post about the coolness of Biml 2018 than I encountered a bug trying to use one of the features I really like – converting SSIS packages to Biml using (FREE!) BimlExpress 2018.

My first response was, “Durnit! This worked in the test versions.” My second response was to drop a note into an issue-tracking site Varigence set up to record these kinds of things.

And then I started getting emails similar to, “Hey Andy, I get this error when I try to use the new ‘Convert SSIS Packages to Biml’ feature”…

David Stein has a workaround for us until Varigence can fix the bug.

Multi-Branch Pipelines In Jenkins

Chris Adkin continues his SQL Server continuous integration series:

Whatever you elect to do, there will always be a master branch; where you go from here depends on whether you favor branching or feature toggles. Wikipedia provides a nice definition of what a feature toggle is, thus:

A feature toggle[1] (also feature switch, feature flag, feature flipper, conditional feature, etc.) is a technique in software development that attempts to provide an alternative to maintaining multiple source-code branches (known as feature branches), such that a feature can be tested even before it is completed and ready for release. A feature toggle is used to hide, enable or disable the feature during run time. For example, during the development process, a developer can enable the feature for testing and disable it for other users.[2]

A branch begins as a clone of the master branch, and developers work on that branch. Once the work on the branch is code complete and has been tested to satisfaction, it is merged into the master. An overview of the branching and merging process is provided in the Git documentation here.

Continuous integration and delivery purists are not great fans of branches; they prefer that the ethos of integrating changes into one place be rigidly adhered to, ergo one code branch only. However, in practice you will find that most projects have to come up with some sane branching strategy. The subject of branching is a topic in its own right; suffice it to say that there is overhead in applying changes across multiple branches and in merging into the master branch. Therefore, there needs to be some governance and rigor applied to the number of branches in the source code repository.

Chris then shows us how to create a multi-branch pipeline with Jenkins.
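Feature toggles usually come up in application code, but the same idea translates to the database tier. Here is a minimal sketch of a table-driven toggle in T-SQL; the dbo.FeatureToggle table and the two pricing procedures are hypothetical names for illustration, not anything from Chris’s post:

CREATE TABLE dbo.FeatureToggle
(
    FeatureName sysname NOT NULL PRIMARY KEY,
    IsEnabled   bit     NOT NULL DEFAULT (0)
);

-- Ship the new code path dark; flip the bit when it is ready for users.
INSERT INTO dbo.FeatureToggle (FeatureName, IsEnabled)
VALUES (N'NewPricingLogic', 0);

-- Calling code branches on the flag at run time instead of living on a feature branch.
IF EXISTS (SELECT 1 FROM dbo.FeatureToggle
           WHERE FeatureName = N'NewPricingLogic' AND IsEnabled = 1)
    EXEC dbo.CalculatePricing_New;      -- hypothetical new code path
ELSE
    EXEC dbo.CalculatePricing_Current;  -- hypothetical current code path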

Data Protection Principles

K. Brian Kelley gives us an overview of what database security entails:

We have to be sure we know what accesses our data. There isn’t a technical solution that can automatically give us the answer. We can’t run a PowerShell script and know immediately everything that hits our key financial database. Over time we can collect that information, but the key word is “time.” If I look today, and today is not quarter end, then I don’t see the quarter-end processes. If we’re looking at our HR-related databases, then we really don’t know everything unless we also take into account the annual enrollment period.

The only way to follow the principle of least privilege correctly is to know who and what accesses our data. This also includes ad hoc access, like folks running reports through SQL Server Reporting Services (SSRS) or doing analysis through Microsoft Excel. Therefore, in order to improve our data protection, we have to understand what accesses that data.

Obviously, documentation is required. When we have documentation, there’s always the problem of keeping that documentation updated. While there are tools available, this task ultimately falls to people. Realistically, this is a battle we will always have to fight. Taking time to update documentation means we take time from other efforts. However, if we want to be serious about data protection, we have to know what accesses that data in order to be able to protect it.

It’s interesting to contrast this with Alex Yates’s essay on the topic.
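If you want to start accumulating that access information over time, one option is a server audit that captures reads. A minimal T-SQL sketch, assuming a hypothetical Finance database and made-up audit names and file path:

USE master;
GO
-- Server-level audit object that writes captured events to files.
CREATE SERVER AUDIT DataAccessAudit
TO FILE (FILEPATH = N'C:\Audits\');
ALTER SERVER AUDIT DataAccessAudit WITH (STATE = ON);
GO

USE Finance;
GO
-- Record every SELECT against the database, so that quarter-end and
-- annual-enrollment access patterns eventually show up in the trail.
CREATE DATABASE AUDIT SPECIFICATION FinanceReadAccess
FOR SERVER AUDIT DataAccessAudit
ADD (SELECT ON DATABASE::Finance BY public)
WITH (STATE = ON);
GO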

Using DAX To Calculate Dates Between Transactions

Matt Allington needs a haircut:

I got my haircut today (pretty spiffy one too, even if I do say so myself). While I was chatting, I asked my hairdresser, “On average, how often should I get my hair cut?” She told me (for men) around 4-6 weeks. Then I got thinking (as I do): I wonder if I could data-mine my credit card data using Power BI and find out how often I actually get my own hair cut. It turns out I was able to do this, and this article explains the hardest part of that task – finding the number of days between two transaction dates using DAX.

I’d probably end up doing this in SQL with the LAG function, but it’s good to know several ways to solve date difference problems.
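For the SQL route, a minimal sketch with LAG() might look like this; dbo.CreditCardTransactions and its columns are made-up names for illustration:

-- Days elapsed between each haircut transaction and the one before it.
SELECT
    TransactionDate,
    DATEDIFF(DAY,
             LAG(TransactionDate) OVER (ORDER BY TransactionDate),
             TransactionDate) AS DaysSinceLastCut
FROM dbo.CreditCardTransactions
WHERE MerchantName = N'Hair Salon'
ORDER BY TransactionDate;

LAG() returns NULL for the first row, since the first visit has no prior date to compare against – the same edge case the DAX version has to handle.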

Missing KB2919355 When Installing SQL Server

Ryan Allport explains how to install SQL Server 2016 on Windows Server 2012 R2 when you get the Rule “KB2919355 Installation” failed error message:

As you can see, the upgrade feature rules check failed around the KB2919355 installation. At this point, reading the error message, I assumed (I know, I know, it’s something we should never do as a DBA!) that the patch had been downloaded and applied during the latest round of Windows patching, and all that was required was a server reboot. I was wrong.

Upon running the upgrade again, I got the same error message. Hmm, annoying. So, after some Googling I was confident I knew what to do to resolve this: download and install the KB2919355 patch. So, I downloaded the patch from the official Microsoft website (KB2919355) and kicked off the installation.

There’s a bit more to it than “install the patch.”

Optimizing Replication Agent Parameters

Sourabh Agarwal shows us a couple of levers for improving replication performance:

When SQL Server replication is used on environments with high traffic OLTP systems, users often need to adjust the agent profile parameters to increase the throughput of the log reader and distribution agents to keep up with the workload. We recently performed a series of tests to measure the performance of log reader and distribution agents while changing some of the parameters for these agents. This blog summarizes the outcomes and conclusions from this testing.

Read on for the relevant parameters.
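For a sense of the mechanics: agent profiles live at the Distributor, and their parameters can be inspected and changed there. A hedged sketch using the documented sp_help_agent_profile and sp_change_agent_parameter procedures – the profile ID and value below are placeholders, not recommendations from the post:

-- Run in the distribution database at the Distributor.
-- Agent types: 1 = Snapshot, 2 = Log Reader, 3 = Distribution, 4 = Merge.
EXEC sp_help_agent_profile @agent_type = 2;

-- Increase how many transactions the Log Reader Agent picks up per processing cycle.
EXEC sp_change_agent_parameter
    @profile_id      = 2,            -- placeholder: use a profile ID from the query above
    @parameter_name  = N'ReadBatchSize',
    @parameter_value = N'1000';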

Converting Factors To Numbers In R

Sebastian Sauer shows us a pitfall of brute-force conversion of factors to integers:

Oh no! That’s not what we wanted! R has messed the thing up (?). The reason is that R sees the first factor level internally as the number 1, and the second level as the number 2. What’s the first factor level in our case? Let’s see:

factor(tips$sex) %>% head()
#> [1] Female Male Male Male Female Male
#> Levels: Female Male

factor(tips$sex_r) %>% head()
#> [1] 1 0 0 0 1 0
#> Levels: 0 1

That’s confusing: “0” is the first level of sex_r – internally for R represented by “1”. The second level of sex_r is “1” – internally represented by “2”.

Fortunately, we get the easy answer at the end of the post.

Neural Topic Models On Amazon SageMaker

David Ping, et al., show off topic modeling on Amazon SageMaker:

Topic modeling is used to organize a corpus of documents into “topics,” which are groupings based on a statistical distribution of words within the documents themselves. Amazon Comprehend, our fully managed text analytics service, provides a pre-configured topic modeling API that is best suited for the most popular use cases like organizing customer feedback, support incidents, or workgroup documents. Amazon Comprehend is the suggested topic modeling choice for customers, as it removes a lot of the most routine steps associated with topic modeling, like tokenization, training a model, and adjusting parameters.

Amazon SageMaker’s Neural Topic Model (NTM) caters to the use cases where finer control of the training, optimization, and/or hosting of a topic model is required, such as training models on text corpora of a particular writing style or domain, or hosting topic models as part of a web application. While Amazon SageMaker NTM provides a starting point of state-of-the-art topic modeling, customers have the flexibility to modify the network architecture as well as hyperparameters to accommodate the idiosyncrasies of their data sets, as well as to tune the trade-off between a multitude of metrics such as document modeling accuracy, human interpretability, and granularity of the learned topics, based on their applications. In addition, Amazon SageMaker NTM leverages the full power of the Amazon SageMaker platform: easily configurable training and hosting infrastructure, automatic hyperparameter optimization, and fully managed hosting with auto-scaling.

They walk through the entire topic modeling process, so check it out.

AVG And Data Types

Kendra Little explains how the AVG() function works with a couple different data types:

This week’s Quizletter featured a quiz on using the AVG() function in SQL Server.

I was inspired to write this quiz because I’ve been teaching TSQL School each week. When we covered aggregate functions, I remembered just how tricksy and non-average AVG() can be.

Want to take the quiz before you read the answers?

Head over here and take the quiz first.

Or read this post and then take the quiz, if you prefer. I’m not gonna call it cheating, because it’s learning either way.

Kendra explains each of the answers, so I’d recommend taking the quiz first.
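One classic gotcha along these lines, as a quick illustration of my own (not lifted from the quiz): AVG() returns the data type of its input, so an integer column produces a truncated integer average.

-- Integer in, integer out: the fractional part is truncated.
SELECT AVG(n) AS IntAvg                              -- returns 1, not 1.5
FROM (VALUES (1), (2)) AS t(n);

-- Casting first keeps the precision.
SELECT AVG(CAST(n AS decimal(10, 2))) AS DecimalAvg  -- returns 1.500000
FROM (VALUES (1), (2)) AS t(n);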
