Author: Kevin Feasel

Just Update Those Servers Already

Randolph West wants none of your excuses:

Folks, we all like to make sure we’re doing our level best to make things work smoothly.

So why am I staring at someone’s server that has never been updated since it was first set up almost three years ago?

Do better, so that I don’t have to yell at you. Seriously.

When we ignore updates, we are ignoring preventable catastrophic problems; we are ignoring fixes to security bugs, performance bugs, and data corruption bugs. Each one of these things could give you a really bad day. In two out of three cases it might even be a career-limiting move.

There are risks to patching servers, but the downside risk of not patching tends to be much larger. Administrators can mitigate update risks through redundancy, automation, and a rollback plan. It's more work than not patching, but the outcome is much better.

Scaling Azure Analysis Services

Chris Seferlis helps us make the decision between scaling up or out with Azure Analysis Services:

Some of you may not know when or how to scale up your queries or scale out your processing. Today I’d like to help with understanding when and how using Azure Analysis Services. First, you need to decide which tier you should be using. You can do that by looking at the QPUs (Query Processing Units) of each tier on Azure. Here’s a quick breakdown:

  • Developer Tier – gives you up to 20 QPUs

  • Basic Tier – a mid-scale tier, not meant for heavy loads

  • Standard Tier (currently the highest available) – allows you more capability and flexibility

Read on for some pointers.
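If you then need to change tiers or add query replicas, here is a minimal sketch using the Az.AnalysisServices PowerShell module (resource group, server name, SKU, and replica count are all placeholders):

# scale up by changing the SKU; scale out by adding read-only query replicas
Set-AzAnalysisServicesServer -ResourceGroupName "rg-bi" -Name "myaasserver" `
    -Sku "S1" -ReadonlyReplicaCount 2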

OFFSET-FETCH Versus ROWNUM In Oracle

Lukas Eder compares OFFSET-FETCH against ROWNUM for grabbing an ordered sub-selection of rows in Oracle:

Now, while the SQL transformation from FETCH FIRST to ROW_NUMBER() filtering is certainly correct, the execution plan doesn’t really make me happy. Consider the ROWNUM based query:

---------------------------------------------------------
| Id  | Operation                     | Name    | Rows  |
---------------------------------------------------------
|   0 | SELECT STATEMENT              |         |       |
|*  1 |  COUNT STOPKEY                |         |       |
|   2 |   VIEW                        |         |     1 |
|   3 |    TABLE ACCESS BY INDEX ROWID| FILM    |  1000 |
|   4 |     INDEX FULL SCAN           | PK_FILM |     1 |
---------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)

And compare that to the FETCH FIRST query:

-------------------------------------------------
| Id  | Operation                | Name | Rows  |
-------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |
|*  1 |  VIEW                    |      |     1 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000 |
|   3 |    TABLE ACCESS FULL     | FILM |  1000 |
-------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)
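For reference, the two statements behind these plans look roughly like this (a sketch assuming the FILM table and the FILM_ID ordering visible in the plans above):

-- ROWNUM style: filter on top of an ordered inline view
SELECT *
FROM (
  SELECT * FROM film ORDER BY film_id
)
WHERE ROWNUM = 1;

-- FETCH FIRST style (Oracle 12c+), which Oracle rewrites internally
-- into the ROW_NUMBER() filter shown in the predicate section
SELECT *
FROM film
ORDER BY film_id
FETCH FIRST 1 ROW ONLY;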

Lukas digs into this and is not the biggest fan of OFFSET-FETCH.  On the SQL Server side, my anecdotal experience has been that it doesn’t perform nearly as well as you’d like either.

Counting Arguments In R

Neil Saunders shares methods for interrogating argument lists in R:

“Some R functions have an awful lot of arguments”, you think to yourself. “I wonder which has the most?”

It’s not an original thought: the same question as applied to the R base package is an exercise in the Functions chapter of the excellent Advanced R. Much of the information in this post came from there.

There are lots of R packages. We’ll limit ourselves to those packages which ship with R, and which load on startup. Which ones are they?

It’s a fun exercise and helpful for learning a bit more about how to work with arguments when metaprogramming in R.
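As a minimal sketch of the core idea — counting formal arguments with formals(), here just for the base package rather than everything attached at startup:

# collect every function exported by the base package
base_env <- as.environment("package:base")
fns <- Filter(is.function, mget(ls(base_env), envir = base_env))

# count each function's formal arguments and show the largest
n_args <- vapply(fns, function(f) length(formals(f)), integer(1))
head(sort(n_args, decreasing = TRUE))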

Analyzing Federal Reserve Data With Ordinary Least Squares

Sam Shum has a tutorial walking us through extracting and analyzing data from the St. Louis Federal Reserve’s FRED economic database:

Download specific macroeconomic data series from the St. Louis Fed's FRED database and ETL the data. Many other data series can be found on FRED's website.

# get_fred_series() comes from the alfred package; the observation window
# below is a placeholder (the original post sets its own start and end dates)
library(alfred)
startdate <- "2000-01-01"
enddate   <- "2018-06-30"

# get unemployment rate time series from FRED St. Louis
dfunrate <- get_fred_series("UNRATE", "unrate", observation_start = startdate, observation_end = enddate)

# get University of Michigan consumer sentiment index time series from FRED St. Louis
dfumcsent <- get_fred_series("UMCSENT", "umcsent", observation_start = startdate, observation_end = enddate)

# combine the two time series into one data frame
dfall <- cbind(dfunrate, dfumcsent)

# drop the duplicate date column contributed by the second series
dfall <- dfall[, c(1, 2, 4)]

# build a row index (1..n) over the observations
mdx <- 1:nrow(dfall)

# convert FRED's date field from string to R's Date type
dfall$date <- as.Date(dfall$date)

There’s a nice chart builder on the FRED website too, but it’s good to be able to grab the data on your own.
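The ordinary least squares step that gives the post its title then reduces to a call to lm(); a sketch, assuming the column names created by the snippet above:

# regress consumer sentiment on the unemployment rate
fit <- lm(umcsent ~ unrate, data = dfall)
summary(fit)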

Bug When Importing Packages In BimlExpress 2018

Andy Leonard reports a bug as well as a temporary workaround for BimlExpress 2018:

I had no sooner published my blog post about the coolness of Biml 2018 than I encountered a bug trying to use one of the features I really like – converting SSIS packages to Biml using (FREE!) BimlExpress 2018.

My first response was, “Durnit! This worked in the test versions.” My second response was to drop a note into an issue-tracking site Varigence set up to record these kinds of things.

And then I started getting emails similar to, “Hey Andy, I get this error when I try to use the new ‘Convert SSIS Packages to Biml’ feature”…

David Stein has a workaround for us until Varigence can fix the bug.

Multi-Branch Pipelines In Jenkins

Chris Adkin continues his SQL Server continuous integration series:

Whatever you elect to do, there will always be a master branch; where you go from here depends on whether you favor branching or feature toggles. Wikipedia provides a nice definition of what a feature toggle is:

A feature toggle[1] (also feature switch, feature flag, feature flipper, conditional feature, etc.) is a technique in software development that attempts to provide an alternative to maintaining multiple source-code branches (known as feature branches), such that a feature can be tested even before it is completed and ready for release. A feature toggle is used to hide, enable or disable the feature during run time. For example, during the development process, a developer can enable the feature for testing and disable it for other users.[2]

A branch is initially a clone of the master branch; developers then work on that branch. Once the work is code complete and has been tested to satisfaction, it is merged into the master. An overview of the branching and merging process is provided in the Git documentation here.

Continuous integration and delivery purists are not great fans of branches and prefer that the ethos of integrating changes into one place be rigidly adhered to; ergo, one code branch only. In practice, however, you will find that most projects have to come up with some sane branching strategy. The subject of branching is a topic in its own right; suffice it to say there is overhead in applying changes across multiple branches and in merging into the master branch. Therefore, there needs to be some governance and rigor applied to the number of branches in the source code repository.

Chris then shows us how to create a multi-branch pipeline with Jenkins.
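As a taste of what that involves, here is a minimal declarative Jenkinsfile sketch (not Chris's code): a multibranch pipeline job discovers branches automatically and runs each branch's copy of this file. Stage contents are placeholders.

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                echo "Building branch ${env.BRANCH_NAME}"
            }
        }
        stage('Deploy') {
            // gate deployment behind the master branch only
            when { branch 'master' }
            steps {
                echo 'Deploying...'
            }
        }
    }
}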

Data Protection Principles

K. Brian Kelley gives us an overview of what database security entails:

We have to be sure we know what accesses our data. There isn’t a technical solution that can automatically give us the answer. We can’t run a PowerShell script and know immediately everything that hits our key financial database. Over time we can collect that information, but the key word is “time.” If I look today, and today is not quarter end, then I don’t see the quarter end processes. If we’re looking at our HR related databases, then we really don’t know everything unless we also take into account the annual enrollment period.

The only way to be able to follow the principle of least privilege correctly is to know who and what accesses our data. This also includes ad hoc access, like folks running reports through SQL Server Reporting Services (SSRS) or doing analysis through Microsoft Excel. Therefore, in order to improve our data protection, we have to understand what accesses that data.

Obviously, documentation is required. When we have documentation there’s always the problem with keeping that documentation updated. While there are tools available, this task ultimately falls to people. Realistically, this is a battle we will always have to fight. Taking time to update documentation means we take time from other efforts. However, if we want to be serious about data protection, we have to know what accesses that data in order to be able to protect it.
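One concrete way to collect that access information over time is SQL Server Audit. A minimal sketch, where the audit name, file path, and database name are all placeholders:

-- run in master: define where audit events land and turn the audit on
CREATE SERVER AUDIT DataAccessAudit
    TO FILE (FILEPATH = 'D:\Audits\');
ALTER SERVER AUDIT DataAccessAudit WITH (STATE = ON);
GO

-- run in the database being watched: record every SELECT against it
USE Finance;
GO
CREATE DATABASE AUDIT SPECIFICATION FinanceReadAudit
    FOR SERVER AUDIT DataAccessAudit
    ADD (SELECT ON DATABASE::Finance BY public)
    WITH (STATE = ON);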

It’s interesting to contrast this with Alex Yates’s essay on the topic.

Using DAX To Calculate Dates Between Transactions

Matt Allington needs a haircut:

I got my haircut today (pretty spiffy one too, even if I do say so myself). While I was chatting, I asked my hairdresser, “On average, how often should I get my hair cut?” She told me (for men) it’s around 4-6 weeks. Then I got thinking (as I do): I wonder if I could data-mine my credit card data using Power BI and find out how often I actually get my own hair cut. It turns out I was able to do this, and this article explains the hardest part of that task – finding the number of days between two transaction dates using DAX.

I’d probably end up doing this in SQL with the LAG function, but it’s good to know several ways to solve date difference problems.
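For comparison, the SQL version with LAG() is only a few lines; a sketch against a hypothetical Transactions table:

-- days elapsed since the previous haircut transaction
SELECT
    TransactionDate,
    DATEDIFF(DAY,
             LAG(TransactionDate) OVER (ORDER BY TransactionDate),
             TransactionDate) AS DaysSinceLast
FROM dbo.Transactions
WHERE Payee = N'Hairdresser';  -- placeholder filter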
