Press "Enter" to skip to content

Day: June 27, 2018

Counting Arguments In R

Neil Saunders shares methods for interrogating argument lists in R:

“Some R functions have an awful lot of arguments”, you think to yourself. “I wonder which has the most?”

It’s not an original thought: the same question as applied to the R base package is an exercise in the Functions chapter of the excellent Advanced R. Much of the information in this post came from there.

There are lots of R packages. We’ll limit ourselves to those packages which ship with R, and which load on startup. Which ones are they?

It’s a fun exercise and helpful for learning a bit more about how to work with arguments when metaprogramming in R.

Comments closed

Analyzing Federal Reserve Data With Ordinary Least Squares

Sam Shum has a tutorial walking us through extracting and analyzing data from the St. Louis Federal Reserve’s FRED economic database:

Download specific macroeconomic data from FRED St. Louis economic databases and ETL the data. Many other data series can be found at the FRED’s website.

# get unemployment data time series from FRED St. Louis
dfunrate <- get_fred_series("UNRATE", "unrate", observation_start = startdate, observation_end = enddate)

# get University of Michigan consumer sentiment index data time series from FRED St. Louis
dfumcsent <- get_fred_series("UMCSENT", "umcsent", observation_start = startdate, observation_end = enddate)

# combine the two time series data into one data frame
dfall <- cbind(dfunrate,dfumcsent)

# strip or remove redundant month field from data downloaded from FRED St. Louis
dfall <- dfall[,c(1,2,4)]

# obtain the number of data points in the dataframe
mdx <- (1:nrow(dfall))  

# convert FRED date field from string to R's date type
dfall$date <- as.Date(dfall$date)

There’s a nice chart builder on the FRED website too, but it’s good to be able to grab the data on your own.

Comments closed

Bug When Importing Packages In BimlExpress 2018

Andy Leonard reports a bug as well as a temporary workaround for BimlExpress 2018:

I had no sooner published my blog post about the coolness of Biml 2018 when I encountered a bug trying to use one of the features I really like – converting SSIS packages to Biml using (FREE!) BimlExpress 2018.

My first response was, “Durnit! This worked in the test versions.” My second response was to drop a note into an issue-tracking site Varigence set up to record these kinds of things.

And then I started getting emails similar to, “Hey Andy, I get this error when I try to use the new ‘Convert SSIS Packages to Biml’ feature”…

David Stein has a workaround for us until Varigence can fix the bug.

Comments closed

Mutli-Branch Pipelines In Jenkins

Chris Adkin continues his SQL Server continuous integration series:

Whatever you elect to do there will always be a master branch, where you go from here depends on whether you favor branching or feature toggles. Wikipedia provides a nice definition of what a feature toggle is, thus:

feature toggle[1] (also feature switchfeature flagfeature flipperconditional feature, etc.) is a technique in software development that attempts to provide an alternative to maintaining multiple source-code branches (known as feature branches), such that a feature can be tested even before it is completed and ready for release. Feature toggle is used to hide, enable or disable the feature during run time. For example, during the development process, a developer can enable the feature for testing and disable it for other users.[2]

A branch is initially a clone of the master branch to begin with, developers work on the branch. Once the work on that branch is code complete and it has been tested to satisfaction, it is merged into the master. An overview of the branching and merging process is provided in the Git documentation here.

The continuous integration and delivery purist are not great fans of branches and prefer the ethos of integrating changes into one place to be rigidly adhered to, ergo one code branch only. However, in practice you will find that most projects have to come up with some sane branching strategy. The subject of branching is a topic in its own right, suffice it to say there is an overhead in applying changes across multiple branches and overheads involved in merging into the master branch. Therefore, there needs to be some governance and rigor applied around the number of branches in the source code repository.

Chris then shows us how to create a multi-branch pipeline with Jenkins.

Comments closed

Data Protection Principles

K. Brian Kelley gives us an overview of what database security entails:

We have to be sure we know what accesses our data. There isn’t a technical solution that can automatically give us the answer. We can’t run a PowerShell script and know immediately everything that hits our key financial database. Over time we can collect that information, but the key word is “time.” If I look today, and today is not quarter end, then I don’t see the quarter end processes. If we’re looking at our HR related databases, then we really don’t know everything unless we also take into account the annual enrollment period.

The only way to be able to follow the principle of least privilege correctly is to know who and what access our data. This also includes ad hoc access, like folks running reports through SQL Server Reporting Services (SSRS) or doing analysis through Microsoft Excel. Therefore, in order to improve our data protection, we have to understand what accesses that data.

Obviously, documentation is required. When we have documentation there’s always the problem with keeping that documentation updated. While there are tools available, this task ultimately falls to people. Realistically, this is a battle we will always have to fight. Taking time to update documentation means we take time from other efforts. However, if we want to be serious about data protection, we have to know what accesses that data in order to be able to protect it.

It’s interesting to contrast this with Alex Yates’s essay on the topic.

Comments closed

Using DAX To Calculate Dates Between Transactions

Matt Allington needs a haircut:

I got my haircut today (pretty spiffy one too, even if I do say so myself).  While I was chatting I asked my hair dresser “on average, how often should I get my hair cut”?  She told me (for men) around 4-6 weeks.  Then I got thinking (as I do), I wonder if I could data-mine my credit card data using Power BI and find out how often I actually get my own hair cut.  It turns out I was able to do this, and this article explains the hardest part of that task – find the number of days between two transaction dates using DAX.

I’d probably end up doing this in SQL with the LAG function, but it’s good to know several ways to solve date difference problems.

Comments closed

Missing KB2919355 When Installing SQL Server

Ryan Allport explains how to install SQL Server 2016 on Windows Server 2012 R2 when you get the Rule “KB2919355 Installation” failed error message:

As you can see, the upgrade feature rules check failed around the KB2919355 installation. At this point, reading the error message, I assumed (I know, I know, it’s something we should never do as a DBA!) that the patch had been downloaded and applied during the latest round of Windows patching, and all that was required was a server reboot. I was wrong.

Upon running the upgrade again, I got the same error message. Hmm, annoying. So, after some Googling I was confident I knew what to do to resolve this; download and install the KB2919355 patch. So, I downloaded the patch from the official Microsoft website (KB2919355) and kicked off the installation.

There’s a bit more to it than “install the patch.”

Comments closed

Optimizing Replication Agent Parameters

Sourabh Agarwal shows us a couple of levers we have in improving replication performance:

When SQL Server replication is used on environments with high traffic OLTP systems, users often need to adjust the agent profile parameters to increase the throughput of the log reader and distribution agents to keep up with the workload. We recently performed a series of tests to measure the performance of log reader and distribution agents while changing some of the parameters for these agents. This blog summarizes the outcomes and conclusions from this testing.

Read on for the relevant parameters.

Comments closed