Press "Enter" to skip to content

Month: July 2017

Grooming The Error Log

Mark Wilkinson explains how to keep your SQL Server error logs in check:

We typically think of error logs as somewhere to go to find issues, but what if your error logs ARE the issue? Like most anything else in SQL Server, if you neglect your error logs you can run into trouble. Even on a low-traffic SQL Server instance, a bad piece of code, or a hardware issue, could easily fill your error logs, and with the introduction of Hekaton in SQL Server 2014, the SQL Server error log started getting a lot more data pumped into it than you might have been used to before. What this means for the DBA is that you can quickly start filling your main system drive (if your SQL install and error logs are in the default location) with massive error logs. So what questions should you be answering about error logs to make sure you don’t run into problems?

Read on to learn more.
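If the immediate worry is a single runaway log file, cycling the log on a schedule is the usual first step; a minimal sketch (scheduling it, typically through an Agent job, is up to you):

-- Close out the current error log and start a fresh one
EXEC sp_cycle_errorlog;

The number of archived logs SQL Server keeps (6 by default, up to 99) is configurable under Management > SQL Server Logs > Configure in Management Studio.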


Wanted: Filtering Power BI Data Views

Rob Collie is rabble-rousing and building up a pitchfork mob:

I don’t know how we’ve managed to go this long without the ability to Filter in the Data View.

As simple as it sounds, this is the #1 thing I miss when I leave the Excel Power Pivot environment for Power BI Desktop.

Why do I miss it?  Debugging and Validating, primarily.  When I’m getting results from a measure that I don’t trust, it is super common for me to filter a table in Power Pivot so that I’m just looking at a specific set of rows.  And then I often filter by a second column, and then sometimes a third, in order to see if there are any unexpected surprises in the data.  Scrolling through thousands of rows looking with my eyes just doesn’t cut it.

This is positively crucial, at least for me, and creating a Table or Matrix visualization to perform the same task is such a high-friction alternative that I’m usually tempted to start over in Power Pivot.  Ugh.

In fact this is the overwhelming #1 reason why, if I’m starting a model from scratch, I try to do as much of the work as I can in Power Pivot before converting over to PBIX format.

Rob has provided a link to the Power BI Ideas forum, making it easy to vote this item up.


Better Grouping With dplyr

John Mount builds a function to improve upon the group-by to mutate model in dplyr:

The advantages of the shorthand are:

  • The analyst only has to specify the grouping column once.
  • The data (mtcars) enters the pipeline only once.
  • The analyst doesn’t have to start thinking about joins immediately.

Frankly I’ve never liked the shorthand. I feel it is a “magic extra” that a new user would have no way of anticipating from common use of group_by() and summarize(). I very much like the idea of wrapping this important common use case into a single verb. Adjoining “windowed” or group-calculated columns is a common and important step in analysis, and well worth having its own verb.

Below is our attempt at elevating this pattern into a packaged verb.

Click through for the script.  I’d like to see something like this make its way into dplyr.
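For the SQL Server crowd, the closest analogue to adjoining a group-calculated column (rather than summarizing the groups away) is a windowed aggregate; a rough sketch against a hypothetical table mirroring R’s mtcars data set:

-- Every row keeps its detail, with the per-group mean adjoined as an extra column
SELECT
    cyl,
    mpg,
    AVG(mpg) OVER (PARTITION BY cyl) AS mean_mpg_by_cyl
FROM dbo.mtcars;  -- hypothetical table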


Azure SQL Database Security Basics

Arun Sirpal explains some of the security features exposed in Azure SQL Database:

You do not really have to use all of them, this is down to you and your requirements but at least you have decisions to make. I will mention TDE (Transparent Data Encryption), I know a lot of people will opt for TDE in Azure. The big advantage of TDE in Azure over the earthed flavour is that Microsoft does a lot of the work for you, especially around the key management side of things. Also assuming your database is in a geo-replication partnership it will be protected by a different key on each of the servers. Microsoft will also rotate your certificate at least every 90 days, doing this with a local based SQL Server can be quite manual and fiddly (well I think it is).

Read the whole thing if you’re thinking of moving forward with Azure SQL Database, or if you already have a database up in Azure and haven’t checked the latest offerings yet.
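If you want to confirm what is actually enabled on a given database, a couple of quick checks from T-SQL (these views work the same way in Azure SQL Database as on-premises):

-- Is the database encrypted at all?
SELECT name, is_encrypted
FROM sys.databases;

-- Encryption state and key details for the current database (encryption_state = 3 means encrypted)
SELECT DB_NAME(database_id) AS database_name,
       encryption_state,
       key_algorithm,
       key_length
FROM sys.dm_database_encryption_keys;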


Building Multiple Docker Containers At Once

Andrew Pruski shows how to use Docker compose to build multiple containers simultaneously:

As SQL Server people we’re only going to be interested in one application but that doesn’t mean we can’t use compose to our advantage.

What I’m going to do is go through the steps to spin up 5 containers running SQL Server, all listening on different ports with different sa passwords.

Bit of prep before we run any commands. I’m going to create a couple of folders on my C:\ drive that’ll hold the compose and dockerfiles:

Andrew also explains a couple of common errors as he walks us through the process.
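The compose file itself ends up being a handful of near-identical services that differ only in host port and sa password. Here is a hypothetical two-service sketch of that shape (the image name and passwords are placeholders; Andrew’s walkthrough uses Windows containers, so follow his dockerfiles for the real thing):

# docker-compose.yml -- two SQL Server containers on different host ports
version: '3'
services:
  sql1:
    image: microsoft/mssql-server-linux
    environment:
      - ACCEPT_EULA=Y
      - SA_PASSWORD=Placeholder1!
    ports:
      - "15789:1433"
  sql2:
    image: microsoft/mssql-server-linux
    environment:
      - ACCEPT_EULA=Y
      - SA_PASSWORD=Placeholder2!
    ports:
      - "15790:1433"

A single docker-compose up -d then brings the whole set up at once.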


Annoying Date Formats

Randolph West shows that even The Best Date Format can deceive you under certain circumstances:

Look carefully. DATE and DATETIME2 are showing the date of 12 July 2017 as expected. Unfortunately, the DATETIME and SMALLDATETIME data types are showing a date of 7 December 2017.

That’s not good at all. It means that the ISO 8601 standard does not work the way we might expect it to. The reason is simple, if annoying: we need to add a time to the date to make it pass the ISO 8601 standard.

I don’t like the idea of having to write 20170713 instead of 2017-07-13, but that is the only date format in SQL Server that I’ve run across that will work with any language and culture settings.
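You can reproduce the behavior he describes by forcing a day-month-year date format (SET LANGUAGE to a dmy-style language has the same effect); the separated string flips month and day for the older types while the unseparated one stays put:

-- Simulate a non-US culture
SET DATEFORMAT dmy;

SELECT CAST('2017-07-12' AS datetime)  AS old_type,     -- 7 December 2017
       CAST('2017-07-12' AS datetime2) AS new_type,     -- 12 July 2017
       CAST('20170712'   AS datetime)  AS unseparated;  -- 12 July 2017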


Installing Python Support In SQL Server

Ginger Grant has a teaser for her upcoming 24 Hours of PASS talk:

The process for using Python in SQL Server is very similar to the previous process of installing R.  Microsoft renamed R Services to Machine Learning Services, and now allows both R and Python to be installed, as shown in the screen.  Microsoft’s version of Python uses Anaconda, which is an open source analytics platform created by Continuum. This is where Python differs from other open source languages: the version Continuum provides contains data science components which are not included in the standard distribution of Python. Continuum also sells an enterprise version of Anaconda, with of course more features than come with the free version. It is important to remember the Python environment, as you will need to select the same distribution when running Python code outside of SQL Server.

Read on to see how to install Python support in SQL Server 2017 and for a few links to tools.
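Once Machine Learning Services with Python is installed, running Python from T-SQL comes down to enabling external scripts and calling sp_execute_external_script; a minimal smoke test (on SQL Server 2017 the configuration change may also need a service restart to take effect):

-- Allow external script execution
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;

-- Round-trip a trivial Python script to confirm the installation
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import sys
print(sys.version)';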


Parameter Sniffing On Conditional Statements

Kendra Little explains that SQL Server sniffs and caches a parameter value even when validation logic keeps the statement that uses it from ever executing:

The first time that dbo.ReviewFlags is executed after the database comes online, it’s with an invalid parameter, like this:

EXEC dbo.ReviewFlags @Flag = null;
GO

This is caught by the IF block, hits the RAISERROR, and goes down to the THROW block, and the output is:

Msg 50000, Level 11, State 1, Procedure ReviewFlags, Line 8 [Batch Start Line 70]
@Flag must be a value between 1 and 5

But even though SQL Server didn’t execute the SELECT statement, it still compiled it. And it also cached the plan.

Read on to understand the trouble this can cause, as well as a few ways of solving the problem.  This is a special case of parameter sniffing problems, but the solutions are the same as in the general case.
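As one illustration of the shape of a fix (the procedure body below is a stand-in, not Kendra’s actual code, and dbo.Votes is a hypothetical table), a statement-level recompile keeps that first invalid value from being baked into a cached plan:

-- Hypothetical sketch: validate first, then keep the real query from reusing a plan compiled for a bad value
CREATE OR ALTER PROCEDURE dbo.ReviewFlags
    @Flag int
AS
BEGIN
    IF @Flag IS NULL OR @Flag NOT BETWEEN 1 AND 5
        THROW 50000, N'@Flag must be a value between 1 and 5', 1;

    SELECT COUNT(*) AS FlagCount
    FROM dbo.Votes
    WHERE Flag = @Flag
    OPTION (RECOMPILE);  -- one of several possible mitigations; see Kendra's post for the trade-offs
END;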


STOP Date Formats

Dave Mason notes that the STOPAT date option when restoring a log backup is temperamental:

There’s nothing I see in the documentation regarding the format for “time”. But there are a couple of examples, including this one:

RESTORE LOG AdventureWorks  
FROM AdventureWorksBackups  
WITH FILE=4, NORECOVERY, STOPAT = 'Apr 15, 2020 12:00 AM';

That string looks suspiciously like a US English date format. I suspect that wouldn’t work for languages that don’t recognize “Apr” as a month. And what if the date is displayed in one of the many date formats used outside of the US? Let’s find out!

Dave tried 21 different date formats; click through for the results.
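One way to sidestep the string-format question entirely is to hand STOPAT a datetime variable, which the RESTORE syntax allows; a sketch reusing the docs’ database and backup device names:

-- The unseparated yyyymmdd format is unambiguous in any language setting
DECLARE @StopAt datetime = '20200415 00:00';

RESTORE LOG AdventureWorks
FROM AdventureWorksBackups
WITH FILE = 4, NORECOVERY, STOPAT = @StopAt;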


Using bsts In R

Steven L. Scott explains what the bsts package does:

Time series data appear in a surprising number of applications, ranging from business, to the physical and social sciences, to health, medicine, and engineering. Forecasting (e.g. next month’s sales) is common in problems involving time series data, but explanatory models (e.g. finding drivers of sales) are also important. Time series data are having something of a moment in the tech blogs right now, with Facebook announcing their “Prophet” system for time series forecasting (Taylor and Letham 2017), and Google posting about its forecasting system in this blog (Tassone and Rohani 2017).

This post summarizes the bsts R package, a tool for fitting Bayesian structural time series models. These are a widely useful class of time series models, known in various literatures as “structural time series,” “state space models,” “Kalman filter models,” and “dynamic linear models,” among others. Though the models need not be fit using Bayesian methods, they have a Bayesian flavor and the bsts package was built to use Bayesian posterior sampling.

If you’re looking for time series models, this looks like a good one.
