
Day: December 15, 2016

SQL As A Declarative Language

Lukas Eder discusses one benefit to a declarative language like SQL:

It’s simple. Both the set-builder notation, and the SQL language (and in principle, other languages’ for comprehensions) are declarative. They are expressions, which can be composed to other, more complex expressions, without necessarily executing them.

Remember the imperative approach? We tell the machine exactly what to do:

  • Start counting from this particular minimal integer value
  • Stop counting at this particular maximal integer value
  • Store all even integers in between in this particular intermediate collection

What if we don’t actually need negative integers? What if we just wanted to have a utility that calculates even integers and then reuse that to list all positive integers? Or, all positive integers less than 100? Etc.
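To make the composability point concrete, here is a minimal sketch in Python, using lazy generators as a stand-in for declarative expressions. The helper name `evens` is hypothetical and not from Lukas's post. The idea is that each expression describes a result without computing it, so the same building block can be reused for "all positive even integers" and "positive even integers below 100."

```python
from itertools import count, islice, takewhile


def evens(numbers):
    """Lazily keep only the even values from any iterable of integers."""
    return (n for n in numbers if n % 2 == 0)


# Composing expressions: nothing is evaluated at this point.
positive_evens = evens(count(1))  # conceptually infinite, never materialized
evens_below_100 = takewhile(lambda n: n < 100, evens(count(1)))

# Evaluation happens only when results are requested.
first_five = list(islice(positive_evens, 5))
below_100 = list(evens_below_100)
```

No intermediate collection of "all even integers" is ever stored; the bounds are supplied by whoever consumes the expression.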

It may be my innate contrarian curmudgeonliness, but I am moving more and more toward the idea that the easiest way to deal with data is a combination of SQL and functional programming languages, leaving OO out of the picture.

Comments closed

SQL Agent Alerts

David Alcock has a script to create SQL Agent alerts for common errors:

These alerts cover a range of errors from potential IO subsystem problems to failed logins, all of which are things a DBA needs to know about, and quickly too.
As well as error notifications you can set up alerts to cover performance conditions. The final statement in the script below sets up an alert that triggers when Page Life Expectancy drops below 1000. In all honesty I don’t set up these performance alerts that often but I wanted to show you the kind of thing that is possible and would be handy if you don’t have any third party monitoring.

He follows this up with a post on appropriate response:

But what do I mean by sensible? Typically I see a number of problems with alerting setups: either alerts are inadequate and don’t cover the necessary errors (or there are none at all), or the notifications for alerts are not set up correctly, meaning problems go backwards and forwards, delaying any fixes.
The other problem I see is an over provision of alerts. This usually is because one or more other monitoring systems have been deployed and error notifications have been duplicated as a result. Imagine having an operational tool like System Centre, some SQL monitoring software and native alerting all pinging the same message to the one recipient mailbox. Now on top of that let’s say the alerts have not been configured correctly so information emails are being issued every second. It’s a scary thought but it is easy to see how a critical error might be missed in this scenario.

If you don’t have automatic alerts for high-severity errors, this is an easy way of gaining insight into the problems your server is experiencing.


Ring Buffers

Juho Snellman explains ring buffers:

This is of course not a new invention. The earliest instance I could find with a bit of searching was from 2004, with Andrew Morton mentioning it in a code review so casually that it seems to have been a well established trick. But the vast majority of implementations I looked at do not do this.

So here’s the question: Why do people use the version that’s inferior and more complicated? I must have written a dozen ring buffers over the years, and before being forced to really think about it, I’d always just used the first definition. I can understand why a textbook wouldn’t take advantage of unsigned integer wraparound. But it seems like it should be exactly the kind of cleverness that hackers would relish using and passing on.
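The excerpt does not spell out the trick, so the following is one reading of it rather than Juho's code: let the read and write indices run freely, mask them only when touching the array, and rely on unsigned wraparound so that `write - read` is always the element count. Python integers do not wrap, so this sketch emulates 32-bit unsigned arithmetic with an explicit mask. The class and method names are hypothetical, and the capacity is assumed to be a power of two.

```python
class RingBuffer:
    """Fixed-capacity FIFO with free-running indices, masked only on access."""

    INDEX_MASK = 0xFFFFFFFF  # emulates 32-bit unsigned wraparound

    def __init__(self, capacity):
        # A power-of-two capacity lets a bit mask replace the modulo.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._read = 0
        self._write = 0

    def __len__(self):
        # Wrapped subtraction gives the element count directly.
        return (self._write - self._read) & self.INDEX_MASK

    def is_empty(self):
        return self._read == self._write

    def is_full(self):
        return len(self) == len(self._slots)

    def push(self, item):
        if self.is_full():
            raise OverflowError("ring buffer is full")
        self._slots[self._write & self._mask] = item
        self._write = (self._write + 1) & self.INDEX_MASK

    def shift(self):
        if self.is_empty():
            raise IndexError("ring buffer is empty")
        item = self._slots[self._read & self._mask]
        self._read = (self._read + 1) & self.INDEX_MASK
        return item
```

Because full and empty are told apart by the difference between the indices, every slot is usable and no separate length field is needed.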

Check out the comments for more information, a bit of code golf, and multiple links on tying shoelaces.


Analyzing Taxi Data With Microsoft R Server

Ali Zaidi builds a Spark cluster to analyze 1.1 billion taxi cab rides using Microsoft R Server:

In a similar spirit to how sparklyr allowed us to reuse our functions from the dplyr package to manipulate Spark DataFrames, the RxSpark API allows a data scientist to develop code that can be deployed in a multitude of environments. This allows the developer to shift their focus from writing code that’s specific to a certain environment, and instead focus on the complex analysis of their data science problem. We call this flexibility Write Once, Deploy Anywhere, or WODA for the acronym lovers.

For a deeper dive into the RevoScaleR package, I recommend you take a look at the online course, Analyzing Big Data with Microsoft R Server. Much of this blogpost follows along the last section of the course, on deployment to Spark.

R isn’t just for small, one-off jobs anymore.


Transaction Log Operations And Backups

John Deardurff explains what happens in the transaction log when you restore a backup:

In the example, the database performed a checkpoint at noon and a backup had been taken at that time. The restore process will capture all the transactions up until the point the database had been backed up. After the database has been restored, the recovery process will roll forward transactions 2 and 4 because they had been committed to the transaction log before the point of failure. Since transactions 3 and 5 did not commit before the time of system failure, the undo process will roll back the transactions to keep the data in a consistent state.
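The redo/undo decision in that example can be sketched as a toy model. This is not how SQL Server implements recovery. The `Txn` type, the `recover` function, and the clock values are all hypothetical; only the outcome (transactions 2 and 4 rolled forward, 3 and 5 rolled back) comes from the excerpt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Txn:
    txn_id: int
    commit_time: Optional[int]  # HHMM on a 24-hour clock; None = never committed


def recover(transactions, failure_time):
    """Split logged transactions into redo (roll forward) and undo (roll back)."""
    redo, undo = [], []
    for txn in transactions:
        committed = txn.commit_time is not None and txn.commit_time <= failure_time
        (redo if committed else undo).append(txn.txn_id)
    return redo, undo


# Hypothetical log records written after the noon backup.
log = [Txn(2, 1310), Txn(3, None), Txn(4, 1345), Txn(5, None)]
redo, undo = recover(log, failure_time=1400)
```

Here `redo` holds the transactions that committed before the failure and `undo` holds the ones still open when it happened.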

Read the whole thing.


Streaming Data With Kinesis

Assaf Mentzer shows how to join streaming data (specifically, AWS Kinesis) with lookup data:

In this use case, Amazon Kinesis Analytics can be used to define a reference data input on S3, and use S3 for enriching a streaming data source.

For example, bike share systems around the world can publish data files about available bikes and docks, at each station, in real time.  On bike-share system data feeds that follow the General Bikeshare Feed Specification (GBFS), there is a reference dataset that contains a static list of all stations, their capacities, and locations.
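As a rough illustration of the enrichment pattern, independent of Kinesis itself, the sketch below joins a stream of station status events against an in-memory reference table. It uses plain Python with no AWS APIs. The function name `enrich`, the sample values, and the exact field names are assumptions made for illustration.

```python
def enrich(status_events, stations_by_id):
    """Attach static station attributes to each streaming status event."""
    for event in status_events:
        station = stations_by_id.get(event["station_id"])
        if station is None:
            continue  # inner-join semantics: drop events with no reference row
        yield {**event, "name": station["name"], "capacity": station["capacity"]}


# Hypothetical reference data, loaded once (for example, from a file on S3).
stations_by_id = {
    "s1": {"name": "Main St", "capacity": 12},
    "s2": {"name": "Harbor Rd", "capacity": 20},
}

# Hypothetical streaming events.
status_events = [
    {"station_id": "s1", "num_bikes_available": 3},
    {"station_id": "unknown", "num_bikes_available": 1},
    {"station_id": "s2", "num_bikes_available": 15},
]

enriched = list(enrich(status_events, stations_by_id))
```

Keeping the reference data in a keyed lookup means each event is enriched in constant time, which matters when the stream side is unbounded.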

There are three different architectures in here, so if you’re looking for streaming data models with Kinesis (or want to apply them to Kafka), this is a solid read.


Dial Gauge

Devin Knight explains the dial gauge custom visual:

  • The effectiveness of gauges on dashboards is an often debated topic.

  • The Dial Gauge is completely data driven, which means that not only must your measure (which drives the needle) come from a dataset, but the different threshold ranges must come from your dataset too.

  • There are no specific Format settings for the Dial Gauge, which does limit you a bit with what you can do with this gauge.

There are certain scenarios in which I think the dial gauge works well.  The best scenario is the same as its analog counterpart:  when you are measuring a single continuous variable with a safe range and meaningful range differences.  This scenario occurs less often than you might think.


Cannot Connect To WMI Provider

Andrew Peterson troubleshoots an error after installing SSMS vNext:

After installing SQL Server Management Studio for vNext, the Configuration Manager no longer opens, with a message similar to the following:

Cannot connect to WMI provider. You do not have permission or the server is unreachable. Note that you can only manage SQL Server 2005 and later servers with SQL Server Configuration Manager.
Invalid namespace [0x8004100e]

Read on for the solution.


Backup Basics

Aaron Bertrand covers reasons for backups, backup models, and also a vital part of the backup process:

Now, all of the above may be review for you, but a much more important part of this story is that you need to be TESTING your backups. I’ve seen many customers who have been happily taking backups and storing them on some drive somewhere, and then when disaster strikes and they actually need to restore them, they can’t – maybe they had been backing up corruption all along, or the backups were failing but they were ignoring alerts, or they weren’t taking log backups frequently enough to meet their RPO, or they were only taking full backups.

Testing backups is vital; just because the backup process reported success doesn’t mean that you’ll necessarily be able to restore that backup when the time comes that you need it.  It’s also good to drill people on restoration skills, as things get a bit more stressful when three levels of management are standing behind your chair asking you what’s taking so long.


Backing Up Extended Event Logs

Wayne Sheffield reminds us that backups aren’t just for databases:

So how does this talk of AGs pertain to this T-SQL Tuesday topic? It should be pretty obvious – we need to periodically grab all of the .xel files generated by the cluster, and move them to a different directory, with a different retention policy. Yup… we need to back up these files. Sometimes, we need to be backing up things other than the databases themselves.

I created a PowerShell script that takes a few parameters, then moves the files from the source directory to the destination directory. And then it deletes files from the destination directory that are over x days old.
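Wayne's script is PowerShell and is not reproduced here. The following is a hedged Python sketch of the same two steps (move files, then prune old ones), assuming file age is judged by last-modified time. The function name `archive_and_prune` and its parameters are hypothetical.

```python
import shutil
import time
from pathlib import Path


def archive_and_prune(source_dir, dest_dir, retention_days, pattern="*.xel", now=None):
    """Move matching files to dest_dir, then delete archived files past retention."""
    source, dest = Path(source_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day

    moved = []
    for path in sorted(source.glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)

    deleted = []
    for path in sorted(dest.glob(pattern)):
        # Age is based on the file's last-modified timestamp.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)

    return moved, deleted
```

A call such as `archive_and_prune(source, destination, retention_days=30)` would then run on a schedule. A file that is still open for writing may fail to move, so a production version would need error handling around the move step.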

Wayne goes into more detail, including permissions required to run the script.
