Press "Enter" to skip to content

Month: February 2018

The SQL Server Backup Survey

Mike Fal ran a survey recently and shares his findings:

This leads me to last week. In order to have some data, I decided to run an informal backup survey targeted at the SQL community. The results floored me: 344 of you decided to take my short survey. This really helps me understand some of the trends out there and now I want to share those results with you.

Before I get started, I want to first thank each and every person who responded from the bottom of my heart. This data is the result of your participation. Secondly, I want to underscore the “informal” nature of this. There’s a lot of holes that can probably be poked in the process, but I think the data is still useful and can give people insight into the trends.

I’ve posted raw data along with a few tools out on GitHub, where you are welcome to download and play with it.

Check out Mike’s findings and then dig into the data on GitHub.

Comments closed

Performing A Database Migration With dbatools

Viorel Ciucu has a Powershell script using dbatools to configure a new SQL Server instance and migrate databases using TDE over to the new instance:

The second option (and the one we chose) was to leave the encryption enabled. In order to be able to attach the files, or to do restores from the backups you need to have the same certificate that was used for encryption. This certificate is protected by the master key.

To accomplish this:

  1. Make backups of the master key and the certificates

  2. Restore the key and certificates on the new principal and mirror pairs

Read on for the process.

Comments closed

Finding Long-Running Transactions

David Fowler has a dream:

It was 3am in the morning and I was asleep and enjoying a delightful dream (I knew it was a dream because I was surrounded by drifting clouds, singing angels and hundreds of softly humming SQL Servers where the hardware had been sensibly provisioned and all code carefully optimised) when I was rudely awoken by a Service Desk call informing me that the systems were unresponsive.  A quick check and I could see that everything was being blocked a particular transaction.  My suspicion was that someone had run a script which had opened a transaction and then toddled off home without checking that either the script had finished or closed the transaction that it had opened.

My guess was right and killing the transaction got the cogs turning again.

Click through for the script.

Comments closed

User-Defined Functions In KSQL

Kai Waehner demonstrates building a user-defined function for Kafka Streams:

As you can see, the full implementation is just a few lines of Java code. In general, you need to implement the logic between receiving input and returning output of the UDF in the evaluate()method. You also need to implement exception handling (e.g. invalid input arguments) where applicable. The init() method is empty in this case, but could initialise any required object instances.

Note that this UDF has state: dateFormat can be null or already initialized. However, no worries. You do not have to manage the scope as Kafka Streams (and therefore KSQL) threads are independent of each other. So this won’t cause any issues.

Click through for the entire process.

Comments closed

Labels And Annotations In ggplot2

I have another post in my ggplot2 series:

Annotations are useful for marking out important comments in your visual.  For example, going back to our wealth and longevity chart, there was a group of Asian countries with extremely high GDP but relatively low average life expectancy.  I’d like to call out that section of the visual and will use an annotation to do so.  To do this, I use the annotate() function.  In this case, I’m going to create a text annotation as well as a rectangle annotation so you can see exactly the points I mean.

By this point, we’re getting closer and closer to high-quality graphics.

Comments closed

The Year Of The Data Engineer

Alex Woodie points out that data science also requires data engineers:

The shortage of data scientists – those triple-threat types who possess advanced statistics, business, and coding skills – has been well-documented over the years. But increasingly, businesses are facing a shortage of another key individual on the big data team who’s critical to achieving success – the data engineer.

Data engineers are experts in designing, building, and maintaining the data-based systems in support of an organization’s analytical and transactional operations. While they don’t boast the quantitative skills that a data scientist would use to, say, build a complex machine learning model, data engineers do much of the other work required to support that data science workload, such as:

  • Building data pipelines to collect data and move it into storage;

  • Preparing the data as part of an ETL or ELT process;

  • Stitching the data together with scripting languages;

  • Working with the DBA to construct data stores;

  • Ensuring the data is ready for use;

  • Using frameworks and microservices to serve data.

Read the whole thing.  My experience is that most shops looking to hire a data scientist really need to get data engineers first; otherwise, you’re wasting that high-priced data scientist’s time.  The plus side is that if you’re already a database developer, getting into data engineering is much easier than mastering statistics or neural networks.

Comments closed

Tupper’s Self-Referential Formula In Postgres

Lukas Eder has a fun post on Tupper’s self-referential formula:

Luckily, this syntax also happens to be SQL syntax, so we’re almost done. So, let’s try plotting this formula for the area of x BETWEEN 0 AND 105 and y BETWEEN k AND k + 16, where k is just some random large number, let’s say

96093937991895888497167296212785275471500433966012930665
15055192717028023952664246896428421743507181212671537827
70623355993237280874144307891325963941337723487857735749
82392662971551717371699516523289053822161240323885586618
40132355851360488286933379024914542292886670810961844960
91705183454067827731551705405381627380967602565625016981
48208341878316384911559022561000365235137034387446184837
87372381982248498634650331594100549747005931383392264972
49461751545728366702369745461014655997933798537483143786
841806593422227898388722980000748404719

Unfortunately, most SQL databases cannot handle such large numbers without any additional libraries, except for the awesome PostgreSQL, whose decimal / numeric types can handle up to 131072 digits before the decimal point and up to 16383 digits after the decimal point.

Yet again, unfortunately, even PostgreSQL by default can’t handle such precisions / scales, so we’re using a trick to expand the precision beyond what’s available by default.

Check it out.

Comments closed

Displaying Items Not Selected In A Power BI Slicer

Matt Allington tries to solve the converse of an easy problem:

My idea was that I would load school photos and also the reunion photos onto the one page.  The user can then click on a slicer with someone’s name (or any other information about people) and “see” those people highlighted in the photo.  I started thinking that I could use the excellent Synoptic Panel from The Italians for this.  The only problem I could foresee was that Synoptic Panel is designed to provide shading over an image based on what was selected.  I wanted to shade/hide those people that were NOT selected.  Anyhow, I love a challenge.

Read on for Matt’s solution.

Comments closed

Maintain MSDB

Lori Brown points out that there are some SQL Server service tables which can bloat your msdb database:

I recently received a panicked call from a client who had a SQL instance go down because the server’s C drive was full. As the guy looked he found that the msdb database file was 31 GB and was consuming all of the free space on the OS drive causing SQL to shut down. He cleaned up some other old files so that SQL would work again but did not know what to do about msdb.

As we looked at it together I found that the sysmaintplan_logdetail table was taking all the space in the database. The SQL Agent had been set to only keep about 10000 rows of history but for some unknown reason the table never removed history. After consulting MSDN I found this code did the trick for truncating this table.

Lori’s focus here is on SQL Agent history, but don’t forget about things like backup history as well.

Comments closed

Installing Docker On Linux

Mark Broadbent shows how to install Docker on Linux Mint:

A web search will almost certainly point you to lots of similar posts, mostly (if not) all of which start instructing you to add unofficial or unrecognized sources, keys etc. Therefore my intention with this post is not to replace official documentation, but to make the process as simple as possible, whilst still pointing to all the official documentation so that you can be confident you are not breaking security or other such things!

You can head over to the following Docker page Get Docker CE for Ubuntu for the initial setup and updates, but for simplicity, you can follow along below.

The installation instructions will also work for Ubuntu and other related variants.

Comments closed