Month: February 2018

Initial Thoughts On dbachecks

Jess Pomfret has an initial use case for dbachecks:

Each check has one unique tag, which basically names the check, and then a number of other tags that can also be used to call a collection of checks.

For this example we are going to use several checks to ensure that we meet the following requirements:

  • Full backup once a week – using LastFullBackup
  • Differential backup once a day – using LastDiffBackup
  • Log backup every hour – using LastLogBackup

Since each of the three checks we want to run also has the LastBackup tag, we can use that to call the collection of checks at once.
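
For instance, pointing those checks at a couple of instances might look something like this (a minimal sketch; the instance names are placeholders and the parameter names are as I recall them from the module, so double-check with Get-Help Invoke-DbcCheck):

    # Run all three backup checks at once via their shared tag
    Invoke-DbcCheck -SqlInstance sql01, sql02 -Check LastBackup

    # Or call the individual checks by their unique tags
    Invoke-DbcCheck -SqlInstance sql01, sql02 -Check LastFullBackup, LastDiffBackup, LastLogBackup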

Jason Squires looks at this for enterprise reporting:

This module was developed and designed to ensure you can see whether you have the best settings/configurations set up on your SQL systems, using PowerShell. There are three prerequisites required to load the module: Pester 4.3.1, PSFramework 0.9.10.23, and, as of this post, dbatools 0.9.207. However, the dbachecks team kindly built in a notification for you should those modules not be installed or need an update.

What I really love about this module is how you can utilize SQL CMS and view the results at an enterprise reporting level.
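
Getting set up is just a PowerShell Gallery install (a quick sketch; note that the Gallery name for PS Framework is PSFramework, and, per the post, dbachecks will warn you if the prerequisite versions are missing):

    # Install dbachecks from the PowerShell Gallery
    Install-Module dbachecks -Scope CurrentUser

    # If the prerequisites don't come along automatically, install them explicitly
    Install-Module Pester, PSFramework, dbatools -Scope CurrentUser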

Shane O’Neill has a bit more:

Straight away, dbachecks gives you the option to include or exclude checks that you feel aren’t for you. If you only want to run a subset of the checks, then you can specify that.

The real beauty that I think dbachecks provides is that you are getting a wealth of checks for things that you may never have thought of checking or known how to check, while being able to add any personal tests as well.
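
Including or excluding by tag looks roughly like this (a sketch; I'm assuming the -ExcludeCheck parameter name and the Database/AutoShrink tags here, so verify against the module's help):

    # Run the Database group of checks, skipping one that doesn't apply to this environment
    Invoke-DbcCheck -SqlInstance sql01 -Check Database -ExcludeCheck AutoShrink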

Sounds like something for DBAs to check out.

Introducing dbachecks

Chrissy LeMaire announces that the dbatools team is onto something big:

dbachecks is a framework created by and for SQL Server pros who need to validate their environments. Basically, we all share similar checklists and mostly just the server names and RPO/RTO/etc change.

This module allows us to crowdsource our checklists using Pester tests. Such checks include:

  • Backups are being performed
  • Identity columns are not about to max out
  • Servers have access to backup paths
  • Database integrity checks are being performed and corruption does not exist
  • Disk space is not about to run out
  • All enabled jobs have succeeded
  • Network latency does not exceed a specified threshold

We currently provide over 80 checks.

Chrissy also shows you how to install dbachecks and explains the commands.
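
If you want to browse what's available before running anything, the module can list its checks for you (a sketch; the exact output shape may differ from what I show here):

    # List every available check along with its tags
    Get-DbcCheck | Out-GridView

    # Or just count how many checks ship with the module
    (Get-DbcCheck | Measure-Object).Count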

This is the biggest community-driven news since, well, dbatools…

Methods To Improve Model Accuracy

Tristan Robinson shows how to go back to the drawing board when your model’s accuracy isn’t cutting it:

One of the recurring principles that appears with machine learning is that of Ockham’s razor, which states that the best models are simple models that fit the data well; this is not an irrefutable principle of logic, but a preference for simplicity. Therefore there is a need to balance accuracy against simplicity by limiting the feature set, which tends to lead to better predictions. Simpler models are also more interpretable to humans, which helps. While the data I was working with was limited to around 35 features, there are many data science problems which have thousands of features, so this technique is even more crucial there.

There are multiple methods to perform feature selection, of which a few will be covered here. The first method is greedy backward selection, which starts with all the features, finds the feature that hurts predictive power the least when removed, and removes it. This is done iteratively until a stopping point is reached (which will be discussed later). It’s known as greedy since it never looks back after removing a feature.

An alternative method is greedy forward selection, which is basically the inverse: it starts with no features and looks for the single feature that by itself makes the best model. It then carries on in a similar vein to backward selection, but adding features rather than removing them. The point at which you stop with forward selection is the point of diminishing returns for your accuracy.

Read the whole thing.  This is explanation rather than demonstration, but the explanation applies to pretty much any implementation you’re using.
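
To make the greedy forward variant concrete, here's a minimal PowerShell sketch (not from Tristan's post; the $GetScore scriptblock is a hypothetical stand-in for whatever model-training-plus-validation routine you're actually using):

    # Greedy forward selection: repeatedly add the single feature that improves the
    # validation score the most, stopping once gains hit diminishing returns.
    function Select-FeatureForward {
        param(
            [string[]] $Features,              # candidate feature names
            [scriptblock] $GetScore,           # given a feature subset, returns a validation score
            [double] $MinImprovement = 0.001   # stop once the gain falls below this
        )
        $selected  = @()
        $bestScore = [double]::NegativeInfinity
        while ($true) {
            $candidate      = $null
            $candidateScore = $bestScore
            foreach ($feature in ($Features | Where-Object { $_ -notin $selected })) {
                $score = & $GetScore ($selected + $feature)
                if ($score -gt $candidateScore) {
                    $candidate      = $feature
                    $candidateScore = $score
                }
            }
            # Quit when nothing is left to add or the improvement is too small to matter
            if ($null -eq $candidate -or ($candidateScore - $bestScore) -lt $MinImprovement) {
                break
            }
            $selected  += $candidate
            $bestScore  = $candidateScore
        }
        $selected
    }

Backward selection is the mirror image: start with everything, drop the feature whose removal hurts the score the least, and repeat.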

JupyterLab Now Available

Project Jupyter announces the general availability of JupyterLab:

JupyterLab is an interactive development environment for working with notebooks, code and data. Most importantly, JupyterLab has full support for Jupyter notebooks. Additionally, JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

JupyterLab provides a high level of integration between notebooks, documents, and activities:

  • Drag-and-drop to reorder notebook cells and copy them between notebooks.

  • Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).

  • Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.

  • Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, VegaLite, and more.

I like this, as I’m a big fan of notebooks, but sometimes you just want to write some diagnostic queries, and an IDE is way better for that. H/T Giovanni Lanzani.
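
If you want to kick the tires, installation is quick (assuming you already have Python and pip on your path; conda users can pull jupyterlab from conda-forge instead):

    # Install JupyterLab from PyPI, then launch it; a browser tab should open with the new UI
    pip install jupyterlab
    jupyter lab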

Normalizing To Boyce-Codd Normal Form

I am a big fan of Boyce-Codd Normal Form:

Boyce-Codd Normal Form is a generalization of Second and Third Normal Forms.  There are a couple of requirements to be in Boyce-Codd Normal Form.  First, your table must be in First Normal Form.  This means that:

  • Every entity (row) has a consistent shape.  This is something relational databases do for you automatically:  you can’t create a table where one entity has an attribute (column) but the next entity doesn’t.
  • Every entity has a unique value.  You can uniquely identify any particular row.
  • Every attribute is atomic:  you don’t try to pack more than one value into a single attribute.
  • There are no repeating groups of attributes, like PaymentMethod1, PaymentMethod2, PaymentMethod3, etc.

The other half of BCNF is that every determinant on an entity is a key.
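
As a quick illustration: in a table with attributes (OrderID, ProductID, ProductName), the functional dependency ProductID → ProductName makes ProductID a determinant which is not a key of that table, so the table is not in BCNF; moving ProductName out to a Products table keyed on ProductID fixes it.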

Also click through for an iterative, easy-to-follow process to get to BCNF.

Using psake To Load FunctionsToExport

Cody Konior has a good post on using psake to populate the FunctionsToExport section of a module definition:

The “best practice” and proper way of handling this, then, is to add an entry to the array in this file every time you create a new function. Manually. What a pain in the ass, right? But there’s a better way!

If you haven’t seen psake before, you can take a look later. Start up PowerShell as Administrator and install psake from the PowerShell Gallery using Install-Module psake, and then you’re ready to go.

How does psake work? It’s the PowerShell equivalent of a Makefile. Basically:

  • You add a file in the root of your module called psakefile.ps1

  • When you run Invoke-psake from that location, it will load and execute the file

For a one-off module with one or two functions, it’s probably not worth it, but once you get to several functions (or if you’re building modules regularly), this looks like a time-saver.
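
For reference, a minimal psakefile.ps1 along these lines might look like the following (my sketch, not necessarily Cody's exact implementation; it assumes one function per .ps1 file under a functions\ folder and a manifest named MyModule.psd1):

    # psakefile.ps1
    Task Default -Depends ExportFunctions

    Task ExportFunctions {
        # One .ps1 file per public function, so the file names are the function names
        $functionNames = Get-ChildItem -Path .\functions\*.ps1 |
            Select-Object -ExpandProperty BaseName

        # Keep FunctionsToExport in the manifest in sync with what's on disk
        Update-ModuleManifest -Path .\MyModule.psd1 -FunctionsToExport $functionNames
    }

Run Invoke-psake from the module root and the manifest stays current without the manual bookkeeping.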

See The Pernicious Effects Of Your UDFs

Pedro Lopes announces an improvement to SQL Server execution plan results in 2017 CU3:

As I mentioned in yesterday’s post, with the recent release of SQL Server 2017 CU3, we released yet more showplan enhancements; you can see other posts related to showplan enhancements here.

In this article I’ll talk about the second showplan improvement we worked on, to assist in the discoverability of UDF usage impact on query execution.

The scenario is that if a query uses one or more user-defined scalar functions (such as T-SQL functions or CLR functions), a significant portion of query execution time may be spent inside those functions, depending on the function definition. And this may not be immediately obvious by looking at the query execution plan.

Recently, we added information on overall query CPU and elapsed time tracking for statistics showplan XML (both in ms), found in the root node of an actual plan (which I blogged about here). We have now added two new attributes: UdfCpuTime and UdfElapsedTime. These provide the total CPU and elapsed time (again, both in ms) spent inside all scalar user-defined functions during the execution of a query.

I love it.  UDFs have historically been silent query killers, as the execution plan would gleefully think that the function call is practically free because it’d only show a single iteration.
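
If you save off an actual plan and want to grab those numbers without opening the XML by hand, a little PowerShell will do it (a sketch; I'm assuming the attributes live on the QueryTimeStats element at the root of the plan, so verify against your own .sqlplan files):

    # A .sqlplan file is just XML; pull the query time stats, UDF attributes included
    Select-Xml -Path .\query.sqlplan -XPath "//*[local-name()='QueryTimeStats']" |
        ForEach-Object { $_.Node | Select-Object CpuTime, ElapsedTime, UdfCpuTime, UdfElapsedTime }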

When UNION ALL Can Beat OR

Bert Wagner compares a couple methods for writing a query:

Suddenly those key-lookups become too expensive for SQL Server and the query optimizer thinks it’ll be faster to just scan the entire clustered index.

In general this makes sense; SQL Server tries to pick plans that are good enough in most scenarios, and in general I think it chooses wisely.

However, sometimes SQL Server doesn’t pick great plans. Sometimes the plans it picks are downright terrible.
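
From the title, the comparison is presumably the classic rewrite: turn a WHERE Col1 = @p1 OR Col2 = @p2 predicate into two SELECTs combined with UNION ALL, so each branch can seek on its own index rather than forcing a scan (with a bit of extra care if a row can match both predicates).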

If that particular topic is interesting, I’ve a blog post from a few years back in a similar vein.

Data Discovery And Classification In SQL Server

Gilad Mittelman explains how the SQL Information Protection (aka Data Discovery and Classification) process works in SQL Server and Azure SQL Database:

SQL Information Protection (SQL IP) introduces a set of advanced services and new SQL capabilities, forming a new information protection paradigm in SQL aimed at protecting the data, not just the database:

  • Discovery & recommendations – The classification engine scans your database and identifies columns containing potentially sensitive data. It then provides you an easy way to review and apply the appropriate classification recommendations via the Azure portal.

  • Labeling – Sensitivity classification labels can be persistently tagged on columns using new classification metadata attributes introduced into the SQL Engine. This metadata can then be utilized for advanced sensitivity-based auditing and protection scenarios.

  • Monitoring/Auditing – Sensitivity of the query result set is calculated in real time and used for auditing access to sensitive data (currently in Azure SQL DB only).

  • Visibility – The database classification state can be viewed in a detailed dashboard in the portal. Additionally, you can download a report (in Excel format) to be used for compliance & auditing purposes, as well as other needs.

Check it out, especially with GDPR breathing down our necks.

SSMS 17.5 Released

Alan Yu announces SQL Server Management Studio 17.5:

SSMS 17.5 provides support for almost all feature areas on SQL Server 2008 through the latest SQL Server 2017, which is now generally available.

In addition to enhancements and bug fixes, SSMS 17.5 comes with several exciting new features:

  • Data Discovery and Classification is a new feature for discovering, classifying, labeling and reporting sensitive data in your databases.
  • Query Editor now has support for a SkipsRows option for the Delimited Text External File Format for Azure SQL Data Warehouse.
  • Showplan has enabled the display of the estimated plan button for SQL Data Warehouse.

SSMS 17.5 also includes key bug fixes to Showplan, Query Editor, SMO, and Templates, which can be found in the Release Notes.

Aside from data discovery and classification, it’s probably not a major update for most people, but this does look like an interesting tool.
