Press "Enter" to skip to content

Author: Kevin Feasel

Configuring Logstash

Mike Hillwig gets us started on Logstash:

Logstash is an incredibly powerful tool. If you can put data into a text file, Logstash can parse it. It works well with a lot of data, but I’m finding myself using it more for event data. When I say event data, I mean this: if something triggers a log event and it writes to a log, it’s an event. For the purposes of my demos, I’m using data from the Bureau of Transportation Statistics. They track flight performance data, which works perfectly for my uses. It’s a great example dataset without using anything related to my real job.

Logstash configuration files typically have three sections, INPUT, FILTER, and OUTPUT. However, FILTER is optional.
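
For reference, a pipeline configuration with all three sections might look something like this minimal sketch (the file path, CSV columns, and Elasticsearch output are hypothetical stand-ins, not Mike's actual demo config):

input {
  file {
    path => "/data/flights/*.csv"        # hypothetical path to the BTS extracts
    start_position => "beginning"
  }
}

filter {                                 # optional section
  csv {
    separator => ","
    columns => ["FL_DATE", "CARRIER", "DEP_DELAY"]   # hypothetical column names
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}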

This is the first part in a series, so stay tuned.

Converting Int To Time

Bill Fellows has a pop quiz for us:

Given the following DDL

CREATE TABLE dbo.IntToTime
(
    CREATE_TIME int
);

What will be the result of issuing the following command?

ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME time NULL;

Clearly, if I’m asking, it’s not what you might expect.

Click through if you have not memorized your implicit conversion tables.
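
If you want to see the result for yourself before clicking through, here is a throwaway sketch you can run in a scratch database; it attempts the conversion from the quiz and reports whatever SQL Server does, rather than assuming an outcome:

CREATE TABLE dbo.IntToTime
(
    CREATE_TIME int
);

BEGIN TRY
    ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME time NULL;
    PRINT 'ALTER succeeded';
END TRY
BEGIN CATCH
    PRINT CONCAT('Error ', ERROR_NUMBER(), ': ', ERROR_MESSAGE());
END CATCH;

DROP TABLE dbo.IntToTime;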

The Power Of Window Functions

Ben Richardson has an introduction to the concept of window functions:

This is a much more efficient result. In the first line of the script the id, name and gender columns are retrieved. These columns do not contain any aggregated results.

Next, for the columns that contain aggregated results, we simply specify the aggregate function, followed by the OVER clause; within the parentheses we specify the PARTITION BY clause, followed by the name of the column that we want our results to be partitioned by, as shown below.
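
Ben's post walks through his own example; as a stand-in here, this is a minimal sketch of the pattern he describes, using a hypothetical employees table (the id, name, and gender columns come from the excerpt above; the table and salary column are my own invention):

-- Hypothetical table and data, just to run the pattern end to end.
CREATE TABLE dbo.employees
(
    id int IDENTITY(1, 1) PRIMARY KEY,
    name varchar(50) NOT NULL,
    gender varchar(10) NOT NULL,
    salary int NOT NULL
);

INSERT INTO dbo.employees (name, gender, salary)
VALUES ('Anna', 'Female', 52000), ('Ben', 'Male', 48000),
       ('Cara', 'Female', 61000), ('Dave', 'Male', 55000);

-- Each row keeps its detail columns and also carries the aggregates for its
-- gender partition, with no GROUP BY (and no self-join) required.
SELECT
    e.id,
    e.name,
    e.gender,
    MIN(e.salary) OVER (PARTITION BY e.gender) AS min_salary_by_gender,
    MAX(e.salary) OVER (PARTITION BY e.gender) AS max_salary_by_gender,
    AVG(e.salary) OVER (PARTITION BY e.gender) AS avg_salary_by_gender
FROM dbo.employees AS e;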

This post focuses on normal aggregates which accept windows.  Once you understand that, there’s a wide world beyond it, and you’ll quickly see how useful window functions can be.

Initial Thoughts On dbachecks

Jess Pomfret has an initial use case for dbachecks:

Each check has one unique tag which basically names the check and then a number of other tags that can also be used to call a collection of checks.

For this example we are going to use several checks to ensure that we meet the following requirements:

  • Full backup once a week – using LastFullBackup
  • Differential backup once a day – using LastDiffBackup
  • Log backup every hour – using LastLogBackup

Since each of the three checks we want to run also has the LastBackup tag, we can use that to call the collection of checks at once.

Jason Squires looks at this for enterprise reporting:

This module was developed and designed to ensure you can see whether you have the best settings/configurations set up on your SQL systems, using PowerShell. There are three prerequisites required to load the module: Pester 4.3.1, PS Framework 0.9.10.23, and (as of this post) dbatools 0.9.207. However, the dbachecks team kindly built in a notification for you should those modules not be installed or in need of an update.

What I really love about this module is how you can utilize SQL CMS and view the results at an enterprise reporting level.

Shane O’Neill has a bit more:

Straight away, dbachecks gives you the option to include or exclude checks that you feel aren’t for you. If you only want to run a subset of the checks, then you can specify that.

The real beauty that I think dbachecks provide is that you are getting a wealth of checks for things that you may never have thought of checking or known how to check while being able to add any personal tests as well.

Sounds like something for DBAs to check out.

Introducing dbachecks

Chrissy LeMaire announces that the dbatools team is onto something big:

dbachecks is a framework created by and for SQL Server pros who need to validate their environments. Basically, we all share similar checklists and mostly just the server names and RPO/RTO/etc change.

This module allows us to crowdsource our checklists using Pester tests. Such checks include:

  • Backups are being performed
  • Identity columns are not about to max out
  • Servers have access to backup paths
  • Database integrity checks are being performed and corruption does not exist
  • Disk space is not about to run out
  • All enabled jobs have succeeded
  • Network latency does not exceed a specified threshold

We currently provide over 80 checks.

Chrissy also shows you how to install dbachecks and explains the commands.

This is the biggest community-driven news since, well, dbatools…

Methods To Improve Model Accuracy

Tristan Robinson shows how to go back to the drawing board when your model’s accuracy isn’t cutting it:

One of the recurring principles that appears with machine learning is that of Ockham’s razor, which states that the best models are simple models that fit the data well; this is not an irrefutable principle of logic, but a preference for simplicity. Therefore there is a need to balance accuracy and simplicity by limiting the feature set, which tends to lead to better predictions. Simpler models are also more interpretable to humans, which helps. While the data I was working with was limited to around 35 features, there are many data science problems which have thousands of features, and so this technique is even more crucial.

There are multiple methods to perform feature selection, of which a few will be covered here. The first method is greedy backward selection, which starts with all the features, finds the feature that hurts predictive power the least when removed, and removes it. This is done iteratively until a stopping point is met (which will be discussed later). It’s known as greedy since it never looks back after removing a feature.

An alternative method is greedy forward selection, which is basically the inverse: it starts with no features and looks for the feature that, by itself, makes the best model. This then carries on in a similar vein to backward selection, but adding features. The point at which you stop with forward selection is the point of diminishing returns for your accuracy.

Read the whole thing.  This is explanation rather than demonstration, but the explanation applies to pretty much any implementation you’re using.

JupyterLab Now Available

Project Jupyter announces the general availability of JupyterLab:

JupyterLab is an interactive development environment for working with notebooks, code and data. Most importantly, JupyterLab has full support for Jupyter notebooks. Additionally, JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

JupyterLab provides a high level of integration between notebooks, documents, and activities:

  • Drag-and-drop to reorder notebook cells and copy them between notebooks.

  • Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).

  • Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.

  • Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, VegaLite, and more.

I like this, as I’m a big fan of notebooks, but sometimes you just want to write some diagnostic queries, and an IDE is way better for that. H/T Giovanni Lanzani.

Normalizing To Boyce-Codd Normal Form

I am a big fan of Boyce-Codd Normal Form:

Boyce-Codd Normal Form is a generalization of Second and Third Normal Forms.  There are a couple of requirements to be in Boyce-Codd Normal Form.  First, your table must be in First Normal Form.  This means that:

  • Every entity (row) has a consistent shape.  This is something relational databases do for you automatically:  you can’t create a table where one entity has an attribute (column) but the next entity doesn’t.
  • Every entity has a unique value.  You can uniquely identify any particular row.
  • Every attribute is atomic:  you don’t try to pack more than one value into a single attribute.
  • There are no repeating groups of attributes, like PaymentMethod1, PaymentMethod2, PaymentMethod3, etc.

The other half of BCNF is that every determinant on an entity is a key.
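
To make that last point concrete, here is a hypothetical sketch of my own (not taken from the post):  the key of the first table is (EmployeeID, OfficeID), yet OfficeID alone determines OfficeCity, so a determinant exists that is not a key and the table is not in BCNF.  Decomposing fixes it:

-- Not in BCNF: OfficeID -> OfficeCity, but OfficeID by itself is not a key.
CREATE TABLE dbo.EmployeeOffice
(
    EmployeeID int NOT NULL,
    OfficeID int NOT NULL,
    OfficeCity varchar(50) NOT NULL,
    CONSTRAINT PK_EmployeeOffice PRIMARY KEY (EmployeeID, OfficeID)
);

-- Decomposed so that every determinant is a key.
CREATE TABLE dbo.Office
(
    OfficeID int NOT NULL,
    OfficeCity varchar(50) NOT NULL,
    CONSTRAINT PK_Office PRIMARY KEY (OfficeID)
);

CREATE TABLE dbo.EmployeeOfficeAssignment
(
    EmployeeID int NOT NULL,
    OfficeID int NOT NULL,
    CONSTRAINT PK_EmployeeOfficeAssignment PRIMARY KEY (EmployeeID, OfficeID),
    CONSTRAINT FK_EmployeeOfficeAssignment_Office
        FOREIGN KEY (OfficeID) REFERENCES dbo.Office (OfficeID)
);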

Also click through for an iterative, easy-to-follow process to get to BCNF.

Using psake To Load FunctionsToExport

Cody Konior has a good post on using psake to populate the FunctionsToExport section of a module definition:

The “best practice” and proper way of handling this, then, is to add an entry to the array in this file every time you create a new function. Manually. What a pain in the ass, right? But there’s a better way!

If you haven’t seen psake before you can take a look later. Start up PowerShell as Administrator and install psake from the PowerShell Gallery using Install-Module psake and then you’re ready to go.

How does psake work? It’s the PowerShell equivalent of a Makefile. Basically:

  • You add a file in the root of your module called psakefile.ps1

  • When you run Invoke-psake from that location, it will load and execute the file

For a one-off module with one or two functions, it’s probably not worth it, but once you get to several functions (or if you’re building modules regularly), this looks like a time-saver.

See The Pernicious Effects Of Your UDFs

Pedro Lopes announces an improvement to SQL Server execution plan results in 2017 CU3:

As I mentioned in yesterday’s post, with the recent release of SQL Server 2017 CU3, we released yet more showplan enhancements: you can see other posts related to showplan enhancements here.

In this article I’ll talk about the second showplan improvement we worked on, to assist in the discoverability of UDF usage impact on query execution.

The scenario is that if a query uses one or more user-defined scalar functions (such as T-SQL functions or CLR functions), a significant portion of query execution time may be spent inside those functions, depending on the function definition. And this may be not immediately obvious by looking at the query execution plan.

Recently, we added information on overall query CPU and elapsed time tracking for statistics showplan xml (both in ms), found in the root node of an actual plan (which I blogged about here). We now added two new attributes: UdfCpuTime and UdfElapsedTime. These provide the total CPU and elapsed time (again, both in ms) that is spent inside all scalar user-defined functions, during the execution of a query.

I love it.  UDFs have historically been silent query killers, as the execution plan would gleefully think that the function call is practically free because it’d only show a single iteration.
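
If you would like to see the new attributes yourself on 2017 CU3 or later, here is a minimal sketch (the function, table, and column names are all hypothetical); run it, open the actual execution plan, and inspect the root node of the plan XML:

-- Hypothetical scalar UDF and table, just to surface the new attributes.
CREATE TABLE dbo.Orders
(
    OrderID int IDENTITY(1, 1) PRIMARY KEY,
    OrderTotal decimal(10, 2) NOT NULL
);

INSERT INTO dbo.Orders (OrderTotal) VALUES (100.00), (250.50), (75.25);
GO

CREATE OR ALTER FUNCTION dbo.CalculateTax (@Amount decimal(10, 2))
RETURNS decimal(10, 2)
AS
BEGIN
    RETURN @Amount * 0.08;
END;
GO

-- Request the actual plan; per the post, its root node should now include
-- UdfCpuTime and UdfElapsedTime alongside the overall CPU and elapsed times.
SET STATISTICS XML ON;
GO

SELECT
    o.OrderID,
    dbo.CalculateTax(o.OrderTotal) AS TaxDue   -- scalar UDF call
FROM dbo.Orders AS o;
GO

SET STATISTICS XML OFF;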
