Web Analytics With R

Kevin Feasel

2018-02-22

R

Maelle Salmon performs some analysis on the Locke Data blog:

Often, the URL of a blog post can be guessed based on its title, e.g. this one can be read here. But even if the transition from the Markdown file information to an URL is logical, it was best to get URLs from the in situ blog posts, and then join them to the blog post information collected previously, since some special characters got special treatment that I could not fully understand by looking at blogdown source code.

I first extracted all posts URLs from the website map.

Check it out.

Using stringr To Remove HTML

Kevin Feasel

2018-02-22

R

I have a quick post on removing HTML markup with stringr:

This is a quick post today on removing HTML tags using the stringr package in R.

My purpose here is in taking some raw data, which can include HTML markup, and preparing it for a vectorizer.  I don’t need the resulting output to look pretty; I just want to get rid of the HTML characters.

Click through for the script.  If you need to do something nice with the text afterward, my technique is probably too much sledgehammer for niceties, but it does the trick for pre-processing before vectorization.

The DATETIME Type In SQL Server

Randolph West gets into the DATETIME data type:

DATETIME is an eight-byte datatype which stores both a date and time in one column, with an accuracy of three milliseconds. As we’ll see though, the distribution of this granularity may not be exactly what we’d expect.

Valid DATETIME values are January 1, 1753 00:00:00.000, through December 31, 9999 23:59:59.997. On older databases designed prior to SQL Server 2008, because there was no explicit support for date values, it was sometimes customary to leave off the time portion of a DATETIME value, and have it default to midnight on that morning. So for example today would be stored as February 21, 2018 00:00:00.000.

If you’re not particularly familiar with SQL Server data types, this is detailed enough information to get you going and to explain exactly why you shouldn’t use DATETIME anymore…

Configuring Logstash

Mike Hillwig gets us started on Logstash:

Logstash is an incredibly powerful tool. If you can put data into a text file, Logstash can parse it. It works well with a lot of data, but I’m finding myself using it more to use it for event data. When I say event data, if it triggers a log event and it writes to a log, it’s an event. For the purposes of my demos, I’m using data from the Bureau of Transportation Statistics. They track flight performance data, which works perfectly for my uses. It’s a great example dataset without using anything related to my real job.

Logstash configuration files typically have three sections, INPUT, FILTER, and OUTPUT. However, FILTER is optional.

This is the first part in a series, so stay tuned.

Converting Int To Time

Bill Fellows has a pop quiz for us:

Given the following DDL

CREATE TABLE dbo.IntToTime
( CREATE_TIME int
);

What will be the result of issuing the following command?

ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME time NULL;

Clearly, if I’m asking, it’s not what you might expect.

Click through if you have not memorized your implicit conversion tables.

The Power Of Window Functions

Kevin Feasel

2018-02-22

Syntax

Ben Richardson has an introduction to the concept of window functions:

This is a much more efficient result. In the first line of the script the id, name and gender columns are retrieved. These columns do not contain any aggregated results.

Next, for the columns that contain aggregated results, we simply specify the aggregated function, followed by the OVER clause and then within the parenthesis we specify the PARTITION BY clause followed by the name of the column that we want our results to be partitioned as shown below.

This post focuses on normal aggregates which accept windows.  Once you understand that, there’s a wide world beyond it, and you’ll quickly see how useful window functions can be.

Initial Thoughts On dbachecks

Jess Pomfret has an initial use case for dbachecks:

Each check has one unique tag which basically names the check and then a number of other tags that can also be used to call a collection of checks.

For this example we are going to use several checks to ensure that we meet the following requirements:

  • Full backup once a week – using LastFullBackup
  • Differential backup once a day – using LastDiffBackup
  • Log backup every hour – using LastLogBackup

Since each of the three checks we want to run also have the LastBackup tag we can use that to call the collection of checks at once.

Jason Squires looks at this for enterprise reporting:

This module was developed and designed to ensure you can see if you have the best settings/configurations set up on your SQL systems using powershell. There are three pre-requisites that are required to load the module. Those are Pester 4.3.1, PS Framework 0.9.10.23, and currently as of this post dbatools 0.9.207. However, the team of dbachecks, kindly built in a notification for you if those modules and versions should those not be installed or would have a need to update.

What I really love about this module, is how you can utilize SQL CMS, and view the results at an enterprise reporting level.

Shane O’Neill has a bit more:

Straight away, dbachecks gives you the option to include or exclude checks that you feel aren’t for you. If you only want to run a subset of the checks, then you can specify that.

The real beauty that I think dbachecks provide is that you are getting a wealth of checks for things that you may never have thought of checking or known how to check while being able to add any personal tests as well.

Sounds like something for DBAs to check out.

Introducing dbachecks

Chrissy LeMaire announces that the dbatools team is onto something big:

dbachecks is a framework created by and for SQL Server pros who need to validate their environments. Basically, we all share similar checklists and mostly just the server names and RPO/RTO/etc change.

This module allows us to crowdsource our checklists using Pester tests. Such checks include:

  • Backups are being performed
  • Identity columns are not about to max out
  • Servers have access to backup paths
  • Database integrity checks are being performed and corruption does not exist
  • Disk space is not about to run out
  • All enabled jobs have succeeded
  • Network latency does not exceed a specified threshold

We currently provide over 80 checks

Chrissy also shows you how to install dbachecks and explains the commands.

This is the biggest community-driven news since, well, dbatools…

Categories

February 2018
MTWTFSS
« Jan Mar »
 1234
567891011
12131415161718
19202122232425
262728