
Author: Kevin Feasel

Regular Expression Cheat Sheets

Mara Averick shows off a collection of regular expression guides:

There are helpful string-related R packages 📦, stringr (which is built on top of the more comprehensive stringi package) comes to mind. But, at some point in your computing life, you’re gonna need to get down with regular expressions.

And so, here’s a collection of some of the Regex-related links I’ve tweeted 🐦:

Click through for links to regular expression resources.

Comments closed

Visualizing A Single Number

Tim Bock shows a dozen methods for visualizing a single number:

There are a number of situations in which it can be advantageous to create a visualization to represent a single number:

  • To communicate with less numerate viewers/readers;

  • Infographics and dashboards commonly use one important number;

  • To attract the attention of distracted or busy viewers/readers;

  • To add some humanity or “color”, to create an emotional connection;

  • Or to increase the redundancy of the presentation (see Improve the Quality of Data Visualizations Using Redundancy).

To a great extent, my favorite is the first.  There are good cases for many of the others—primarily the shock value of the uncountable pictogram—but typically, the best visualization is simple.


Date Correlation Optimization

Monica Rathbun explains another quasi-hidden SQL Server configuration option:

According to MSDN – The DATE_CORRELATION_OPTIMIZATION database SET option improves the performance of queries that perform an equi-join between two tables whose date or datetime columns are correlated, and which specify a date restriction in the query predicate.

How many of you read what MSDN says and think “wuuuuuttt, English please”? I do.

Read on for the English translation.


Dealing With Noisy Neighbors

Kevin Kline explains what Resource Governor does:

There are lots of ways to manage noisy neighbors. For example, you could spin up additional instances of SQL Server on a single physical or virtual machine (VM), and then segregate the applications to a distinct instance. You could also follow the old adage of “one application, one SQL Server” by putting the SQL Server onto its own machine, either physical or virtual. But that can get very expensive very quickly, depending on your licensing methodology.

If you’re running SQL Server 2008 or later, you might want to investigate Resource Governor as an alternative. Resource Governor lets you create limits on the amount of system resources a database and application can consume. On versions 2008 to 2012, Resource Governor can explicitly limit CPU and memory and, starting with version 2014, limit I/O consumption as well. This is powerful medicine for multi-tenant instances with noisy neighbors!

My passive-aggressive response to noisy neighbors is to turn my music up really loud.  Oh, wait, wrong kind of noisy neighbor…  H/T SentryOne


Creating Database Snapshots

David Fowler has a script which lets you create database snapshots easily:

The procedure takes two parameters:

@DatabaseList – a comma-delimited string of database names; wildcards are allowed
@ListOnly – 1: a list of affected databases is displayed but no snapshots are created
            0 (default): snapshots are created automatically
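
To make the @DatabaseList idea concrete, here is a minimal Python sketch of how a comma-delimited list with wildcards could be resolved against a set of database names. This is not David's procedure. The excerpt does not say which wildcard syntax is supported, so the sketch assumes SQL LIKE-style wildcards (% and _), and the function names and sample database names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate an assumed LIKE-style pattern (% and _) into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def select_databases(database_list, available):
    """Return the available names matching any entry in the comma-delimited list."""
    patterns = [
        like_to_regex(entry.strip())
        for entry in database_list.split(",")
        if entry.strip()
    ]
    return [name for name in available if any(p.match(name) for p in patterns)]


# Hypothetical database names, for illustration only.
available = ["Sales", "SalesArchive", "HR", "Staging"]
selected = select_databases("Sales%, HR", available)
print(selected)  # ['Sales', 'SalesArchive', 'HR']
```

Under the LIKE assumption, an underscore in a name acts as a single-character wildcard, so a pattern such as Sales_Archive would also match SalesXArchive.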

I’m a big fan of database snapshots in development and QA environments—take a snapshot, run a workload, revert the snapshot.


Azure Data Lake Analytics Pipelines

Yan Li notes that Azure Data Lake Analytics now offers the ability to manage pipelines:

To make it easier to manage and understand jobs, ADLA now captures the pipeline and recurrence information for each job. This information can be used to connect and organize jobs belonging to the same pipeline or recurring instances. As shown in Fig 2, now jobs are organized by pipeline and recurring instances which enable you to:

  • Quickly identify jobs in pipelines which may have failed or taken longer than expected.

  • Get the aggregated statistics (e.g. job counts, successful and failed AU hours etc.) for a pipeline or a recurring instance

This is an interesting improvement.


Integrating Azure Data Catalog With Power BI

Gaston Cruz shows how to view Azure Data Catalog data in Power BI:

A self-service culture allows analysts to generate their own reports, lists, and dashboards without depending on the schedule and availability of IT staff. These reports often combine different sources of information, many of which may not have been used historically in the company, and as a result, in many cases nobody knows which sources were used to build certain reports.

Azure Data Catalog comes as an option to break that cycle of discovery, which is usually done manually. After the first cycle, in which the business analyst discovers the best data sources for certain reports, they can register those sources and add information (metadata) so that future analysts who need the same data for similar reports can find them more easily. Discovering sources and adding metadata do not have to happen at the same time; Data Catalog lets analysts annotate sources as ongoing work, so more information is added to the repository over time.

Click through for a demo.


Automatically Fix Those VLFs

Tracy Boggiano has a script which will fix log files with high virtual log file counts:

The first part of the process is to capture the info from DBCC LOGINFO or, if you are ready for 2017, the new DMV sys.dm_db_log_stats into a table you can read later to know how many VLFs currently exist in your database. So we are going to create a table called VLFInfo and use the procedure VLF_UpdateInfo to populate that data.  The procedure would be called in step one of a SQL Agent job to automate the fixing of VLFs during appropriate maintenance windows on your server.  But as you will see in Step 2, my solution tries to account for not doing this too close to when the file just grew and acquired those lovely new extra VLFs.

Read on for the code.


Explaining Confidence Intervals

Mala Mahadevan explains what confidence intervals are:

Suppose I look at a sampling of 100 Americans who are asked if they approve of the job the Supreme Court is doing. Let us say for simplicity’s sake that the only two answers possible are yes or no. Out of 100, say 40% say yes. As an ordinary person, you would think 40% of people just approve. But a deeper answer would be – the true proportion of Americans who approve of the job the Supreme Court is doing is between x% and y%.

How confident am I that it is?  About z%. (The level commonly used is 95%.)  That is an answer that is more reflective of the uncertainty related to questioning people and taking the answers to be what is truly reflective of an opinion. The x and y values make up what is called a ‘confidence interval’.
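
To put numbers on x and y for the 40-out-of-100 example, here is a minimal Python sketch. The excerpt does not say which method Mala uses, so this assumes the simple normal-approximation (Wald) interval for a proportion, with z = 1.96 as the usual multiplier for a 95% level. The function name is hypothetical.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    Assumption: z=1.96 corresponds to a 95% confidence level.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# 40 "yes" answers out of 100 respondents, as in the example above.
low, high = proportion_confidence_interval(40, 100)
print(f"95% CI: {low:.1%} to {high:.1%}")  # 95% CI: 30.4% to 49.6%
```

So with this approach, a sample of 100 pins the true approval rate down only to roughly 30% to 50%. Quadrupling the sample size would halve the width of the interval.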

Read the whole thing.
