Spring Cleaning

Tara Kizer gives some advice on proactive administrative measures:

If a database has a high number of Virtual Log Files (VLFs), it can impact the the speed of transaction log backups and database recovery. I once had a database with 75,000 VLFs. I didn’t even know what a VLF was at the time (hence having so many). After rebooting the server, a mission critical database with extremely high SLAs took 45 minutes to complete recovery. We were in the process of opening a support case with Microsoft when the database finally came online. The next day, I contacted a Microsoft engineer and learned about VLFs.

There are a number of helpful tidbits here, so check it out.

Crime Analysis

Kevin Feasel

2016-05-26

R, Spark

Raghavan Madabusi combines Zeppelin, R, and Spark to perform crime analysis:

Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. Zeppelin interpreter concept allows any language/data-processing-backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with existing Spark eco-system and share SparkContext across Scala, Python, and R.

This links to a rather long post on how to set up and use all of these pieces.  I’m more familiar with Jupyter than Zeppelin, but regardless of the notebook you choose, this is a good exercise to become familiar with the process.

Predictive Maintenance

David Smith shows off a predictive maintenance gallery for dealing with aircraft engines:

In each case, a number of different models are trained in R (decision forests, boosted decision trees, multinomial models, neural networks and poisson regression) and compared for performance; the best model is automatically selected for predictions.

On a related note, Microsoft recently teamed up with aircraft engine manufacturer Rolls-Royceto help airlines get the most out of their engines. Rolls-Royce is turning to Microsoft’s Azure cloud-based services — Stream Analytics, Machine Learning and Power BI — to make recommendations to airline executives on the most efficient way to use their engines in flight and on the ground. This short video gives an overview.

Check out the data set and play around a bit.

Blocking Merge Statement

Kendra Little walks through the MERGE command and potential blocking issues with it:

The holdlock hint is a way to get serializable isolation level in SQL Server for a specific table, without having to change the isolation level for your entire session. Serializable is the highest isolation level in SQL Server using pessimistic locking.

When you “HOLDLOCK”, you tell SQL Server to protect any rows you read with a range lock– just in case someone comes along and tries to change one or sneak  one in.

That means that even when you’re just reading ParentTable and not inserting a row, you’re taking out a key range lock. You’re willing to fight other users over those rows to protect your statement.

Kendra’s final advice is to avoid the MERGE command in most cases; read on to find out why.

ALTER DATABASE

Dave Mason notes that ALTER DATABASE needs to be in an autocommitted transaction and will not roll back:

If you create that trigger and subsequently change the recovery model, you’ll get an error. But the recovery model doesn’t get changed back to its original setting. As the TSql comment notes, ROLLBACK doesn’t work in this context. And that’s a shame too. Sometimes the recovery model gets changed on accident, by a third-party vendor’s software installation, or by someone that isn’t familiar with the ramifications. I was trying to prevent that. <Grumble> At least I knew it wasn’t allowed. I just didn’t know why. After seeing this Connect item, “why” didn’t matter.

Dave goes on to point out that under certain circumstances, ALTER DATABASE will simply fail, so there are preventative checks; we just can’t build our own.

Getting Started With R Services

Kevin Feasel

2016-05-26

R

Jon Morisi has a compendium of resources in getting started with R:

The SQL Server 2016 release date is June 1, 2016.  Among the new features everyone is talking about is “R Services”.

Below is a quick reference of links related to “R Services”

Check it out.

Getting Current File Name In SSIS

Bill Fellows shows how to get the currently processing file name in SSIS:

So, as much as I like the built in solution, my pattern is to use a Derived Column to inject the file name into the Data Flow. I have a variable called CurrentFileName in all my packages. That contains the design-time path for my Flat File Connection Manager (or Excel). My Connection Manager will then have the ConnectionString/ExcelFilePath property assigned to be @[User::CurrentFileName]. This positions me for success because all I need to do is ensure that whatever mechanism I am using to determine my source file correctly populates that variable. In this post, a ForEach File Enumerator will handle that.

Within my Data Flow Task, I will add a Derived Column Transformation that adds my package variable into the data flow as a new column. Here, I am specifying it will be of data type DT_STR with a length of 130.

He follows up with some Biml to drive home the point.

Pester For Configuration Checks

Rob Sewell talks about SQL Server configuration testing using Pester, a Powershell test framework:

Occasionally, for reasons I haven’t explored I had to test against the value property of the returned object

 It "The Full User Database Backup should be scheduled Weekly $OlaUserFullSchedule" {
$Return.OlaUserFullSchedule.value | Should Be $OlaUserFullSchedule
}

I wanted to be able to run the tests against environments or groups of servers with different default values so I parameterised the Test Results as well and then the logical step was to turn it into a function and then I could do some parameter splatting. This also gives me the opportunity to show all of the things that I am currently giving parameters to the test for

This is a nice walkthrough of Rob’s contribution, available on GitHub.

NUMA Node Count

Denny Cherry shows us how to find the number of NUMA nodes a SQL Server has:

The next way to see what your NUMA configuration is, is to look at the ERRORLOG file. Every time SQL Server is started it will log the NUMA configuration towards the beginning of the ERRORLOG file. That line will look something like this example.

Server Node configuration: node 0: CPU mask: 0x00000000000000ff:0 Active CPU mask: 0x00000000000000ff:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

Denny shows four different ways of getting this answer, and then provides some general advice on the maximum number of nodes you probably want.

Designing For Usability In Power BI

Avi Singh walks through a usability scenario in Power Pivot and Power BI:

Here is how we can go about combining the Products and the newly created SubCategory table for a more usable and elegant solution. I have highlighted the key steps, you can examine the solution file to go step by step in the Power BI Query.
Note: Don’t be scared of the M code, 99% of these steps were generated using the Query ribbon – I have included those screenshots as well.

When developing BI models, it’s important to keep things as simple as possible.  Your desired end user likely does not have the necessary skill level to wade through normalized table designs, so make it easy for them to get their jobs done.

Categories

May 2016
MTWTFSS
« Apr Jun »
 1
2345678
9101112131415
16171819202122
23242526272829
3031