Analyzing Clickstream Data With Markov Chains

Eleni Markou shows one method of analyzing clickstream data:

We chose to use the third-order Markov Chain on the above-produced data, as:

  • The number of parameters needed for the chain’s representation remains manageable. As the order increases, the parameters necessary for the representation increase exponentially and thus managing them requires significant computational power.
  • As a rule of thumb, we would like at least half of the clickstreams to consist of as many clicks as the order of the Markov Chain that should be fitted. There is no point in selecting a third-order chain if the majority of the clickstream consists of two states and so there is no state three steps behind to take into consideration.

Fitting the Markov Chain model gives us transition probabilities matrices and the lambda parameters of the chain for each one of the three lags, along with the start and end probabilities.

This particular analysis is trying to understand which page (if any) a user will go to next when on a particular page.  Eleni uses additional techniques like k-means clustering to segment out particular groups of users.  Very interesting analysis.

Basics Of Survival Analysis

Subhasree Chatterjee explains the basics of survival analysis:

Survival analysis is a set of methods to analyze the ‘time to occurrence’ of an event. The response is often referred to as a failure time, survival time, or event time. These methods are widely used in clinical experiments to analyze the ‘time to death’, but nowadays these methods are being used to predict the ‘when’ and ‘why’ of customer churn or employee turnover as well.

The dependent variables for the analysis are generally two functions:

Read the whole thing.  H/T R-Bloggers

Log Shipping With dbatools

Sander Stad shows off a few log shipping functions he created for dbatools:

The entire log shipping functionality is now separated between 5 functions. Four of them are used internally and are not visible as a public function because you can easily break stuff if it’s not being used correctly.

The main function, Invoke-DbaLogShipping, is available in the dbatools module for anyone to use.

If you open the GUI for the log shipping you have lots of choices but most of them are already supplied in the GUI itself and you can decide whether to use them or not.
The whole idea behind the functionality was that it would allow you to quickly setup log shipping using a lot of defaults like you can in the GUI, but if you’re more experienced you can change any setting to your preferences.

Read on for an example of how to use this.  It looks pretty snazzy.

Loading From Excel Into SQL Server With Powershell

Shane O’Neill tackles one of the all-time important questions, how to get data from Excel into something else:

Now let’s say that we are working in an environment that does not allow us to change the server configurations, meaning that OPENROWSET() is closed for us.

Being completely honest, my spreadsheet is only 8 rows so I would just manually insert the data but what happens if it’s a few thousand rows big? Not a viable option.

So with most of my T-SQL ideas have been exhausted, let’s look at PowerShell!

The other all-time important question, of course, is how to get data from something else into Excel.

Automatically Restoring Databases To Another Server

Bjorn Peters needs to migrate a few databases over to a test server nightly:

My first thoughts about that were creating a SQL Server Agent Job with following steps:

  1. check the availability of Shared-Destination-Folder
  2. delete/clear Destination-Folder-Content
  3. Shrink all Transaction-Logfiles
  4. Backup all Databases from given list
  5. Restore each Backup-File from folder
  6. Check all orphaned user
  7. delete/clear Destination-Folder-Content

A year or two ago, I had built this with a lot of normal T-SQL-Agent-Jobs, now I’m doing this with dbatools which make it very easy and fast (and in one step)

It’s only a few lines of Powershell code, which you can see upon clicking through.

Homoglyphs And SQL Injection

Bert Wagner gives us one more reason why blacklists are bad:

Homoglpyhs can exist within a character set (like the Latin character set examples above) or they can exist between character sets. For example, you may have the unicode apostrophe ʼ, which is a homoglyph to the Latin single quote character '.

How does SQL Server handle unicode homoglyphs?

Funny you should ask. If you pass in a unicode character to a non-unicode datatype (like char), SQL implicitly converts the unicode character to its closest resembling non-unicode homoglyph.

Bert’s examples show failure when converting Unicode data to VARCHAR, so this attack vector may not work if the input parameter is NVARCHAR, but even if that’s the case, it’s still one of many reasons why blacklists are awful for preventing against injection attacks.

Comparing Server Configurations With Powershell

Andy Levy shows how to use a dbatools cmdlet, Get-DbaSpConfigure:

I started with Get-DbaSpConfigure to retrieve the settings available from sp_configureas these were the most important to my comparison. I ran this against production as well as each of my test instances and saved the results of each to a variable. Because accessing my production instance requires either jumping through hoops or using SQL Authentication, I passed -SqlCredential (get-credential -Message "Prod" -UserName MySQLLogin) so I’d be prompted for that password instead of using Windows Authentication.

It’s good to have an automated process in place to script comparisons, either against a known good build or across servers which ought to be the same.  Things that ought to be the same often aren’t.

Using Query Governor

Monica Rathbun explains what the query governor is in SQL Server:

It’s simple. This option, available in SQL Server 2008 standard and forward, will prevent long running queries based on run time measured in seconds. If I specify a value of 180 the query governor will not allow any execution of a query that it estimates will exceed that value. Notice it says ESTIMATES which means it will be based on optimizer estimates and not ACTUAL run times. It does NOT KILL an actively running query after designated amount of time.  There is no worries for rollback scenarios or partial data.

This is totally different from Resource Governor, which can throttle resources like CPU and I/O.  I’ve never turned on query governor and I’m not sure I ever would.

dbatools Now With 287 Commands

Chrissy LeMaire reports that dbatools has 287 cmdlets:

Really, I said all that so that you’d hopefully excuse the recent lack of communication 😉 Not only did I have the big ol’ audit, but we are also preparing for two dbatools pre-cons. One with Klaas Vandenberghe and Rob Sewell in Ghent, Belgium at dataminds and then the one in Seattle at PASS Summit.

This means that communication will slow a bit until December, but I wanted to give you an update. First, we’re now at a whopping 287 magical commands! We’ve got three kind folks working on the webpages and once they are all ready, I’ll do a new release. Until then, here’s over 80 newish commands that you can explore using Get-Help -Detailed.

dbatools has been a tremendous community success.


September 2017
« Aug Oct »