Press "Enter" to skip to content

Day: November 26, 2019

The Joy of Decision Trees

Tom Jordan explains how a simple set of “if” statements forms the basis of some powerful data science algorithms:

While it is true there are techniques in machine learning that required advanced maths knowledge, some of the most widely used approaches make use of knowledge given to every child at secondary school. The line of best fit, drawn by many a student in Year 8 Chemistry, can also be known by its alter-ego, linear regression, and see applications all over machine learning. Neural networks, central to some of the most cutting-edge applications, are formed of simple mathematical models consisting of some addition and multiplication.

A personal favourite technique, and the subject of this blog, is the humble decision tree, taught in schools all over the country. This blog will take a high-level look at the theory around decision trees, an extension using random forests, and the real-world applications of these techniques.

Read on for more.

Comments closed

Bot Framework 101 Notes

Annie Xu has some notes from an introductory course on the Microsoft Bot framework:

Not long ago, I got a chance to learn a Bot 101 lesson from my teammate Wayne Smith. It was a great class because it helped me who is an new learner to understand a lot of key concepts of Microsoft bot. Because it is in an internal meeting and there is no public video released, I wrote some notes below to share with you.

Click through for Annie’s notes and a bunch of links to additional resources.

Comments closed

Fun with Date Math and Performance

Erik Darling has started a new series on how hard it can be to get a date:

I’ll often see people need to “flatten” dates to certain intervals.

By flatten, I mean the start of a day, week, month, or year, and likewise for the end values to all those intervals.

I’ve seen some really bad implementations of this. Most notable was probably a scalar valued function that converted a datetime to a 10 character string to remove the time portion.

In every where clause.

Click through for additional introductory notes and some links to good resources.

Comments closed

Multi-Server Diagnostics with dbatools

Gianluca Sartori shows how we can collect diagnostic information from multiple SQL Server instances by way of dbatools:

What I really love about PowerShell is how simple it is to filter, extend and manipulate tabular data using the pipeline, in a way that resonates a lot with the experience of T-SQL developers.

The main part of the script is the one that invokes all the diagnostic queries included in the list $queries. This is done by invoking the cmdlet Invoke-DbaDiagnosticQuery, that takes care of using a version of the diagnostic query that matches the version of the target server and selecting the data. As usual with dbatools, the -SqlInstance parameter accepts a list of servers, so you can pass in the list of all the SQL Servers in your infrastructure.

Gianluca has a GitHub repo for the script and explains it in more detail as well.

Comments closed

SQL Server and Bytes Per Character

Solomon Rutzky explains that just because you’ve got a VARCHAR column, it’s not necessarily one byte per character:

For VARCHAR, some of you might be thinking that it was “1” until recently when SQL Server 2019 introduced the “_UTF8” collations. Nope. The last time “1” was correct for VARCHAR was back in SQL Server 7.0, before SQL Server 2000 introduced the Windows collations which offered some Double-Byte Character Sets.

For NVARCHAR, some of you might be thinking that it was “2” until SQL Server 2012 introduced the “_SC” collations that fully support Supplementary Characters (UTF-16). Sorry, still incorrect. “2” was never technically correct for NVARCHAR, it was only temporarily correct for the first few years (until Supplementary Characters were defined in Unicode 3.1, released in March, 2001). Ever since SQL Server 7.0 introduced the NCHARNVARCHAR, and NTEXT datatypes, it has been possible to store whatever UTF-16 byte sequences you want, even if they are currently undefined. The older collations do not recognize surrogate pairs / Supplementary Characters, but that’s not related to SQL Server’s ability to store and retrieve any 16-bit code point. As long as you are using a font that supports Supplementary Characters, they should display correctly.

Solomon is one of a handful of people I’ve met who has collations and characters down cold.

Comments closed

High Availability Changes in SQL Server 2016

Martin Surasky looks at what SQL Server 2016 changed with respect to High Availability options:

AlwaysOn Availability Groups, first introduced in SQL Server 2012 is a feature that is conceptually similar to database mirroring. I’m going to assume you already know what AlwaysOn Availability Groups are in general, their main purpose and how they are different (essential aspects at least) from other technologies to provide replication such as Database Mirroring.

In SQL Server 2014, the significant enhancement to availability groups was the increase in the number of supported secondary replicas from three to eight. SQL Server 2016 includes a number of new enhancements

The biggest thing about it, as I recall, was stability: I wouldn’t have recommended too many places go into production with Availability Groups in 2012, but by 2016, many of the biggest bugs were ironed out.

Comments closed