# Day: November 26, 2019

While it is true there are techniques in machine learning that required advanced maths knowledge, some of the most widely used approaches make use of knowledge given to every child at secondary school. The line of best fit, drawn by many a student in Year 8 Chemistry, can also be known by its alter-ego, linear regression, and see applications all over machine learning. Neural networks, central to some of the most cutting-edge applications, are formed of simple mathematical models consisting of some addition and multiplication.

A personal favourite technique, and the subject of this blog, is the humble decision tree, taught in schools all over the country. This blog will take a high-level look at the theory around decision trees, an extension using random forests, and the real-world applications of these techniques.

Not long ago, I got a chance to learn a Bot 101 lesson from my teammate Wayne Smith. It was a great class because it helped me who is an new learner to understand a lot of key concepts of Microsoft bot. Because it is in an internal meeting and there is no public video released, I wrote some notes below to share with you.

Click through for Annie’s notes and a bunch of links to additional resources.

I’ll often see people need to “flatten” dates to certain intervals.

By flatten, I mean the start of a day, week, month, or year, and likewise for the end values to all those intervals.

I’ve seen some really bad implementations of this. Most notable was probably a scalar valued function that converted a datetime to a 10 character string to remove the time portion.

In every where clause.

Click through for additional introductory notes and some links to good resources.

We are going to follow the steps to manually create a dimension Date table with basic DAX. First thing, we need to create a new table.

Click through for the demo. There’s a lot more you can include in a date dimension, but this gives you the starting point.

What I really love about PowerShell is how simple it is to filter, extend and manipulate tabular data using the pipeline, in a way that resonates a lot with the experience of T-SQL developers.

The main part of the script is the one that invokes all the diagnostic queries included in the list `\$queries`. This is done by invoking the cmdlet `Invoke-DbaDiagnosticQuery`, that takes care of using a version of the diagnostic query that matches the version of the target server and selecting the data. As usual with dbatools, the `-SqlInstance` parameter accepts a list of servers, so you can pass in the list of all the SQL Servers in your infrastructure.

Gianluca has a GitHub repo for the script and explains it in more detail as well.

For `VARCHAR`, some of you might be thinking that it was “1” until recently when SQL Server 2019 introduced the “`_UTF8`” collations. Nope. The last time “1” was correct for `VARCHAR` was back in SQL Server 7.0, before SQL Server 2000 introduced the Windows collations which offered some Double-Byte Character Sets.

For `NVARCHAR`, some of you might be thinking that it was “2” until SQL Server 2012 introduced the “`_SC`” collations that fully support Supplementary Characters (UTF-16). Sorry, still incorrect. “2” was never technically correct for `NVARCHAR`, it was only temporarily correct for the first few years (until Supplementary Characters were defined in Unicode 3.1, released in March, 2001). Ever since SQL Server 7.0 introduced the `NCHAR``NVARCHAR`, and `NTEXT` datatypes, it has been possible to store whatever UTF-16 byte sequences you want, even if they are currently undefined. The older collations do not recognize surrogate pairs / Supplementary Characters, but that’s not related to SQL Server’s ability to store and retrieve any 16-bit code point. As long as you are using a font that supports Supplementary Characters, they should display correctly.

Solomon is one of a handful of people I’ve met who has collations and characters down cold.