Curated SQL Posts

Using Python In SQL Server 2017

Emma Stewart has a post covering setup and configuration of SQL Server 2017 Machine Learning Services and using Python within SQL Server:

One of the new features of SQL Server 2017 was the ability to execute Python Scripts within SQL Server. For anyone who hasn’t heard of Python, it is the language of choice for data analysis. It has a lot of libraries for data analysis and predictive modelling, offers power and flexibility for various machine learning tasks and is also a much simpler language to learn than others.

The release of SQL Server 2016 saw the integration of the database engine with R Services, bringing the R data science language into SQL Server. By extending this support to Python, Microsoft have renamed R Services to ‘Machine Learning Services’ to include both R and Python.

The benefits of being able to run Python from SQL Server are that you can keep analytics close to the data (if your data is held within a SQL Server database) and reduce any unnecessary data movement. In a production environment you can simply execute your Python solution via a T-SQL Stored Procedure and you can also deploy the solution using the familiar development tool, Visual Studio.

ML Services is a great addition to SQL Server.

Comments closed

Query Store UserVoice Requests

Erin Stellato has a compendium of Query Store UserVoice requests:

In early January Microsoft announced that Connect, the method for filing SQL Server bugs and feature requests, was being retired.  It was replaced by User Voice, and any bugs/requests were ported over.  Sadly, the votes from Connect did not come across to User Voice, so I went through and found all the Query Store requests, which are listed below.  If you could please take the time to up-vote them, that would be fantastic.  If you could also take time to write about why this would help your business, help you upgrade, or purchase more SQL Server licenses, that is even better.  It helps the product team immensely to understand how this feature/fix/functionality helps you and your company, so taking 5 minutes to write about that is important.

Check them out and upvote any which look interesting.

Accessibility And Power BI Reports

Meagan Longoria has some tips to make your Power BI reports easier for people to read:

Avoid using color as the only means of conveying information. Add text cues where possible. It’s very common to show KPIs with a background color or a box next to a metric that uses red/yellow/green to indicate status. Users who have difficulties seeing color need another way to understand the status of a key metric. This could mean that you use a text icon in addition to or instead of color to indicate a status. Power BI reports often include conditional formatting to change the background color or font color of items in a table to convey high/low or acceptable/unacceptable values. If that is important for your users to understand, you could add a field containing the values “high” and “low” to the table itself or to the tooltips. Tooltips are accessible to screen readers via the accessible Show Data table (Alt + Shift + F11).

These are good design principles in addition to providing accessibility benefits.

Trial And Error With Read-Only Replica Queries

Cody Konior stress tests Availability Group round-robin routing:

I’ve been hearing about round-robin read-only routing ever since SQL 2016 came out but whenever I tried to test if it’s working it never seemed to be. But now I know exactly how it works and there’s a few loopholes where it may not trigger, and they’re not the documented ones you’re thinking of.

To test the limits of it you’re going to need:

  • PowerShell 5.1
  • Pester 4 (Install-Module Pester -Force)
  • DbData (Install-Module DbData -Force)

I’ll explain any of the Pester and DbData bits along the way so don’t worry. They’re minor framework stuff.

There’s some good stuff here around connection pooling, so check it out.

Managing Multiple Power BI Accounts With Chrome

Ike Ellis has a quick tip for managing multiple Power BI accounts across different clients:

As a consultant, I find it difficult to switch between accounts on PowerBI.com.

I have to log out of an existing account and log back in to a new account. The login process takes a long time. I have found a workaround. I use Google Chrome to manage different Chrome accounts, different themes, different cookies, and this allows me to stay logged in to multiple Power BI accounts at the same time.

Great tip.

The Date Data Type

Randolph West continues his dates and times series:

SQL Server 2008 introduced new data types to handle dates and times in a more intelligent way than the previous DATETIME and SMALLDATETIME types that we looked at previously.

The first one we look at this week is DATE. Whereas DATETIME uses eight bytes and SMALLDATETIME uses four bytes to store their values, DATE only needs a slender three bytes to store any date value between 0001-01-01 and 9999-12-31 inclusive.
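
As a quick sanity check on that three-byte figure, here is a small Python sketch that counts the distinct days in the range and works out how many bytes a plain day counter would need. It only shows that three bytes are enough to number every day; it assumes nothing about how SQL Server actually lays out a DATE value on disk.

```python
from datetime import date

# The inclusive range quoted for the DATE type.
first = date(1, 1, 1)
last = date(9999, 12, 31)

# Number of distinct calendar days in that range.
distinct_days = (last - first).days + 1

# Bits needed to number the days 0 .. distinct_days - 1, rounded up to bytes.
bits_needed = (distinct_days - 1).bit_length()
bytes_needed = (bits_needed + 7) // 8

print(distinct_days, bits_needed, bytes_needed)
```

Three bytes can hold 16,777,216 distinct values, comfortably more than the roughly 3.65 million days in the range, while two bytes (65,536 values) would fall far short.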

The DATE data type was a fantastic addition to SQL Server 2008.

Rolling Calculations In R

Steph Locke explains some tricky behavior with window functions in R:

So looking at the code I wrote, you may have expected c2 to hold NA, 3, 5, ... where it’s taking the current value and the prior value to make a window of width 2. Another reasonable alternative is that you may have expected c2 to hold NA, NA, 3, ... where it’s summing up the prior two values. But hey, it’s kinda working like cumsum() right so that’s ok! But wait, check out c3. I gave c3 a window of width 3 and it gave me NA, 6, 9, ... which looks like it’s summing the prior value, the current value, and the next value. …. That’s weird right?

It turns out the default behaviour for these rolling calculations is to center align the window, which means the window sits over the current value and tries its best to fit over the prior and next values equally. In the case of us giving it an even number it decided to put the window over the next values more than the prior values.
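
To make the alignment behaviour concrete, here is a minimal pure-Python sketch of a rolling sum with an alignment option. The function `rolling_sum` and its `align` parameter are hypothetical names for illustration, not the R function from the post, and the input 1 through 5 is an assumed data set chosen to match the quoted results. For even widths, the sketch follows the description above and puts the extra slot on the "next values" side.

```python
from typing import List, Optional, Sequence


def rolling_sum(values: Sequence[float], width: int,
                align: str = "center") -> List[Optional[float]]:
    """Rolling sum where None marks positions the window cannot fully cover."""
    if align == "right":        # window ends at the current value
        before = width - 1
    elif align == "left":       # window starts at the current value
        before = 0
    elif align == "center":     # even widths lean toward the next values
        before = (width - 1) // 2
    else:
        raise ValueError(f"unknown alignment: {align}")
    after = width - 1 - before

    result: List[Optional[float]] = []
    for i in range(len(values)):
        start, end = i - before, i + after
        if start < 0 or end >= len(values):
            result.append(None)  # plays the role of NA in R
        else:
            result.append(sum(values[start:end + 1]))
    return result


data = [1, 2, 3, 4, 5]  # assumed input
c2_center = rolling_sum(data, 2)                 # centered, even width
c3_center = rolling_sum(data, 3)                 # centered, odd width
c2_right = rolling_sum(data, 2, align="right")   # current value plus prior value
```

With this input, the centered width-3 window starts with a missing value followed by 6 and 9, matching the quoted output, while the right-aligned width-2 window gives the "NA, 3, 5, ..." shape many people expect by default.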

Knowing your window is critical when using a window function, and knowing that some functions have different default windows than others helps you be prepared.

Disagreement On Outliers

Antony Unwin reviews how various packages track outliers using the Overview of Outliers plot in R:

The starting point was a recent proposal of Wilkinson’s, his HDoutliers algorithm. The plot above shows the default O3 plot for this method applied to the stackloss dataset. (Detailed explanations of O3 plots are in the OutliersO3 vignettes.) The stackloss dataset is a small example (21 cases and 4 variables) and there is an illuminating and entertaining article (Dodge, 1996) that tells you a lot about it.

Wilkinson’s algorithm finds 6 outliers for the whole dataset (the bottom row of the plot). Overall, for various combinations of variables, 14 of the cases are found to be potential outliers (out of 21!). There are no rows for 11 of the possible 15 combinations of variables because no outliers are found with them. If using a tolerance level of 0.05 seems a little bit lax, using 0.01 finds no outliers at all for any variable combination.

Interesting reading.

Data Modeling And Neural Networks

I have two new posts in my launching a data science project series.  The first one covers data modeling theory:

Wait, isn’t self-supervised learning just a subset of supervised learning?  Sure, but it’s pretty useful to look at on its own.  Here, we use heuristics to guesstimate labels and train the model based on those guesstimates.  For example, let’s say that we want to train a neural network or Markov chain generator to read the works of Shakespeare and generate beautiful prose for us.  The way the recursive model would work is to take what words have already been written and then predict the most likely next word or punctuation character.

We don’t have “labeled” data within the works of Shakespeare, though; instead, our training data’s “label” is the next word in the play or sonnet.  So we train our model based on the chains of words, treating the problem as interdependent rather than a bunch of independent words just hanging around.
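
As a rough illustration of the "next word as label" idea, here is a small Python sketch that turns raw text into (current word, next word) training pairs and builds a first-order Markov chain from them. The helper names (`next_word_pairs`, `build_chain`, `most_likely_next`) and the sample sentence are made up for this example; a real model would use longer contexts and a proper tokenizer.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def next_word_pairs(text: str) -> List[Tuple[str, str]]:
    """Pair every word with the word that follows it; the follower is the 'label'."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def build_chain(pairs: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each word follows each other word."""
    chain: Dict[str, Counter] = defaultdict(Counter)
    for current, following in pairs:
        chain[current][following] += 1
    return chain


def most_likely_next(chain: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of a word, or None if it was never followed."""
    followers = chain.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


text = "to be or not to be that is the question"
pairs = next_word_pairs(text)
chain = build_chain(pairs)
print(most_likely_next(chain, "to"))
```

No human labelled anything here: the labels fall out of the word order in the text itself, which is the point the excerpt makes.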

Then, we implement a data model using a neural network:

At this point, I want to build the Keras model. I’m creating a build_model function in case I want to run this over and over. In a real-life scenario, I would perform various optimizations, do cross-validation, etc. In this scenario, however, I am just going to run one time against the full training data set, and then evaluate it against the test data set.

Inside the function, we start by declaring a Keras model. Then, I add three layers to the model. The first layer is a dense (fully-connected) layer which accepts the training data as inputs and uses the Rectified Linear Unit (ReLU) activation mechanism. This is a decent first guess for activation mechanisms. We then have a dropout layer, which reduces the risk of overfitting on the training data. Finally, I have a dense layer for my output, which will give me the salary.

I compile the model using the RMSProp optimizer. This is a good default optimizer for neural networks, although you might try Adagrad, Adam, or AdaMax as well. Our loss function is Mean Squared Error, which is good for dealing with finding the error in a regression. Finally, I’m interested in the Mean Absolute Error–that is, the dollar amount difference between our function’s prediction and the actual salary. The closer to $0 this is, the better.
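
To show how the two error measures differ, here is a short pure-Python sketch that computes both for a handful of salary predictions. The salary figures are invented for illustration and the functions are hand-rolled rather than taken from Keras.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of squared differences; large misses are penalized heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference, in the same units (dollars) as the salaries."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented salaries for illustration only.
actual = [50000.0, 72000.0, 91000.0]
predicted = [55000.0, 70000.0, 100000.0]

mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"MSE: {mse:,.2f}  MAE: {mae:,.2f}")
```

The Mean Absolute Error reads directly as "off by about this many dollars on average", which is why it works well as a reporting metric, while the squared error is in squared dollars and is mainly useful as something for the optimizer to minimize.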

Click through for those two posts, including seeing how close I get to a reasonable model with my neural network.

Using STRING_AGG In SQL Server 2017

Derik Hammer talks about one of the nicer T-SQL additions in SQL Server 2017:

Creating comma separated strings from a column, or delimited strings as I like to call it, is a very common problem in SQL. Beginning with SQL Server 2017 and Azure SQL Database, there is now another option to the existing set of solutions, STRING_AGG().

I would like to convince you to use STRING_AGG over the other methods. So, let us begin with the competing solutions.

I completely agree and have been switching code over to use STRING_AGG since upgrading to 2017.  The code is so much clearer as a result compared to STUFF + FOR XML PATH concatenation.
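
For readers without a SQL Server 2017 instance handy, the idea of a string aggregate can be tried out with Python's built-in sqlite3 module. This sketch uses SQLite's group_concat() as a stand-in; it is not STRING_AGG itself, and the pets table and its rows are invented for the example. In T-SQL the corresponding call would take the form STRING_AGG(name, ', ').

```python
import sqlite3

# In-memory database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (owner TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO pets (owner, name) VALUES (?, ?)",
    [("alice", "Rex"), ("alice", "Tom"), ("bob", "Fido")],
)

# One delimited string per owner, built by an aggregate function
# rather than by row-by-row concatenation.
rows = conn.execute(
    "SELECT owner, group_concat(name, ', ') FROM pets "
    "GROUP BY owner ORDER BY owner"
).fetchall()
result = {owner: names for owner, names in rows}
conn.close()

print(result)
```

SQLite does not promise any particular order of the names inside each string, so treat the ordering within a group as unspecified in this sketch.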
