Press "Enter" to skip to content

Month: January 2021

Generating Alerts from Power Automate

Ed Hansberry shows how to create a Power Automate alert off of SQL Server data:

I’m going to show you how to do this in Power Automate with just a few steps. Let’s get started. In my example, I am going to return a table when a customer has placed an order where the order quantity will not divide evenly into the case pack. So if they order 100 units and the cases contain 24 each, I want to alert the order entry person to tell them the customer has effectively ordered 4.1667 cases, which isn’t allowed. They will need to order either 96 units or 120 units to get 4 or 5 cases.

Read on to see how.

Comments closed

Deduplicating Missing Index Requests

Erik Darling trains us in a skill:

Going back to our queries and our index requests, all the queries have two things in common:

– They filter on OwnerUserId
– They order by Score

There are of course other elements in the where clause to attend to, but our job is to come up with one index that helps all of our queries.

This gets a lot easier if you know the queries. If all you have is the set of index details, you can make a sane guess but the chance of being wrong is much greater.

Comments closed

Collecting Wait Stats Over Time

Michael J. Swart has a repo for us:

I find wait stats so useful that I’ve got Paul Randal’s SQL Server Wait Statistics (or please tell me where it hurts…) bookmarked and I visit it frequently.

But that gives the total amount of waits for each wait type accumulated since the server was started. And that isn’t ideal when I’m troubleshooting trouble that started recently. No worries, Paul also has another fantastic post Capturing wait statistics for a period of time.

You can also get this from various monitoring tools, as Michael mentions, but if you don’t have such a tool in place, here’s how you can roll your own.

Comments closed

ML Services: PYTHONHOME and PATH

Niels Berglund troubleshoots some issues:

In the last post, which looks at using Python 3.9 in SQL Server Machine Learning Services, I wrote this at the very end:

It looks like all is good, but maybe not? In a future post we’ll look at an issue we have introduced – but for now, let us bask in the glory of having created a new Python language extension.

In the post, we wrote a new language extension to handle Python 3.9, and that just worked fine. However, when I was doing some other things, I noticed some side effects, and in this post, we look at those side effects and how to solve them.

Click through to learn more.

Comments closed

The Nature of Overfitting

John Mount has a nice essay on overfitting:

What is meant by “overfitting” is: the estimated f() will tend to show off or over perform on the data used to fit, train, or construct it. I have some notes on this sort of selection bias here: https://win-vector.com/2020/12/10/overfit-and-reversion-to-mediocrity-the-bane-of-data-science/.

Selecting a model that “looks good” is enough to bias the model’s evaluation with respect to the data set we said it “looked good” on. So even when using unbiased methods, the data scientist can introduce bias by choosing to use one model (say the one fit by logistic regression) over another (say using using an observed prevalence everywhere as a probability prediction).

The way I talk about overfitting is to say that we’ve trained a model which latches onto the particulars of the training data set. To the extent that the particulars of the training data set are matched by the broader world, that’s “fitting.” To the extent that the particulars of the training data set are unique to that data set and are not generally applicable, that’s “overfitting.” Generally, I don’t have any more time to get into what this means, but John dives into the topic in an accessible way.

Comments closed

Extracting Refresh Metrics for a Power BI Workspace

Marc Lelijveld wants to pull some metrics:

In the Power BI service, you can easily look at refresh times for an individual dataset or dataflow. There are many different reasons why these metrics are important to you as a dataset or dataflow owner. For example, you may bump into refresh time-outs or unfortunate errors. There are many good reasons to think about why you want to have more insights in your refresh metrics.

Having that said, it can be a pain to look at these metrics every day. Power BI already offers a way to send automatic notification in case of a refresh failure. Though, I would personally prefer to have more insights in all my refresh metrics, whether they are failing or succeeding.

In this blog I want to share a way how you can export all refresh metrics for your datasets and dataflows using a PowerShell script.

Click through to see how and to get a copy of the script.

Comments closed

Running an mlflow Server on Azure

Paul Hernandez configures mlflow on Azure using platform-as-a-service offerings:

It is indisputable true that mlflow came to make life a lot easier not only for data scientists but also for data engineers, architects among others. There is a very helpful list of tutorials and example in the official mlflow docs. You can just download it, open a console and start using it locally on your computer. This is the fastest way to getting started. However, as soon as you progress and introduce mlflow in your team, or you want to use it extensively for yourself, some components should be deployed outside your laptop.

To exercise a deployment setup and since I own azure experience, I decided to provision a couple of resources in the cloud to deploy the model registry and store the data produced by the tracking server.

I concur on the power of mlflow.

Comments closed

String Modification in T-SQL

Steve Jones answers a question:

Recently I ran across a question posted by a beginner on the Internet and thought this would be a good, basic topic to cover. The question was: how can I replace a value in a comma separated string in a table?

This post covers the basics of this task.

Incidentally, this is where I say hey, that sounds like a failure in normalization. If you need to care about individual values in a collection, your value is not atomic. But that’s a bit of a tangent.

Comments closed

Transforming Arrays in Azure Data Factory

Mark Kromer shows off a few functions in Azure Data Factory to modify data in arrays:

The first transformation function is map() and allows you to apply data flow scalar functions as the 2nd parameter to the map() function. In my case, I use upper() to uppercase every element in my string array: map(columnNames(),upper(#item))

Read on for more iteration and aggregation functions akin to map, reduce, and filter.

Comments closed