Day: April 22, 2022

Custom Model Evaluation Metrics with MLflow

Mark Zhang shows off a new bit of functionality in MLflow:

According to an internal customer survey, 75% of respondents say they frequently or always use specialized, business-focused metrics in addition to basic ones like accuracy and loss. Data scientists often utilize these custom metrics as they are more descriptive of business objectives (e.g. conversion rate), and contain additional heuristics not captured by the model prediction itself.

In this blog, we introduce an easy and convenient way of evaluating MLflow models on user-defined custom metrics. With this functionality, a data scientist can easily incorporate this logic at the model evaluation stage and quickly determine the best-performing model without further downstream analysis.

Click through to see how to use the built-in metrics and how to create your own.

String Concatenation in R

Benjamin Smith creates a function:

While it is possible to use paste() or paste0() for string concatenation, I do understand how it can be messy to deal with, especially when working with loops and/or nested functions. In this short blog I share a remedy for this by writing a special function which can lend itself to cleaner code as opposed to using paste() or paste0().

It’s not quite as nice as a here string (e.g., @"{FirstName} just referenced the name here string at {UserTime}" user.FirstName DateTime.UtcNow) but this is a good reminder that operator creation in R is pretty easy. H/T R-Bloggers.

Azure Data Studio April 2022 Updates

Timi Oshin has some release notes for us:

We are excited to announce the general availability of the Azure SQL Migration extension for Azure Data Studio. Among many other capabilities, this extension can be used for migrating SQL Server databases to Azure for an enhanced user experience. With this extension, users can get right-sized Azure recommendations based on performance data collected from your source SQL Server databases to optimize for cost and scale. The migration experience is powered by the Azure Database Migration Service which provides a scalable, resilient, and secure way to meet the needs of your organization. See below for a snapshot UI of this extension.

Click through for more notes on Azure SQL migration, the table designer, and more.

Subscribing to Power BI Reports

Reza Rad looks at e-mail subscriptions of Power BI reports:

Have you ever wondered whether it is possible to have updates of a Power BI report emailed to you (or some other colleagues) on a daily basis? Power BI, fortunately, has this feature; it is called Subscription. Subscriptions are helpful ways to send an up-to-date version of the report and dashboard to the users’ email addresses on a scheduled basis. In this article and video, I’ll explain what a subscription is and how it works in Power BI.

Click through for the video and complete blog post.

Splitting Strings with Quoted Names

Daniel Hutmacher mixes separators with regular characters:

Suppose you have a delimited string input that you want to split into its parts. That’s what STRING_SPLIT() does:

DECLARE @source nvarchar(max) = 'Canada, Cape Verde, ' +
    'Central African Republic, Chad, Chile, China, Colombia, Comoros';

SELECT TRIM([value]) AS [Country]
FROM STRING_SPLIT(@source, ',');

Simple enough. But delimited lists are tricky, because the delimiter could exist in the name itself. Look for yourself what happens when we add the two Congos to the list:

Daniel has a clever solution to the problem.
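
To make the problem concrete, here is a minimal Python sketch of one common convention: wrapping names that contain the delimiter in double quotes and letting a CSV parser do the splitting. This is not Daniel's T-SQL solution, which is in the linked post. The helper name `split_quoted` and the sample Congo entries are assumptions made for illustration.

```python
import csv
import io


def split_quoted(source: str, delimiter: str = ",") -> list[str]:
    """Split a delimited string, leaving delimiters inside double quotes alone."""
    # skipinitialspace=True lets a quote that follows ", " still open a quoted field.
    reader = csv.reader(
        io.StringIO(source), delimiter=delimiter, skipinitialspace=True
    )
    return [field.strip() for row in reader for field in row]


# Hypothetical input: the two Congos are quoted because their names contain commas.
source = (
    "Chile, China, Colombia, Comoros, "
    '"Congo, Democratic Republic of the", "Congo, Republic of the"'
)

countries = split_quoted(source)
for country in countries:
    print(country)
```

A naive split on the comma would return eight pieces here, breaking each Congo entry in two. The quoted version returns six names.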

Optimizing Index Spools

Francisco looks at index spools:

When we are analyzing execution plans, we may come across different types of Spool operators – Table Spools, Row Count Spools, Window Spools or Index Spools – that the Query Optimizer chooses for specific purposes. In this post we are going to briefly look into the Index Spool, how it can sometimes lead to suboptimal query performance, and what can be done to easily fix it.

My favorite description of this is Erik Darling’s: spools are SQL Server’s passive-aggressive way of telling you “I’m not saying you need an index but you need an index.”

Logic Apps: Source Control and Deployment

Koen Verbeeck has a two-parter. First up is storing Logic App code in source control:

On a data warehouse project I’m using a couple of Logic Apps to do some lightweight data movements. For example: reading a SharePoint list and dumping the contents into a SQL Server table. Or reading CSV files from a OneDrive directory and putting them in Blob storage. Some of those things can be done in Azure Data Factory as well, but it’s easier and cheaper to do them with Logic Apps.

Logic Apps are essentially JSON code behind the scenes, so they should be included in the source control system of your choice (for the remainder of the blog post we’re going to assume this is git).

The second post covers deployment:

It’s easy to duplicate an Azure Logic App in a resource group, but unfortunately you cannot duplicate a Logic App between environments (you might try to copy paste the JSON though). So unless you want to hand craft every Logic App yourself on each of your environments, you need a way to automatically deploy your Logic Apps. It’s easier, faster and less error-prone than any manual method.

Check out both posts.

Calculating Running Totals with Window Functions

Steve Jones shows off a good use case for window functions:

Recently I was looking at some data and wanted to analyze it by month. I have a goal that is set for each day and then an actual value. I wanted to know how I was tracking against the goal, as a running total. If my goal is 10 a day, then I ought to actually get to 10 the first day, 20 for the second day (10 + 10), etc.

Read on to see how Steve solved the problem.
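
As a rough illustration of the idea, and not Steve's actual query, the sketch below runs a windowed SUM() over a small in-memory SQLite table from Python. The table name `daily`, its columns, and the sample values are all hypothetical, and it assumes the SQLite bundled with Python is version 3.25 or later, which is when window functions arrived. The same OVER clause is also valid T-SQL.

```python
import sqlite3

# Hypothetical data: a goal of 10 per day and an actual value per day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily (day TEXT PRIMARY KEY, goal INTEGER, actual INTEGER)"
)
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [
        ("2022-04-01", 10, 8),
        ("2022-04-02", 10, 12),
        ("2022-04-03", 10, 9),
    ],
)

# The frame runs from the first row through the current row, ordered by day,
# so each output row carries the cumulative goal and cumulative actual so far.
running_totals = conn.execute(
    """
    SELECT day,
           SUM(goal)   OVER (ORDER BY day
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS goal_total,
           SUM(actual) OVER (ORDER BY day
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS actual_total
    FROM daily
    ORDER BY day
    """
).fetchall()

for day, goal_total, actual_total in running_totals:
    print(day, goal_total, actual_total)
```

With these sample values the goal column accumulates to 10, 20, 30 while the actuals accumulate to 8, 20, 29, so the third day ends one unit behind the goal.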
