Press "Enter" to skip to content

Month: November 2021

MMLSpark Is Now SynapseML

Mark Hamilton has an announcement:

Today, we’re excited to announce the release of SynapseML (previously MMLSpark), an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. Building production-ready distributed ML pipelines can be difficult, even for the most seasoned developer. Composing tools from different ecosystems often requires considerable “glue” code, and many frameworks aren’t designed with thousand-machine elastic clusters in mind. SynapseML resolves this challenge by unifying several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.

Read on to learn more about the library.

Comments closed

Documenting Power BI Dataset Measures

Gilbert Quevauvilliers thinks about documentation:

One thing that often happens is when users are using a dataset, they want to know which measures are available. And not only that sometimes they want to know the measure definition.

This got me thinking and how best could I give this to the users in my organization to be able to find this information quickly and easily.

In the past this was a manual effort not only to export the measures, but also to maintain a document, so that as measures are added, updated, or deleted I would then need to manually update some document.

Yep, you guessed it I created a Power BI report which has got all the measures and their measure definitions, which will update with the dataset! And I show you how I did this below.

Click through to see how.sfff

Comments closed

Clear out Those Old Container Images

Joy George Kunjikkur has a public service announcement for us:

When we use self-hosted Azure pipeline agents, we may encounter the below issue during the build process. This is not a hard issue to troubleshoot. The reason is there in the error message.

Error processing tar file(exit status 1): open /root/.local/share/NuGet/v3-cache/670c1461c29885f9aa22c281d8b7da90845b38e4$ps:_api.nuget.org_v3_index.json/nupkg_system.reflection.metadata.1.4.2.dat: no space left on device

This is known in the industry as a whoopsie-doops. Click through to see what you can do to resolve the problem.

Comments closed

Saying No to NOLOCK

Brent Ozar just says no:

When you put NOLOCK in your query, SQL Server will:

– Read rows twice

– Skip rows altogether

– Show you changes that never actually got committed

– Let your query fail with an error

This is not a bug. This is by design. 

There are reasons why you might want to use NOLOCK, but start with no and you’ll be in better shape.

Also, remember that NOLOCK really means “No, lock!”

Comments closed

Managing File Retention in Blob Storage

Jeet Kainth shows how to configure a retention strategy in Azure Blob Storage:

This blog is a follow up to a previous blog I wrote about backing up Azure Analysis Services cubes in Azure, that blog can be found here. This blog shows how to implement a retention policy using PowerShell in Azure Runbooks to remove the backups after a set number of days. To create a new Runbook in the Azure portal, go to the relevant Automation account in the relevant resource group and then select Runbooks from the left hand pane. Note you will need to add the Az.Storage module to the automation account to be able to use some of the commands listed in this blog.

Click through for the process, including Powershell code to perform the task.

Comments closed

Offset Window Functions

I continue a series on window functions:

Offset functions are another class of window function in SQL Server thanks to being part of the ANSI SQL standard. Think of a window of data, stretching over some number of rows. What we want to do is take the current row and then look back (or “lag” the data) or forward (“lead”).

There are four interesting offset window functions: LAG()LEAD()FIRST_VALUE(), and LAST_VALUE(). All four of these offset functions require an ORDER BY clause because of the nature of these functions. LAG() and LEAD() take two parameters, one of which is required. We need to know the value to lag/lead, so that’s a mandatory parameter. In addition, we have an optional parameter which indicates how many steps forward or backward we want to look. Let’s see this in action with a query:

Click through for that query, as well as a few more and plenty of explanation.

Comments closed

Error Messages during Change Tracking Cleanup

Lee Markum troubleshoots some error messages:

You’re a data professional and you’re trying to keep up with patching a wide range of SQL Server versions and editions. How do you know what’s in the CU and whether you should apply it or not? My favorite way to read up on CUs is to go to SQLServerUpdates and click around there.  It will take you to the latest CU pages as well as give you a way to see previous CUs that are available.

While doing this recently, I discovered this for CU 26 on SQL Server 2017.

These sorts of regressions do slip in, so keep an eye on them before (and after) upgrading. Lee gives us a concreate example of one in a recent CU of SQL Server 2017.

Comments closed

Sargability and Dates

Chad Callihan makes me want to change the title to “Getting Sarge a Date”:

We’ll use the StackOverflow2013 database for this example. Let’s say we want to return the users created in 2013. One way to return this data is to use the YEAR() function to pull out the desired year for our query:

For the reference, check out Chad’s prior post. My expectation is that about 90% of people in the US who are aware of the term pronounce it “Sarge-able” instead of “Sar-guhble” and therefore immediately think of Sergeants.

1 Comment