Press "Enter" to skip to content

Author: Kevin Feasel

Optimizing M Function Calls With Function.ScalarVector()

Chris Webb shows us how we can batch calls to M-driven web services:

One of the most common issues faced when calling web services in M is that the easiest way of doing so – creating a function that calls the web service, then calling the function once per row in a table using the Invoke Custom Function button – is very inefficient. It’s a much better idea to batch up calls to a web service, if the web service supports this, but doing this forces you to write more complex M code. It’s a problem I wrestled with last year in my custom connector for the Cognitive Services API, and in that case I opted to create functions that can be passed lists instead (see here for more information on how functions with parameters of type list work); I’m sure the developers working on the new AI features in dataflows had to deal with the same problem. This problem is not limited to web services either: calculations such as running totals also need to be optimised in the same way if they are to perform well in M. The good news is that there is a new M function Function.ScalarVector() that allows you to create functions that combine the ease-of-use of row-by-row requests with the efficiency of batch requests.

As Chris notes in the post and in the comments, this is mostly useful when you can batch together individual calls to improve performance.  For functions which operate serially (like opening Excel workbooks), you won’t see much (if any) gain.

Comments closed

Monitoring When Databases Go Offline

Jason Brimhall shows how you can create an extended event to track whenever databases go offline:

The other day, I shared an article showing how to audit database offline events via the default trace. Today, I will show an easier method to both audit and monitor for offline events. What is the difference between audit and monitor? It largely depends on your implementation, but I generally consider an audit as something you do after the fact. Monitor is a little more proactive.

Hopefully, a database being taken offline is a known event and not a surprise. Occasionally there are gremlins, in the form of users with too many permissions, that tend to do very strange things to databases and database servers.

Click through for the script.

Comments closed

Working With The Databricks API Via Powershell

Gerhard Brueckl has a Powershell module for interacting with Databricks, either Azure or AWS:

As most of our deployments use PowerShell I wrote some cmdlets to easily work with the Databricks API in my scripts. These included managing clusters (create, start, stop, …), deploying content/notebooks, adding secrets, executing jobs/notebooks, etc. After some time I ended up having 20+ single scripts which was not really maintainable any more. So I packed them into a PowerShell module and also published it to the PowerShell Gallery (https://www.powershellgallery.com/packages/DatabricksPS) for everyone to use!

This looks like a pretty good module if you work with Databricks.
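
The module wraps the Databricks REST API, so if you want a rough feel for what those cmdlets do under the covers, here is a minimal sketch of the equivalent raw calls with curl. The workspace URL, token, and cluster ID are placeholders, not values from Gerhard’s post.

## - List clusters in the workspace (Clusters API 2.0)
DATABRICKS_URL="https://westeurope.azuredatabricks.net"   ## - or your AWS workspace URL
DATABRICKS_TOKEN="dapi0123456789abcdef"                   ## - personal access token (placeholder)
curl -s -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  "${DATABRICKS_URL}/api/2.0/clusters/list"

## - Start a cluster by ID
curl -s -X POST -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"cluster_id": "1234-567890-abcde123"}' \
  "${DATABRICKS_URL}/api/2.0/clusters/start"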

Comments closed

Management Studio 18 Preview 5 Released

Dinakar Nethi announces a new public preview of SQL Server Management Studio 18:

We are very excited to announce the release of Public Preview 5 of SQL Server Management Studio (SSMS) 18.0. This release has a number of new features and capabilities and several bug fixes across SQL Server Management Objects (SMO), UI, etc.

You can download SSMS 18.0 Public Preview 5 here.

The most interesting thing in it for me is probably the menu item for CREATE OR ALTER with scripts.

Comments closed

Avoid Key Lookups On Clustered Columnstore Indexes

Joey D’Antoni points out a potential big performance problem with clustered columnstore indexes:

In the last year or so, with a large customer who makes fairly heavy use of this pattern, I’ve noticed another concern. Sometimes, and I can’t figure out what exactly triggers it, the execution plan generated, will do a seek against the nonclustered index and then do a key lookup against the columnstore as seen below. This is bad for two reasons–first the key lookup is super expensive, and generally columnstores are very large, secondly this key lookup is in row execution mode rather than batch and drops the rest of the execution plan into row mode, thus slowing the query down even further.

Joey also has a UserVoice item as well, so check it out.

Comments closed

Managing Powershell Core On Non-Windows Machines

Max Trinidad shows us how to grab the latest version of Powershell Core if you aren’t using Windows:

So, if PowerShell Core isn’t available in the package repository, with a few steps you can download and install PowerShell. But, the first thing I do is to remove it before installing.

Ubuntu

## - When PowerShell Core isn't available in their repository: (download and execute install)
cd Downloads
wget https://github.com/PowerShell/PowerShell/releases/download/v6.1.1/powershell_6.1.1-1.ubuntu.18.04_amd64.deb
sudo dpkg -i powershell_6.1.1-1.ubuntu.18.04_amd64.deb

## - When available in Apt/Apt-Get repository:
sudo apt install -y powershell #-> Or, powershell-preview

Click through for demos of CentOS (or any other yum-based system) and macOS.
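
As a rough companion to the Ubuntu snippet above, here is a sketch of what the equivalent steps might look like on CentOS/RHEL and macOS. The package file name tracks the 6.1.1 release Max uses and may differ for other versions, and the Homebrew syntax reflects the cask command of the time.

## - CentOS/RHEL: install the rpm directly when PowerShell Core isn't in a configured repo
## - (file name is for v6.1.1; check the GitHub releases page for other versions)
sudo yum install -y https://github.com/PowerShell/PowerShell/releases/download/v6.1.1/powershell-6.1.1-1.rhel.7.x86_64.rpm

## - Or register the Microsoft repository first, then install from yum
curl -s https://packages.microsoft.com/config/rhel/7/prod.repo | sudo tee /etc/yum.repos.d/microsoft.repo
sudo yum install -y powershell

## - macOS: install via Homebrew Cask (newer Homebrew versions use: brew install --cask powershell)
brew cask install powershell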

Comments closed

Power BI Request: Subtotal Details At The Bottom Of A Section

Imke Feldmann points out a problem with trying to use Power BI to generate financial reports:

Although this might not be what the inventors of Power BI had in mind, lots of folks are trying to create classical financial statements in it. And putting aside the effort that might go into getting the numbers right, there is still a major drawback to swallow:

Click through for a depiction of the problem and then go vote for this on Power BI Ideas.

Comments closed

Using Query Store To Diagnose Implicit Conversion Issues

Tom Norman shares a case study of using Query Store to fix a nasty implicit conversion problem:

A while ago, we contracted with a third party to start using their software and database with our product.  We put the database in Azure but within a year, the database grew to over 250 gigs and we had to keep raising the Azure SQL Database to handle performance issues.  Due to the cost of Azure, a decision was made to bring the database back on-premise.  Before putting the database on the on-premise SQL Server, the server was running with eight CPUs.  In production, we are running SQL Server 2016 Enterprise Edition. When we put the vendor database into production, we had to dramatically increase our CPUs in production, ending up with twenty-eight CPUs. Even with twenty-eight CPUs, during most of the production day, CPUs were running persistently at seventy-five percent. But why?

Tom takes us from symptom (high CPU utilization) to diagnosis and is able to provide the third-party vendor enough information to improve their product.
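
If you want to try this kind of investigation yourself, here is a rough sketch of pulling the top CPU consumers out of Query Store, wrapped in sqlcmd to match the shell snippets elsewhere on this page. Server and database names are placeholders, credentials are omitted, and this is not the exact query from Tom’s post.

## - Top 10 queries by average CPU from Query Store (authentication flags omitted)
sqlcmd -S "your-server" -d "YourDatabase" -Q "
SELECT TOP (10)
       qt.query_sql_text,
       SUM(rs.count_executions) AS total_executions,
       AVG(rs.avg_cpu_time)     AS avg_cpu_time_us
FROM sys.query_store_runtime_stats AS rs
    JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
    JOIN sys.query_store_query AS q ON q.query_id = p.query_id
    JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
GROUP BY qt.query_sql_text
ORDER BY AVG(rs.avg_cpu_time) DESC;"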

Comments closed

Migrating A Database To Managed Instances

Frank Gill shows how to migrate a database from on-premises to an Azure SQL Managed Instance:

If you have run through my last Managed Instance blog post, you have a Managed Instance at your disposal.  The PowerShell script for creating the network requirements also contains steps to create an Azure VM in a different subnet in the same VNet.  Unless you have a site-to-site VPN or Express Route between your on-prem environment and Azure, you will use this VM to connect to your Managed Instance.

Install Management Studio on the Azure VM.  To connect to your Managed Instance, you will need the host name for your Managed Instance.  You can find the Managed Instance host name on the resource page for your Managed Instance in the Portal.

I think this migration story is a bit easier for DBAs than the old Azure SQL Database strategy of building dacpacs.
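
If you would rather not dig through the Portal for the host name, here is a hedged sketch of pulling the Managed Instance FQDN with the Azure CLI instead. The instance and resource group names are placeholders, and this is not a step from Frank’s post.

## - Fetch the Managed Instance host name (FQDN) from the Azure CLI
az sql mi show \
  --name "your-managed-instance" \
  --resource-group "your-resource-group" \
  --query "fullyQualifiedDomainName" \
  --output tsv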

Comments closed

Kafka Connect Converters And Serialization

Robin Moffatt goes into great detail on Apache Kafka Connect converters and serialization techniques:

Kafka Connect is modular in nature, providing a very powerful way of handling integration requirements. Some key components include:

  • Connectors – the JAR files that define how to integrate with the data store itself
  • Converters – handling serialization and deserialization of data
  • Transforms – optional in-flight manipulation of messages

One of the more frequent sources of mistakes and misunderstanding around Kafka Connect involves the serialization of data, which Kafka Connect handles using converters. Let’s take a good look at how these work, and illustrate some of the common issues encountered.

Read on for a good overview of the topic.
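
To make the converter piece concrete, here is a minimal sketch of where those settings live: per-connector overrides in the JSON you POST to the Kafka Connect REST API. The connector class, topic, and connection details are placeholders rather than anything from Robin’s post.

## - Create a connector with explicit key/value converter settings via the Connect REST API
curl -s -X POST -H "Content-Type: application/json" \
  http://localhost:8083/connectors \
  -d '{
        "name": "sink-example",
        "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
          "topics": "orders",
          "connection.url": "jdbc:postgresql://localhost:5432/demo",
          "key.converter":   "org.apache.kafka.connect.storage.StringConverter",
          "value.converter": "org.apache.kafka.connect.json.JsonConverter",
          "value.converter.schemas.enable": "false"
        }
      }'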

Comments closed