
Author: Kevin Feasel

Azure Synapse Analytics May 2022 Updates

Ryan Majidimehr lays out some updates for Azure Synapse Analytics:

Serverless SQL pools let you query files in the data lake without knowing the schema upfront. The best practice was to specify the lengths of character columns to get optimal performance. Not anymore!  

Previously, you had to explicitly define the schema to get optimal query performance. For example, the column countries_and_territories would need to be explicitly defined as varchar(50).

There are some interesting updates in this month’s release, including the public preview of Azure Synapse Link for SQL, which connects to Azure SQL DB and SQL Server 2022.


Consuming an Azure ML AutoML Model in Excel

Lewis Prince needs to do some heavy lifting in Excel:

It's my turn to write a blog post again, and if you remember, my previous one covered why you should use Azure-based AutoML and how to do so. If you followed that, you will be left with a model that you've scored and whose performance you know, but no way to deploy and use it. I will outline the steps needed to do this (which involves a major shortcut, as we are using an AutoML model), and then show you the VBA needed to consume the model in Microsoft Excel.

Read on to see how you can do this. Back in the really old Azure ML days, you could download a preconfigured Excel workbook, feed in a bunch of input data, and get predictions.
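The post builds its client in VBA. As a rough illustration of the same HTTP call in another language, here is a minimal Python sketch that posts rows as JSON to a deployed model's scoring endpoint using key-based authentication. The scoring URI and key are placeholders, and the `{"data": [...]}` payload shape is an assumption: the exact input format depends on the scoring script deployed with your AutoML model, so check what your endpoint expects before relying on it.

```python
import json
import urllib.request

# Placeholder values (assumptions): substitute your endpoint's scoring URI and key.
SCORING_URI = "https://example-endpoint.example.com/score"
API_KEY = "<your-endpoint-key>"


def build_scoring_request(rows, uri=SCORING_URI, key=API_KEY):
    """Build (but do not send) a POST request carrying `rows` as JSON.

    The {"data": [...]} body shape is an assumption; the real input
    schema is defined by the scoring script behind your endpoint.
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )


def score(rows):
    """Send the request and decode the JSON predictions the endpoint returns."""
    with urllib.request.urlopen(build_scoring_request(rows)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `build_scoring_request([{"feature_a": 1.0}])` only constructs the request, which keeps the payload easy to inspect; `score` actually sends it. The feature name `feature_a` is hypothetical.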


Using Power BI Field Parameters with Data from Kusto

Dany Hoter combines Azure Data Explorer and a new feature in Power BI:

Field parameters are a new feature in Power BI as of the May version.

With field parameters, you can give the consumer of a report a lot of flexibility over its content: which fields the visuals use, what time granularity applies, and which measures are displayed.

All this without writing any DAX or M code.

Click through for an example of how this works.


Perspective on Spinlocks

Erik Darling speaks with wisdom:

The more people want to avoid fixing what’s really wrong with their server, the more they go out and find all the weird stuff that they can blame on something else (usually the product), or keep doing the same things that aren’t fixing the problem.

Spinlocks are one of those things. People will measure them, stare at them, Google them, and have no idea what to make of them, all while being sure there's something going on with them.

I don’t want to discount when spinlocks can actually cause a problem, but you shouldn’t treat every performance problem like it’s a bridge too far from what you can solve.

I have seen performance problems which actually did come down to spinlock issues. For every one of those, I’ve seen, oh, about 95-100 or so which came down to inefficient code.


How It Works: Power BI Field Parameters Edition

Gilbert Quevauvilliers figures out how field parameters work:

In this blog post, I want to give a visual representation of how field parameters work and what their current limitations are.

It is important to be aware of the limitations so that you do not get caught out later, trying to figure out why something is not working.

I do hope my descriptions and pictures below help you understand how it works and when it does not work!

Click through for some detailed graphics and explanation.


Monitoring Streaming Queries in PySpark

Hyukjin Kwon, et al, lay out some monitoring advice:

Streaming is one of the most important data processing techniques for ingestion and analysis. It provides users and developers with low latency and real-time data processing capabilities for analytics and triggering actions. However, monitoring streaming data workloads is challenging because the data is continuously processed as it arrives. Because of this always-on nature of stream processing, it is harder to troubleshoot problems during development and production without real-time metrics, alerting and dashboarding.

Read on to see how you can use the Observable API for alerting in PySpark; previously, it had been a Scala-only API.
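The pattern pairs `DataFrame.observe`, which attaches named aggregate metrics to a streaming DataFrame, with a `StreamingQueryListener` that reads those metrics as each progress update arrives. The sketch below keeps the alerting decision as plain Python so it can be tested without a Spark cluster. The observation name `metric`, the fields `cnt` and `malformed`, the hypothetical `error` column, and the 50% threshold are all assumptions for illustration, and the commented-out wiring assumes a PySpark version that supports Python streaming listeners.

```python
from typing import Any, Mapping, Optional


def malformed_ratio(observed: Mapping[str, Any], name: str = "metric") -> Optional[float]:
    """Return malformed/total for one named observation, or None if absent or empty.

    `observed` mirrors event.progress.observedMetrics inside a
    StreamingQueryListener: a mapping from observation name to a Row.
    Rows support row["field"] lookups, so plain dicts work for testing.
    """
    row = observed.get(name)
    if row is None or row["cnt"] == 0:
        return None
    return row["malformed"] / row["cnt"]


def should_alert(observed: Mapping[str, Any], threshold: float = 0.5, name: str = "metric") -> bool:
    """True when the share of malformed records exceeds `threshold`."""
    ratio = malformed_ratio(observed, name)
    return ratio is not None and ratio > threshold


# Wiring sketch (not executed here; assumes a hypothetical "error" column
# that is non-null only for malformed records):
#
# from pyspark.sql.functions import col, count, lit
# from pyspark.sql.streaming import StreamingQueryListener
#
# observed_df = df.observe(
#     "metric",
#     count(lit(1)).alias("cnt"),
#     count(col("error")).alias("malformed"),
# )
#
# class AlertListener(StreamingQueryListener):
#     def onQueryStarted(self, event): pass
#     def onQueryProgress(self, event):
#         if should_alert(event.progress.observedMetrics):
#             print(f"ALERT: too many malformed records in query {event.progress.id}")
#     def onQueryTerminated(self, event): pass
#
# spark.streams.addListener(AlertListener())
```

Separating the threshold check from the listener keeps the Spark-specific code thin, and the decision logic can be swapped for a real alerting hook without touching the query.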


Distributed Transactions in T-SQL

Kevin Wilkie explains what distributed transactions are and why you probably don’t want to use them:

In the kind of transaction we're going to discuss today, we'll be running transactions across multiple servers!

A distributed transaction is defined by Hazelcast as “a set of operations on data that is performed across two or more data repositories”. In even simpler terms, it's a command run against data on more than one server.

Click through for the warnings about what might possibly go wrong.
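SQL Server coordinates distributed transactions through MSDTC, which uses a two-phase commit protocol: every server first prepares its work and votes, and only a unanimous yes lets the coordinator tell everyone to commit. The toy model below, with hypothetical `Participant` objects standing in for servers, shows just that voting logic. It deliberately ignores network, timeout, and coordinator failures, which is where real distributed transactions get hard.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so participants can be dict keys
class Participant:
    """Hypothetical stand-in for one server taking part in the transaction."""
    name: str
    can_commit: bool = True
    committed: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def prepare(self, ops):
        # Phase 1: stage the work and vote yes, or vote no without staging anything.
        if not self.can_commit:
            return False
        self.pending.extend(ops)
        return True

    def commit(self):
        # Phase 2 (unanimous yes): make the staged work permanent.
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        # Phase 2 (any no): discard the staged work.
        self.pending.clear()


def two_phase_commit(work):
    """`work` maps each Participant to its list of operations.

    Returns True if every participant committed, False if all rolled back.
    """
    votes = [p.prepare(ops) for p, ops in work.items()]  # Phase 1: collect every vote
    if all(votes):
        for p in work:
            p.commit()
        return True
    for p in work:
        p.rollback()
    return False
```

The all-or-nothing outcome is the point: if any one server votes no, nobody keeps their changes.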


Projecting (Selecting) Results with KQL

Robert Cain continues a series on the Kusto Query Language:

So far in my Fun With KQL series, we have used the column tool, found on the right side of the output pane and described in my original post Fun With KQL – The Kusto Query Language, to arrange and reduce the number of columns in the output.

We can actually limit the number of columns, as well as set their order, right within our KQL query. To accomplish this, we use the project operator.

Read on for several good uses of the project operator.
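To make the semantics concrete, here is a tiny Python emulation of what `project` does to a result set: keep only the named columns, in the order listed, optionally renaming one the way `NewName = OldName` does in KQL. This is an illustrative sketch over lists of dicts, not how Kusto executes queries, and it skips `project`'s support for calculated columns. The column names in the usage note are hypothetical.

```python
def project(rows, *columns):
    """Mimic KQL `T | project ...` on a list of dicts.

    Each column is either a name ("State") kept as-is, or a
    (new_name, old_name) pair that renames it, like `project NewName = OldName`.
    Output rows contain only the listed columns, in the listed order.
    """
    out = []
    for row in rows:
        new_row = {}
        for col in columns:
            new_name, source = (col, col) if isinstance(col, str) else col
            new_row[new_name] = row[source]  # dicts preserve insertion order
        out.append(new_row)
    return out
```

For example, `project(rows, "EventType", ("Damage", "DamageProperty"))` mirrors the KQL `T | project EventType, Damage = DamageProperty`: every other column is dropped, and `EventType` comes first.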
