Press "Enter" to skip to content

Curated SQL Posts

Debugging Azure Data Factory Data Flows

Mark Kromer takes us through debugging Azure Data Factory Data Flows:

When you are designing your mapping data flows in ADF, you are working against a live Azure Databricks Spark cluster. The size of that cluster is configurable via the Azure Integration Runtime. If you do not configure a custom Azure IR, then you will use the default Azure IR. That sets a very small cluster size by default of 4 cores for a single worker node and 4 cores for a single driver node. In most cases, while debugging and using data preview, that should be fine. But when you start exploring your data with column statistics or increase the sampling size in debug settings, you may find that you’ve exceeded the capacity on that small default cluster. Below are the steps you need to take to increase the size of your debug cluster.

Click through for step-by-step instructions.

Comments closed

SQL Server Assessment API Generally Available

Ebru Ersan announces the general availability of the SQL Assessment API in Powershell:

SQL Assessment API is a new mechanism to evaluate configuration of your SQL Server for best practices. The API methods are used by means of a SQL Server Management Object (SMO) extension and new cmdlets in SqlServer PowerShell module. API is delivered with a ruleset that is highly customizable and extensible. It can be used to assess SQL Server versions 2012 and higher, both on Windows and Linux, as well as Azure SQL DB Managed Instance. More products will be supported in future releases.
 
SQL Assessment API is shipped as part of SqlServer PowerShell module (21.1.18206) and SMO NuGet Package (150.18208.0).

It’s a set of reasonably good practices and also lets you customize for your own environment, so check it out.

Comments closed

Troubleshooting Tez Performance

Dmitry Tolpeko digs through Tez logs to figure out a performance issue:

Why did it take so long to run the job? Is there any way to improve its performance?

Tez Application Master Log
I am going to use the Tez AM log to investigate vertex performance and find possible bottlenecks.
Note that there is the Timeline Server REST API that you can use to get the statistics for Tez jobs, but the application master log is “event-driven”, shows the exact order of all events and contains much more details in general.

Click through for the process.

Comments closed

Preventing Overfitting in ML Models

Tom Jordan gives us four techniques to reduce the likelihood of overfitting in our models:

Dropout
This technique is exclusively used within the training of neural networks, so isn’t applicable to all machine learning models, however can be used in the production of extremely effective neural network models. During the start of each step in the training process, each sub unit of the model, the neuron, has a probability of being included in that step or not. If it doesn’t make the cut, it is effectively deleted from the network for that step, and then reintroduced on the next step.

There are some good techniques here.

Comments closed

Troubleshooting Deadlocks using Extended Events

Jamie Wick helps us figure out what’s causing deadlocks:

Recently I started getting random alerts that a job on one of the SQL servers was failing because of a deadlock problem.

The source of the problem wasn’t immediately discernible as there wasn’t any pattern to when the job was failing. Troubleshooting was further complicated by the database being written/maintained by a 3rd party vendor that encrypts all of their stored procedures.

So… How to find out what was causing the deadlock?

Extended Events are an ideal solution for this situation.

Read on to learn how.

Comments closed

Fun with Markdown in Azure Data Studio

Dave Bland takes us through some of the formatting options available in Azure Data Studio notebooks:

When working in a Notebook you have two types of cells, text and code.  The focus of this post is how to format the text cell.  Of course text goes into this cell so that part is easy and of course the text can say anything you would like to say.  When we work with text in Word, there is a format tool bar that we can use to make it look like we want it.  The text cells do not have this toolbar.

You might be asking, without the format toolbar, does that mean we can’t format the text?  That answer is no….we can still format the text, we just need to do it slightly different.  Rather than use a toolbar, we need to use characters.

There’s a lot of power in Markdown.

Comments closed

The Future of In-Memory OLTP?

Niko Neugebauer has some speculation regarding In-Memory OLTP:

In SQL Server 2019 without that much deserved fanfare (for me at least), Microsoft has released a couple of improvements that made me think about the current status (was apparently almost dead) and the future of the In-Memory OLTP technology, that was launched in the year 2014, and besides significant improvements in SQL Server 2016 looked pretty much abandoned since. 

This is a perfect example of an excellent idea halfway implemented. There’s a lot of potential here, but enough pieces are missing that it’s hard to recommend using it outside of specialized scenarios. And that’s a shame given the potential.

Comments closed

Creating a SQL Managed Instance

Jess Pomfret takes us through creation of a SQL Managed Instance:

I’ve been thinking about the cloud a lot lately, and I feel it’s an area that I would benefit from learning more about. I’ve attended a couple of presentations on SQL Managed Instances and have read enough to be dangerous (or accidentally spend a lot of money, one of my biggest fears when working in the cloud). However, I always find I learn best and really get to understanding a topic by building something.

This post will be the first in at least a two part series on SQL Managed Instances (MI). My goal in this post is just to deploy an MI and have it ready to use for my next post.

Read on for the step-by-step instructions.

Comments closed