Author: Kevin Feasel

ASYNC_NETWORK_IO and Execution Plans

Jonathan Kehayias dives into an interesting problem:

A few weeks ago, an interesting question was asked on the #SQLHelp hash tag on Twitter about the impact of execution plans on the ASYNC_NETWORK_IO wait type, and it generated some differing opinions and a lot of good discussion.

My immediate answer to this would be that someone is misinterpreting the cause and effect of this, since the ASYNC_NETWORK_IO wait type is encountered when the Engine has results to send over TDS to the client but there are no available TDS buffers on the connection to send them on. Generally speaking, this means that the client side is not consuming the results efficiently, but based on the ensuing discussion I became intrigued enough to do some testing of whether or not an execution plan would actually impact the ASYNC_NETWORK_IO waits significantly.

To summarize: Focusing on ASYNC_NETWORK_IO waits alone as a tuning metric is a mistake. The faster a query executes, the higher this wait type will likely accumulate, even if the client is consuming results as fast as possible. (Also see Greg’s recent post about focusing on waits alone in general.)

Click through for the things Jonathan tested.
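
If you want to see how much of this wait type an instance has accumulated, a quick look at sys.dm_os_wait_stats shows it. A minimal sketch (the numbers are cumulative since the last restart or a manual wait-stats clear):

-- Cumulative ASYNC_NETWORK_IO waits since the last restart (or manual clear)
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'ASYNC_NETWORK_IO';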

Building Metadata for an ADF Pipeline

Paul Andrew continues a series on Azure Data Factory and metadata-driven pipelines:

Welcome back friends to part 2 of this 4 part blog series. In this post we are going to deliver on some of the design points we covered in part 1 by building the database to house our processing framework metadata.

Let’s start with a nice new shiny Azure SQLDB database and schema. This can easily be scaled up as our calls from Data Factory increase and ultimately the solution we are using the framework for grows.

Soon we will get to see the Azure Data Factory power in action.
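
To give a rough idea of what "metadata to drive the framework" can look like, here is a hypothetical sketch of one such table. Paul's series builds out its own, more complete schema, so treat the names below as placeholders:

-- Hypothetical metadata table describing which pipelines the framework should run
CREATE TABLE dbo.PipelineProcesses
(
    ProcessId     INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    StageNumber   INT           NOT NULL,  -- execution order across the framework
    PipelineName  NVARCHAR(200) NOT NULL,  -- Data Factory pipeline to invoke
    ResourceGroup NVARCHAR(200) NOT NULL,
    DataFactory   NVARCHAR(200) NOT NULL,
    Enabled       BIT           NOT NULL DEFAULT (1)
);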

Secure Azure Data Source Access from Databricks

Bhavin Kukadia, Abhinav Garg, and Michal Marusan show us the right way to access Azure data sources from Azure Databricks:

Enterprise Security is a core tenet of building software at both Databricks and Microsoft, and thus it’s considered as a first-class citizen in Azure Databricks. In the context of this blog, secure connectivity refers to ensuring that traffic from Azure Databricks to Azure data services remains on the Azure network backbone, with the inherent ability to whitelist Azure Databricks as an allowed source. As a security best practice, we recommend a couple of options which customers could use to establish such a data access mechanism to Azure Data services like Azure Blob Storage, Azure Data Lake Store Gen2, Azure Synapse Data Warehouse, Azure Cosmos DB, etc. Please read further for a discussion on Azure Private Link and Service Endpoints.

This is more about network configuration rather than things like “store your credentials and other secrets in Azure Key Vault,” which is also a good idea.

Real-Time Replay with WorkloadTools

Gianluca Sartori shows us how to perform a real-time replay with WorkloadTools:

Before we jump to how, I’d better spend some words on why a real-time replay is needed.

The main reason is the complexity involved in capturing and analyzing a workload for extended periods of time. Especially when performing migrations and upgrades, it is crucial to capture the entire business cycle, in order to cover all possible queries issued by the applications. All existing benchmarking tools require you to capture the workload to a file before it can be analyzed and/or replayed, but this becomes increasingly complicated when the length of the business cycle grows.

I’m not sure how frequently I’d use real-time replays, but it’s nice to know that it’s pretty easy to pull off with WorkloadTools.

Approximate Distinct Count with DAX

Gilbert Quevauvilliers runs some performance tests against the approximate distinct count formula in DAX:

I am currently running SQL Server Analysis Services (SSAS) 2019 Enterprise Edition. (This can also be applied to Power BI)

My Fact table has got roughly 950 Million rows stored in it.

And as mentioned previously it has got over 64 Million distinct users.

The data is queried from SQL Server into SSAS.

Gilbert first checks how close these are and then how much faster the approximate count is.
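
As an aside, the same trade-off exists in the database engine itself: since the data comes from SQL Server, you could run a comparable test there with APPROX_COUNT_DISTINCT (available from SQL Server 2019 onward). A sketch against a made-up fact table:

-- Exact versus approximate distinct counts in T-SQL (hypothetical table and column)
SELECT
    COUNT(DISTINCT UserId)        AS ExactDistinctUsers,
    APPROX_COUNT_DISTINCT(UserId) AS ApproxDistinctUsers
FROM dbo.FactUserActivity;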

Setting Up a SQL Server Lab with AutomatedLab

Jess Pomfret looks at a very interesting Powershell module:

There is a fantastic PowerShell module called AutomatedLab that can enable you to easily build out a lab for the specific scenario you need to test. Even better, the module comes with 70 sample scripts that you can start with and adapt to meet your needs.

The module gives you the option to work with Hyper-V or VMWare. I will say most of the examples are using Hyper-V, and that is what I’ll be using also.

For my lab I want a SQL Server 2019 instance joined to a domain, and a separate client machine that I can manage the SQL Server from. On the client I would need to be able to connect to the internet as I want to be able to download PowerShell modules from the gallery easily.

It’s about time for me to rebuild my lab, so I’ll need to check that out.

Incremental Imports with Sqoop

Jon Morisi continues a series on Sqoop:

In my last two blog posts I walked through how to use Sqoop to perform full imports.  Nightly full imports with overwrite have their place for small tables like dimension tables.  However, in real-world scenarios you’re also going to want a way to import only the delta values since the last time an import was run.  Sqoop offers two ways to perform incremental imports: append and lastmodified.

Both incremental imports can be run manually or created as a job using the “sqoop job” command.  When running incremental imports manually from the command line the “--last-value” arg is used to specify the reference value for the check-column.  Alternatively, sqoop jobs track the “check-column” in the job and the value of the check-column is used for subsequent job runs as the where predicate in the SQL statement.  I.e., select columns from table where check-column > (last-max-check-column-value).

This is where Sqoop starts to break down for me, and Jon lists some of the issues in the post.
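
To make that predicate concrete, here is roughly the query an incremental append import generates against the source database; the table, columns, and last value below are all made up:

-- Only rows whose check-column exceeds the stored last value come across
SELECT OrderId, CustomerId, OrderTotal, OrderDate
FROM SalesOrders
WHERE OrderId > 1250943;  -- the saved check-column value (or --last-value)

-- In lastmodified mode, the predicate is on a timestamp column instead, e.g.
-- WHERE LastModified > '2020-02-18 00:00:00' AND LastModified <= <import start time>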

Pasting an R Plot into Word

Eran Raviv takes us through converting a plot in R to work with Microsoft Word:

In this post you will learn how to properly paste an R plot\chart\image to a word file. There are a few typical problems that occur when people try to do that. Below you can find a simple, clean and repeatable solution.

When you google how to paste a plot from R to a word file you find that there are some solutions. But they are not satisfactory. For example, the highest-ranking stackoverflow reply offers to use the Rstudio button to export your plot as an Enhanced Metafile (EMF) format. Couple of things wrong with it: the first is that you need to start messing around with the device scaling, because the export remembers the port dimensions. The second is that the word file is often not the final version. For better readability\representation we often convert the word to a pdf format before sending\publishing. But then you get something funny which you may have seen before, and which consumed much of some people’s time, myself included:

Also check out the linked blog post for additional insights into why this happened.

LAST_VALUE() and Windows

Jeet Kainth explains the importance of specifying your window when using LAST_VALUE():

To return the actual last value, add the additional clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This ensures that for each row the LAST_VALUE() function looks at all rows, from the very first row to the very last row.

Click through for the example. Remember that the default frame only extends to the current row, not across the entire data set.
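
A minimal sketch of the difference, using a made-up table:

-- With the default frame, LAST_VALUE() returns the current row's value;
-- with an explicit frame, it returns the true last value in the window.
SELECT
    SaleDate,
    Amount,
    LAST_VALUE(Amount) OVER (ORDER BY SaleDate) AS DefaultFrame,  -- same as this row's Amount
    LAST_VALUE(Amount) OVER (ORDER BY SaleDate
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND UNBOUNDED FOLLOWING) AS FullFrame  -- actual last value
FROM dbo.Sales;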
