
Curated SQL Posts

Managing Performance on Azure SQL Managed Instances

Tim Radney has a few considerations for you if you want to start using Azure SQL Managed Instances:

Storage is a bit more difficult to plan for, because you have to consider multiple factors: you need to account for both overall storage size and I/O needs. How many GBs or TBs are needed for the SQL Server instance, and how fast does the storage need to be? How many IOPS and how much throughput is the on-premises instance using? For that, you must baseline your current workload, using perfmon to capture average and max MB/s and/or taking snapshots of sys.dm_io_virtual_file_stats to capture throughput utilization. This will give you an idea of what type of I/O and throughput you need in the new environment. Several customers I’ve worked with have missed this vital part of migration planning and have encountered performance issues due to selecting an instance level that didn’t support their workload.

Tim has a lot of good advice in here, so read the whole thing.
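
As a rough illustration of the baselining Tim describes (not his exact script), the following T-SQL captures a snapshot of sys.dm_io_virtual_file_stats; persist the results, take another snapshot later, and diff the cumulative counters to derive IOPS and MB/s over the interval:

-- Cumulative file-level I/O counters since the last SQL Server restart.
-- Capture this on a schedule and diff consecutive snapshots for throughput.
SELECT
    GETUTCDATE()             AS sample_time_utc,
    DB_NAME(vfs.database_id) AS database_name,
    vfs.file_id,
    vfs.num_of_reads,
    vfs.num_of_bytes_read,
    vfs.num_of_writes,
    vfs.num_of_bytes_written,
    vfs.io_stall_read_ms,
    vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs;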


Finding Folder Sizes with PowerShell

Jeffrey Hicks shows us a quick way to retrieve information on folder sizes:

It is simple enough to run Get-ChildItem and pipe the results to Measure-Object.

But this can often be time-consuming. The other drawback, besides the typing, is that it takes extra work to format the results into something more user-friendly. And if I want to include hidden files, I have to remember to use -Force with Get-ChildItem.

Cmdlets are nice and convenient, and I always recommend to beginner or even intermediate scripters: if there is a cmdlet to use over the .NET Framework, use the cmdlet. But there are always exceptions, and as you grow in expertise, you’ll realize there are times when going directly to the .NET Framework is a better choice, which is what I have done here.

Click through for the solution.


Streaming Pipelines in AWS with Flink and Kinesis Data Analytics

Steffen Hausmann shows us how to put together a streaming ETL pipeline in AWS using Apache Flink and Amazon Kinesis Data Analytics:

The remainder of this post discusses how to implement streaming ETL architectures with Apache Flink and Kinesis Data Analytics. The architecture persists streaming data from one or multiple sources to different destinations and is extensible to your needs. This post does not cover additional filtering, enrichment, and aggregation transformations, although that is a natural extension for practical applications.

This post shows how to build, deploy, and operate the Flink application with Kinesis Data Analytics, without further focusing on these operational aspects. It is only relevant to know that you can create a Kinesis Data Analytics application by uploading the compiled Flink application jar file to Amazon S3 and specifying some additional configuration options with the service. You can then execute the Kinesis Data Analytics application in a fully managed environment. For more information, see Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics for Java Applications and the Amazon Kinesis Data Analytics developer guide.

Click through for the walkthrough.


Using Sqoop to Move Data into Hive

Jon Morisi continues a series on Sqoop:

Sqoop completes the import task by running MapReduce jobs that import the data to HDFS, and then running Hive commands (CREATE TABLE / LOAD DATA INPATH) to move the data to Hive. The default HDFS location is /user/[login]/[TABLENAME]. If you have any issues during the import, you may need to remove the HDFS directory prior to re-running, or else you will get an error.

Read on for sample calls and additional notes.
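
To make that concrete, here is a rough HiveQL sketch of the sort of statements Sqoop issues once the MapReduce import has landed in HDFS; the table, columns, and path below are hypothetical, not Jon's examples:

-- Roughly the Hive side of a Sqoop import (hypothetical table and path).
CREATE TABLE IF NOT EXISTS employees (
    id   INT,
    name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Moves the imported files out of the default location (/user/[login]/[TABLENAME]).
-- If a failed run left that directory behind, remove it before re-importing.
LOAD DATA INPATH '/user/etl_user/EMPLOYEES' INTO TABLE employees;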


Receiving Notifications when Azure Function Apps Fail

Gilbert Quevauvilliers shares how to receive notification e-mails when an Azure Function App fails:

Below are the steps to enable error notifications on Azure Function Apps.

This follows on from my previous blog post, How you can store All your Power BI Audit Logs easily and indefinitely in Azure, where every day the audit logs are extracted into Azure Blob storage. One of the key things when working with any job that runs on a schedule is that I want to know when the job fails. If I do not have this and I assume that the data is always there, I could fall into a situation where there is missing data that I cannot get back.

Below, I explain how to create an alert with a notification email if an Azure Function App fails.

Read on for the step-by-step instructions.


Power BI and Tabular Model Relationship Types

Marco Russo takes us through the different types of relationships we might encounter in Power BI and Analysis Services Tabular models:

A relationship can be strong or weak. In a strong relationship the engine knows that the one-side of the relationship contains unique values. If the engine cannot check that the one-side of the relationship contains unique values for the key, then the relationship is weak. A relationship can be weak either because the engine cannot ensure the uniqueness of the constraint, due to technical reasons we outline later, or because the developer defined it as such.

A weak relationship is not used as part of table expansion. Power BI has allowed composite models since 2018; in a composite model, it is possible to create tables in a model containing data in both Import mode (a copy of data from the data source is preloaded and cached in memory using the VertiPaq engine) and in DirectQuery mode (the data source is only accessed at query time).

There is quite a bit of useful information in here.


Understanding Azure SQL Database Elastic Jobs

Kate Smith takes us through some important concepts around Elastic Jobs in Azure SQL Database:

It is very important that the T-SQL scripts being executed by Elastic Jobs be idempotent.  This means that if they are run multiple times (by accident or intentionally) they won’t fail and won’t produce unintended results. If an elastic job has some side effects, and gets run more than once, it could fail or cause other unintended consequences (like consuming double the resources needed for a large statistics update).  One way to ensure idempotence is to make sure that you check if something already exists before trying to create it.

This takes some getting used to, but once you’re in the habit, you are much better off. Read on for more details on other key concepts.
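
As a minimal illustration of the idempotence pattern Kate describes (the table and index names here are made up):

-- Safe to run repeatedly: the index is only created if it does not already exist.
IF NOT EXISTS (
    SELECT 1
    FROM sys.indexes
    WHERE name = N'IX_Sales_CustomerID'
      AND object_id = OBJECT_ID(N'dbo.Sales')
)
BEGIN
    CREATE INDEX IX_Sales_CustomerID ON dbo.Sales (CustomerID);
END;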


Monitoring Availability Groups

Nisarg Upadhyay gives us some of the low-down on monitoring availability groups:

In my previous articles, I have explained the step-by-step process of deploying an AlwaysOn Availability group on SQL Server 2017. In this article, I am going to explain how to monitor AlwaysOn availability groups.

First, let’s review the configuration of the availability group we had deployed previously. To do that, open SQL Server Management Studio, expand the database engine in Object Explorer, expand “AlwaysOn High Availability,” and then expand “Availability Groups.” You can see the availability group named SQLAAG. Under this availability group (SQLAAG), you can see the list of availability replicas, availability databases, and availability group listeners.

Click through for some tooling built into SQL Server Management Studio, as well as relevant Perfmon counters.
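
If you prefer T-SQL to the SSMS dashboard, here is a quick sketch (not taken from Nisarg's article) of a DMV query that surfaces the same replica-level health:

-- One row per availability replica: current role and synchronization health.
SELECT
    ag.name AS availability_group,
    ar.replica_server_name,
    ars.role_desc,
    ars.operational_state_desc,
    ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id;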


Entering Data into Power Query from Excel

Ed Hansberry shows a quick way to hand-enter some data into Power Query from Excel:

One of the cool things about Power BI is you have a nice “Enter Data” button on the home ribbon where you can quickly add data to your model. The table isn’t dynamic, but sometimes you just need a small table to finish your model, to add descriptions or some other bit of data, without creating another table in a source that then needs to be connected to. Enter Data is perfect for this.

It did take a little bit of trickery to accomplish, but it’s pretty easy to do.


Reading CPU Measures from the Ring Buffer

Taiob Ali explains what the CPU and memory measures are from the scheduler monitor ring buffer:

Here is a sample output of XML from sys.dm_os_ring_buffers WHERE ring_buffer_type = N’RING_BUFFER_SCHEDULER_MONITOR’. What do those XML elements mean? In order to monitor CPU usage, you need to understand what each element means so you can use the values. I will explain each one in this blog post.

Read on for the list and what each means.
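
For context, a common shape for that query, pulling the per-minute CPU figures out of the ring buffer XML (the column aliases here are mine, not Taiob's):

-- SQL Server vs. other-process CPU from the scheduler monitor ring buffer;
-- one record is written roughly every minute.
SELECT TOP (10)
    rb.record.value('(./Record/@time)[1]', 'bigint') AS record_time_ms,
    rb.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_percent,
    rb.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS idle_percent,
    100
        - rb.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
        - rb.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS other_process_cpu_percent
FROM (
    SELECT CONVERT(xml, record) AS record
    FROM sys.dm_os_ring_buffers
    WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
) AS rb
ORDER BY record_time_ms DESC;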
