
Author: Kevin Feasel

HASHBYTES Performance In SQL Server

Joe Obbish takes a look at how HASHBYTES doesn’t scale well:

The purpose of the MAX aggregate is to limit the size of the result set. This is a cheap aggregate because it can be implemented as a stream aggregate. The operator can simply keep the maximum value that it’s found so far, compare the next value to the max, and update the maximum value when necessary. On my test server, the query takes about 20 seconds. If I run the query without the HASHBYTES call it takes about 3 seconds. That matches intuitively what I would expect. Reading 11 million rows from a small table out of the buffer pool should be less expensive than calculating 11 million hashes.

From my naive point of view, I would expect this query to scale well as the number of concurrent queries increases. It doesn’t seem like there should be contention over any shared resources, so as long as every query gets on its own scheduler I wouldn’t expect a large degradation in overall run time as the number of queries increases.
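
To give a feel for the shape of the test, here is a minimal sketch of the pattern Joe describes; the table and column names are hypothetical stand-ins, not his actual repro:

    -- Hash every row, then aggregate down to a single value.
    -- dbo.HashTestTable and its ID column are placeholders.
    SELECT MAX(HASHBYTES('SHA2_256', CAST(ID AS varbinary(8)))) AS MaxHash
    FROM dbo.HashTestTable;

    -- The same aggregate without the hash, as a baseline for comparison
    -- (roughly 3 seconds versus 20 seconds in Joe's test).
    SELECT MAX(CAST(ID AS varbinary(8))) AS MaxValue
    FROM dbo.HashTestTable;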

Joe’s research isn’t complete, but he does have a conjecture as to why HASHBYTES doesn’t scale well. That said, the most interesting thing in the post to me was to see Microsoft potentially using bcrypt under the covers for HASHBYTES calculation—if that’s really the case, there actually is a chance that sometime in the future, we’d be able to generate cryptographically secure hashes within SQL Server rather than the MD5, SHA1, and SHA2 hashes we have today.


Securing Power BI Report Server

Steve Hughes gives some advice for securing a Power BI Report Server installation:

You have essentially three layers of access to the report file security in Power BI Report Server.

  1. The portal itself can be secured. You can and should limit access to the reports by only allowing specific users or groups access to the report portal.
  2. Folders can be used to provide more granular security over a group of assets in the report portal. In the image above, I created a folder called PBI Secure Reports. A specific AD group has access to this folder. If a user does not have permissions to the folder, the folder does not show up in the portal and they cannot access the folder or the assets, including Power BI reports, stored in this folder.
  3. Individual reports can be secured as well. I never recommend this option, as it becomes administratively difficult to manage. However, the capability is there if a single asset needs to be secured in this fashion.

These options work for any asset stored in the Report Portal and are not limited to Power BI reports.
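
If you would rather script folder permissions than click through the portal, here is a hedged sketch using the open-source ReportingServicesTools module; the server URI, folder path, and AD group name are all placeholders:

    # Grant an AD group Browser (read-only) rights on a secured folder.
    # All names below are placeholders for your environment.
    Install-Module -Name ReportingServicesTools -Scope CurrentUser

    $params = @{
        ReportServerUri = 'http://reportserver/reportserver'
        Path            = '/PBI Secure Reports'
        Identity        = 'CONTOSO\PBIReportViewers'
        RoleName        = 'Browser'
    }
    Grant-RsCatalogItemRole @params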

Power BI Report Server is a different animal from standard Power BI, so securing it will be a bit different as well.


Resizing Azure Managed Instances

Jovan Popovic shows how to resize Azure SQL Managed Instances with PowerShell:

Azure SQL Managed Instance is a fully managed SQL Server Database Engine hosted in the Azure cloud. With Managed Instance, you can easily add or remove cores associated with the instance and change the reserved size of the instance. You can use PowerShell to easily manage the size of the instance and automate this process.

As a prerequisite, you need the Azure SQL PowerShell libraries to configure Managed Instance. You would need to install Azure RM PowerShell and the AzureRM.Sql module that contains the commands for updating properties of Managed Instance.
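
As a rough sketch of what the resize itself might look like with those modules installed (the instance and resource group names are placeholders):

    # Sign in, then change the core count and reserved storage size.
    Connect-AzureRmAccount

    Set-AzureRmSqlInstance -Name 'my-managed-instance' `
        -ResourceGroupName 'my-resource-group' `
        -VCore 16 `
        -StorageSizeInGB 512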

Read on for a demo.


Hortonworks Released HDP 2.6.5

Mitra Mohsenian and Roni Fontaine announce Hortonworks Data Platform 2.6.5:

We are excited to make several product announcements including the general availability of:

  • HDP 2.6.5
    • Apache Kafka 1.0
    • Apache Spark 2.3
  • Apache Ambari 2.6.2
  • SmartSense 1.4.5

HDP 2.6.5 is an important release for Hortonworks, given that it is the first release that enables Apache Kafka 1.0 and Apache Spark 2.3.

It looks like Ubuntu 18.04 isn’t supported just yet, but I imagine that’s coming.


Using Burrow To Monitor Kafka

Gaurav Garg shows us how to install and configure Burrow, a tool for monitoring Apache Kafka clusters:

According to Burrow’s GitHub page, Burrow is a Kafka monitoring tool that keeps track of consumer lag. It does not provide a user interface for monitoring; instead, it provides several HTTP endpoints to get information about Kafka clusters and consumer groups. Burrow also has a notifier system that can notify you (via email or at an HTTP endpoint) if a consumer group has met certain criteria.

Burrow is designed in a modular way that separates the work done into multiple subsystems. Below are the subsystems in Burrow.

  • Clusters: This component periodically updates the topic list and the last committed offset for each partition.
  • Consumers: This component fetches information about consumer groups, such as consumer lag.
  • Storage: This component stores all of the information gathered by the other subsystems.
  • Evaluator: This component gets information from storage and checks the status of consumer groups (for example, whether a group is consuming messages at a slow rate) using consumer lag evaluation rules.
  • Notifier: This component requests the status of a consumer group and, if certain criteria are met, sends a notification via email or other channels.
  • HTTP server: This component provides HTTP endpoints to fetch information about clusters and consumers.
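
As a hedged example of what talking to that HTTP server might look like, here is a small Python sketch; the host, port, cluster, and group names are placeholders, and the endpoint shape follows Burrow's documented v3 API:

    # Poll Burrow's HTTP server for a consumer group's lag evaluation.
    import requests

    BURROW = "http://localhost:8000"  # placeholder host and port

    def consumer_lag(cluster: str, group: str) -> dict:
        resp = requests.get(f"{BURROW}/v3/kafka/{cluster}/consumer/{group}/lag")
        resp.raise_for_status()
        return resp.json()

    result = consumer_lag("local", "my-consumer-group")
    print(result["status"]["status"])  # e.g., "OK" or "WARN"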

This looks like a good tool to hook into an existing monitoring solution.


The Value Of Statistics In SQL Server

Monica Rathbun walks us through the benefits of having statistics on tables in SQL Server:

Statistics are made up of three parts. Each part tells the optimizer important information regarding the makeup of the table’s data distribution.

Header – last time stats were updated and number of sample rows

Density Vector – uniqueness of the columns or set of columns

Histogram – data’s distribution and frequency of distinct values

Let’s look at a Header, Density and Histogram example.

You can see what statistics are broken down into by using DBCC SHOW_STATISTICS. All field definitions are taken from MSDN.

This uses the AdventureWorks2016CTP3 sample database, if you want to follow along. Using the Sales.SalesOrderDetail table, let’s look at the stats and see what they show us.
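
For example, using the index on ProductID which ships with the AdventureWorks sample databases:

    -- Return the header, density vector, and histogram for one statistics object.
    DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID);

    -- Or pull back just one of the three parts.
    DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
        WITH HISTOGRAM;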

Read the whole thing.


Positive And Negative Value Validation In PowerShell 6

Thomas Rayner points out a cool addition to parameter validation as of PowerShell 6:

If you’ve written at least a couple of advanced PowerShell functions, you’re probably no stranger to parameter validation. These are the attributes you attach to parameters to make sure that they match a certain regular expression using [ValidatePattern()], or that, when plugged into a certain script, they evaluate to true using [ValidateScript({})]. You’ve probably also used [ValidateRange()] to make sure a number falls between a min and a max value that you specified.

In PowerShell 6, though, there’s something new and cool you can do with ValidateRange. You can specify in a convenient new syntax that the value must be positive or negative.
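
As a quick sketch of the new syntax (the function and parameter names here are made up for illustration):

    # 'Positive', 'Negative', 'NonPositive', and 'NonNegative' are the
    # named range kinds; this parameter must be greater than zero.
    function Set-Offset {
        param (
            [ValidateRange('Positive')]
            [int]$Value
        )
        $Value
    }

    Set-Offset -Value 5     # fine
    Set-Offset -Value -5    # throws a parameter validation error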

Read on to see a few examples.


Continuous Integration And Building SSIS Projects

Koos van Strien gives us three methods for building SSIS projects:

First things first

First, set your expectations: you won’t create a one-size-fits-all build task that will build all your project types. Instead, you will split up your builds by project type – essentially just as described in Continuous Integration for BI in VSTS: Splitting Build Steps by Project Type.

Building SSIS projects

With folder and solution structure in place, we’ll explore three ways to build SSIS projects:

  • SSISBuild / SSISDeploy

  • Just-for-build SSIS projects

  • “Build” inside PowerShell (sketched just below)
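
As a taste of that third option, here is a hedged sketch: .dtproj files are not MSBuild projects, so one common workaround is to shell out to devenv.com. The paths, solution, and configuration names are placeholders:

    # Build an SSIS project by shelling out to devenv.com.
    $devenv   = 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\IDE\devenv.com'
    $solution = 'C:\Builds\MyEtl\MyEtl.sln'

    & $devenv $solution /Rebuild 'Development' /Project 'MyEtlProject\MyEtlProject.dtproj'
    if ($LASTEXITCODE -ne 0) { throw "SSIS build failed with exit code $LASTEXITCODE" }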

It’s a good post, so check it out if you’re looking at automating SSIS project deployments.


Spreading Out Multi-Server Agent Runs

Tracy Boggiano shows how to distribute SQL Agent job runtimes for multi-server jobs using MSX/TSX:

First, you need to decide how many time blocks or hours you want the jobs to run in. Let’s start with scenario one, where you pick, for example, four time blocks. First, you declare a variable to hold the time block, and we feed in @@SERVERNAME to determine the time block value that server will run in. Then we wrap our code around our time block; in our example, we run Index Maintenance over a 12-hour period spread out in three-hour blocks. Mind you, my index process (which I probably should blog about as well) processes one index at a time and has something that BREAKs out of the procedure when it exceeds its time block. So below, we start the index maintenance job on a server between the hours of 6 PM and 5 AM, based on the time block value we got back.
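
Here is a minimal sketch of the time-block idea; this is my reconstruction rather than Tracy’s exact code, and the window boundaries are illustrative:

    -- Derive a stable time block (0-3) from the server's name.
    DECLARE @TimeBlock int = ABS(CHECKSUM(@@SERVERNAME)) % 4;

    -- Map each block to the start of a three-hour window (illustrative values).
    DECLARE @StartHour int = CASE @TimeBlock
                                 WHEN 0 THEN 18  -- 6 PM
                                 WHEN 1 THEN 21  -- 9 PM
                                 WHEN 2 THEN 0   -- midnight
                                 ELSE 3          -- 3 AM
                             END;

    -- The index maintenance job would then run only when the current time
    -- falls inside this server's assigned window.
    SELECT @TimeBlock AS TimeBlock, @StartHour AS StartHour;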

Click through for a sample.


There’s Only One Way To Order

Matthew McGiffen notes that there is only one way to order, and that is to use the ORDER BY clause:

Everyone, at the beginning of their SQL career, gets told that it is important to include an ORDER BY if they want the results ordered. Otherwise, the order in which they are returned is not guaranteed.

But then you run queries a lot of times that don’t need a specific order – and you see that they (at least seem to) come out in the same order every time. You could (almost) be forgiven for thinking you can rely on that.

There was even a question on a Microsoft SQL certification exam a few years ago that asked what the default order was for records returned by a simple SELECT – the answer it was looking for was that it would be according to the order of the clustered index. So you can rely on that – right?

Wrong. The question was a bad question, and the answer was incorrect. Let’s look at this in action.

Order is never guaranteed to be stable unless you specify a unique ordering using ORDER BY.
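
To make it concrete, here is a small illustration using an AdventureWorks table:

    -- Without ORDER BY, SQL Server is free to return these rows in any order.
    SELECT SalesOrderID, OrderDate
    FROM Sales.SalesOrderHeader;

    -- A unique ORDER BY is the only guarantee of a stable order; OrderDate
    -- alone has ties, so include the key as a tiebreaker.
    SELECT SalesOrderID, OrderDate
    FROM Sales.SalesOrderHeader
    ORDER BY OrderDate, SalesOrderID;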
