
Curated SQL Posts

SQLSharp 4.1 Released

Solomon Rutzky announces a new release of SQL#:

  • GENERAL

    • Greatly reduced size (by approx. 310 kb) of main SQL# Assembly by moving LookUp category into its own Assembly: SQL#.LookUps. This will improve initial load times and won’t waste much memory when not using the LookUp functions.
  • Installation Script

    • Account for security changes related to SQL Server 2017 (i.e. “CLR strict security”) using a Certificate (flexible, clean) instead of the new “Trusted Assemblies” (inflexible, messy).
  • Networking

    • Added explicit support for TLS 1.1 and TLS 1.2 protocols

    • Increased default “Connection Limit” for URIs to 20 from the .NET default of 2. This will reduce performance bottlenecks from concurrent access to the same URI.

SQL# is also the first place where I found a decent median function for SQL Server.
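
For comparison: on SQL Server 2012 and later you can also compute a median natively with PERCENTILE_CONT, though as a window function it often performs worse than a well-written CLR aggregate. A minimal sketch, with hypothetical table and column names:

    -- Median per group via the built-in PERCENTILE_CONT window function.
    -- DISTINCT collapses the per-row window output to one row per group.
    SELECT DISTINCT
           SomeGroup,
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY SomeValue)
               OVER (PARTITION BY SomeGroup) AS MedianValue
    FROM dbo.SomeTable;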


Tracking DBCC MEMORYSTATUS Over Time

Andy Galbraith has a script to track the outputs of DBCC MEMORYSTATUS over a time period:

Running this statement interactively doesn’t return any data – it just loads the data into DBADatabase.dbo.DBCCMemoryStatus.  Running the commented-out SELECT at the bottom of the script as written will query that table for all rows of counter VM Reserved (virtual memory reserved) but there is much more data than that available if you modify the SELECT.

This query can be dropped into a SQL Agent job step as is and it will run – just like the interactive run it will create the database and permanent table if they don’t exist and then store those nuggets of data into the permanent table for later use – you never know when you may need them!
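
As a stripped-down illustration of the snapshot idea (this is not Andy’s script, which captures DBCC MEMORYSTATUS itself), sys.dm_os_memory_clerks covers much of the same ground and is easy to persist on a schedule:

    -- Create the history table once; the scheduled job appends to it.
    IF OBJECT_ID('dbo.MemoryClerkHistory') IS NULL
        CREATE TABLE dbo.MemoryClerkHistory
        (
            CaptureTime datetime2    NOT NULL DEFAULT SYSDATETIME(),
            ClerkType   nvarchar(60) NOT NULL,
            PagesKB     bigint       NOT NULL
        );

    -- One row per memory clerk type per capture.
    INSERT INTO dbo.MemoryClerkHistory (ClerkType, PagesKB)
    SELECT [type], SUM(pages_kb)
    FROM sys.dm_os_memory_clerks
    GROUP BY [type];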

Click through for the script.


Understanding Row Goals

Paul White has an excellent article on row goals:

In my experience, it is very common for people to miss the impact of row goals in execution plans.

No wonder, since before SQL Server 2017 CU3, there was no documented way to see information about row goals in execution plans! That update includes:

This enhancement adds the EstimateRowsWithoutRowGoal attribute to each plan operator affected by a row goal.

The new attribute is visible in all the usual places (e.g. DMVs, plan cache) but not yet in SQL Server Management Studio graphical plans (up to and including SSMS version 17.4). SQL Server does send the new attribute, but SSMS graphical plans strip out anything that does not match the local version of the XML plan schema.

Version 17.5 of SSMS is expected to ship with an updated XML schema and UI elements, making the new attribute visible. SentryOne Plan Explorer already shows the new attribute in the raw XML view, since it doesn’t have any reason to strip the information out, but an update will be required to incorporate the new row goal information into other places (such as the Plan Diagram). I would expect it to make its way onto tooltips first.
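
If you want to see a row goal for yourself, a TOP clause is the simplest trigger, and from SQL Server 2016 SP1 onward you can switch the behavior off per query and compare the plans. A minimal sketch with a hypothetical table:

    -- TOP introduces a row goal: the optimizer costs the plan as though
    -- only ten rows will ever be needed, which can change its choices.
    SELECT TOP (10) *
    FROM dbo.SomeLargeTable
    WHERE SomeColumn = 42;

    -- The same query with the row goal disabled, for comparison.
    SELECT TOP (10) *
    FROM dbo.SomeLargeTable
    WHERE SomeColumn = 42
    OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));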

Definitely worth a careful read.


Getting SQL Server CPU Usage By Session

Manu Punna has a script to get CPU utilization by session over a relatively short timeframe:

Troubleshooting high CPU usage on a SQL Server Database is an art, but there is a defined methodology to follow to find the root cause of high CPU. This can involve breaking down the overall server CPU usage to a more granular level, first discovering that it’s SQL Server that’s the problem (because way too often it’s something else!), down to exploring specific plan operators in a particular problematic query. Finding that problematic query and identifying the high CPU consumer means identifying the CPU usage by session.

sys.dm_exec_requests shows the CPU time, but it’s cumulative – it doesn’t give the CPU consumption by each session at the current time. You can see how much CPU usage a session has had since it started, but it doesn’t show you what’s going on right now. To explore that, we need to query sys.dm_exec_requests repeatedly, and look for the differences. We need to collect the CPU usage for a time interval to identify the high CPU consumers.
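
The core of any such script is a pair of snapshots and a diff. A bare-bones sketch of the idea (not Manu’s script; it samples sys.dm_exec_sessions rather than sys.dm_exec_requests so that sessions idle at the second sample still show up):

    -- Snapshot cumulative CPU per session, wait, snapshot again, and diff.
    DECLARE @first  TABLE (session_id int PRIMARY KEY, cpu_time int);
    DECLARE @second TABLE (session_id int PRIMARY KEY, cpu_time int);

    INSERT INTO @first
    SELECT session_id, cpu_time
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1;

    WAITFOR DELAY '00:00:10';  -- the sampling interval

    INSERT INTO @second
    SELECT session_id, cpu_time
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1;

    -- CPU milliseconds each session consumed during the interval.
    SELECT s2.session_id,
           s2.cpu_time - s1.cpu_time AS cpu_ms_in_interval
    FROM @second AS s2
    JOIN @first  AS s1
        ON s1.session_id = s2.session_id
    WHERE s2.cpu_time > s1.cpu_time
    ORDER BY cpu_ms_in_interval DESC;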

Click through for the script.


New SQL Server 2017 On Windows Docker Container

Perry Skountrianos announces that SQL Server 2017 Developer & Express editions (running Windows Server 1709) are now available on Docker Hub:

Windows Server version 1709 brings the following important improvements that developers can take advantage of with the updated container images.

  1. First of all, the microsoft/windowsservercore image underneath SQL shrank by more than 2GB, so the SQL Server images are also 2GB smaller.

  2. The networking support for containers was improved to support Kubernetes (now in beta for Windows as of version 1.9) and routing mesh with Docker Swarm.

  3. If you want to store your databases on remote storage, you can now do so by using global SMB mounts (New-SMBGlobalMapping) along with a docker volume (docker run -v c:\shared:c:\data microsoft/mssql-express-…).

Seems like a useful improvement.


Ways To Hinder Indexes

Raul Gonzalez shows that even when you have a good index, “clever” developers and fate can find ways to conspire against it:

The benefits of having an index are well known: you can get the same results by reading a much smaller amount of data, so the improvement in performance can range from several minutes down to seconds or even less.

That sounds awesome, and it certainly is – there are people out there making a living off of it – so it’s a huge deal for sure.

But it’s not always like that: things can go wrong very easily and make all these shiny indexes just a useless burden.

Let me show you some examples, where we can see our indexes in use, but also how they can be ignored by the query processor and become totally useless. I’m going to use the Microsoft sample database [WideWorldImporters] so you can follow along if you want.
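
One classic example of the kind of trap the post covers: wrapping an indexed column in a function makes the predicate non-sargable, so the optimizer can no longer seek the index. A sketch against [WideWorldImporters], assuming an index on Sales.Customers.CustomerName:

    -- Sargable: the optimizer can seek an index on CustomerName.
    SELECT CustomerID, CustomerName
    FROM Sales.Customers
    WHERE CustomerName LIKE N'A%';

    -- Non-sargable: the function hides the column from the index,
    -- forcing a scan even though the logic is equivalent.
    SELECT CustomerID, CustomerName
    FROM Sales.Customers
    WHERE LEFT(CustomerName, 1) = N'A';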

Read on to learn more.


Fun With Random Walks

Emrah Mete simulates a random walk in R:

Let’s consider a game where a gambler is likely to win $1 with a probability of p and lose $1 with a probability of 1-p.

The player starts the game with X dollars in hand. The player plays the game until the money in his hand reaches N (N > X) or he has no money left. What is the probability that the player will reach the target value? (We know that the player will not leave the game until he reaches the N value he wants to win.)

The problem of the story above is known in the literature as Gambler’s Ruin or Random Walk. In this article, I will simulate this problem in R and examine how the game results change under different settings.
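
For reference, the classical closed-form answer that any such simulation should converge to, with X the starting stake, N the target, and p the per-game win probability:

    P(reach N before going broke) = (1 - ((1-p)/p)^X) / (1 - ((1-p)/p)^N)   if p ≠ 1/2
    P(reach N before going broke) = X / N                                   if p = 1/2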

Click through for the script and analysis.  There’s a reason they call this game the gambler’s ruin.


The Business Value Of Upgrading To Hadoop 3

Roni Fontaine, Vinod Vavilapalli, and Saumitra Buragohain explain some of the business case for upgrading to Hadoop 3 from Hadoop 2:

Hadoop 2 doesn’t support GPUs. Hadoop 3 enables scheduling of additional resources, such as disks and GPUs for better integration with containers, deep learning & machine learning.  This feature provides the basis for supporting GPUs in Hadoop clusters, which enhances the performance of computations required for Data Science and AI use cases.

Hadoop 2 cannot accommodate intra-node disk balancing; Hadoop 3 can. If you are repurposing or adding new storage to an existing server with older, smaller-capacity drives, this leads to unevenly distributed disk space within each server. With intra-node disk balancing, the space across the disks is evenly distributed.

Hadoop 2 has only inter-queue preemption across queues. Hadoop 3 introduces intra-queue preemption, which goes to the next level by allowing preemption between applications within a single queue. This means that you can prioritize jobs within the queue based on user limits and/or application priority.

Read on for more examples.


DataExplorer

Boxuan Cui introduces DataExplorer, an R package dedicated to assist with exploratory data analysis:

According to a Forbes article, cleaning and organizing data is the most time-consuming and least enjoyable data science task, taking up roughly 80% of the time. Of all the resources out there, DataExplorer is one of them, with its sole mission to minimize that 80% and make it enjoyable. As a result, one fundamental design principle is to be extremely user-friendly. Most of the time, one function call is all you need.

Data manipulation is powered by data.table, so tasks involving big datasets usually complete in a few seconds. In addition, the package is flexible enough with input data classes, so you should be able to throw in any data.frame-like objects. However, certain functions require a data.table class object as input due to the update-by-reference feature, which I will cover in a later part of the post.

For my money, that number is closer to 90%.  I will have to check this package out.


Using Power BI To Pass Parameterized Values To R

Stacia Varga shows how you can parameterize your R scripts within Power BI:

It’s not difficult, but the cool thing about Power BI is that I can use parameters to dynamically change the report visualization without opening up the script. To do this:

  • Open the Query Editor in Power BI

  • Click Manage Parameters, and then click New Parameter.

  • Set the parameter properties – Name, Type, and Current Value. The Name is how I will reference the parameter in my R script, the Type is the data type, and the Current Value is the initial value that I want to set (if any).

Click through for an example and more details.
