Author: Kevin Feasel

Choosing Clustered Index Columns

Ed Elliott wades into the clustered index debate:

I have seen this debated in forums spread over the internet for decades, and the advice that we gave ten years ago isn’t as valid today as it was then. Ten years ago, memory was considerably less, and disks were spinning rust. The advent of SSDs and the ability to get servers with more memory than data, even on large systems, have changed how we should think about designing and maintaining databases.

I generally subscribe to the NUSE philosophy: Narrow, Unique, Static, Ever-Increasing. That generally leads me to selecting identity integers or longs. For junction tables (whose entire purpose is to join two tables together and which never get referenced outside of that), I use the primary key as the clustered index.

In extreme insert scenarios, I can see wanting to maximize fragmentation in order to insert into more pages in the B-tree and avoid hot spot pages.

Comments closed

Finding High-Resource Queries with Extended Events

Grant Fritchey shows how you can create an Extended Events session which identifies high-CPU queries:

A question that comes up on the forums all the time: Which query used the most CPU? You may see variations on memory, I/O, or just resources in general. However, people want to know this information, and it’s not readily apparent how to get it.

While you can look at what’s in cache through the DMVs to see the queries there, you don’t get any real history and you don’t get any detail of when the executions occurred. You can certainly take advantage of the Query Store for this kind of information. However, even that data is aggregated by hour. If you really want a detailed analysis of which query used the most CPU, you need to first set up an Extended Events session and then consume that data.

Click through for the script.

Power Query List Expansion Problems

Chris Webb goes over an issue with an attempt to expand out a set of folders in M:

The approach I took was the one that seemed natural to me at the time:

1. Use the Folder data source to connect to the folder containing the image files
2. Define a function called SplitText that takes a long piece of text and splits it up into a list of text values no longer than 30000 characters
3. Call the function once per row on the table returned by step (1)
4. Use the Expand/Aggregate button to expand the new column created by step (3) and get a table with one row for each of the split-up text values

When I ran this query, though, I caught sight of something that is every Power Query developer’s worst nightmare:

Read on for more. Also, drop by to congratulate Chris on collecting a blue badge.
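
As a rough illustration of steps 2 through 4 in the quoted approach, here is a minimal Python sketch. It is not Chris's M code: the name split_text, the sample rows, and the Name/Content/Chunk keys are all made up for illustration. Only the 30000-character limit comes from the post.

```python
from typing import Dict, List


def split_text(text: str, max_len: int = 30000) -> List[str]:
    """Split text into consecutive chunks of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


# Hypothetical stand-in for the table returned by the Folder source (step 1).
rows: List[Dict[str, str]] = [
    {"Name": "a.png", "Content": "x" * 70000},
    {"Name": "b.png", "Content": "y" * 10},
]

# Steps 3 and 4: call the function once per row, then expand the resulting
# lists so that each chunk becomes its own row.
expanded = [
    {"Name": row["Name"], "Chunk": chunk}
    for row in rows
    for chunk in split_text(row["Content"])
]
```

This only shows the shape of the transformation. It says nothing about how Power Query evaluates the equivalent steps, which is where the problem in the post comes from.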

Securing Power BI

Andy Jones has 10 tips for securing your Power BI infrastructure:

9 Sharing Externally

Power BI offers the ability to share reports outside of your organisation or even publish to the public internet. If this causes you to shudder, turn these features off. Your Power BI admin (remember them from above) should open the admin portal and move a slider – problem solved.

Turn off sharing externally (unless needed)

Click through for the full list.

Using Notebooks with ElasticMapReduce

Vignesh Rajamani and Nikki Rouda show off ElasticMapReduce Notebooks:

One of the useful features of EMR Notebooks is the separation of the notebook environment from your underlying cluster infrastructure. The separation makes it easy for you to execute notebook code against transient clusters without worrying about deploying or configuring your notebook infrastructure every time you bring up a new cluster. You can create multiple serverless notebooks from the AWS Management Console for EMR and access the notebook UI without spending time setting up SSH access or configuring your browser for port-forwarding. Each notebook you create is launched instantly with its own Spark context. This capability enables you to attach multiple notebooks to a single shared cluster and submit parallel jobs without fear of job conflicts in a multi-tenant environment. This way you make efficient use of your clusters.

You can also connect EMR Notebooks to an EMR cluster as small as one node. This gives you a budget-friendly sandbox environment to develop your Spark application.

Notebooks are everywhere. And for good reason.

An RStudio Configuration

William Doane has published a sample RStudio configuration:

Whenever I need to install RStudio on a new machine, I have to think a bit about the configuration options I’ve tweaked. Invariably, I miss a checkbox that leaves me with slightly different RStudio behavior on each system. This post includes screenshots of my RStudio configuration and custom keyboard shortcuts for RStudio 1.3, MacOS, so that I have a reference.

I like these kinds of posts because they can help you find interesting settings you might not otherwise know about. Also, I second the FiraCode recommendation for R as well as F#. The only reason I don’t use it more is because I don’t want to confuse people during presentations. H/T R-Bloggers

The Joy of Non-Nullable Persisted Computed Columns

Louis Davidson shows what you can do with persisted, non-nullable computed columns:

Next, let’s add a check constraint to our computed column. For this example, we are just going to make sure that the value in the table is a palindrome (because this is something that every data architect has come across at least once in their life, right?). So Value = REVERSE(Value);
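
To make the rule behind that constraint concrete, here is a minimal Python sketch of the same test, comparing a value with its reverse. The function name is_palindrome is made up for illustration. Note one assumption: this Python comparison is case-sensitive, whereas the result of the T-SQL comparison depends on the column's collation.

```python
def is_palindrome(value: str) -> bool:
    """Return True when value reads the same forwards and backwards."""
    # value[::-1] reverses the string, playing the role of REVERSE(Value).
    return value == value[::-1]


# Rows that pass the check would be accepted; the others would be rejected.
candidates = ["racecar", "level", "hello"]
accepted = [v for v in candidates if is_palindrome(v)]
```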

Read on for more fun and sometimes-useful things you can do.

Detecting and Analyzing Deadlocks

Max Vernon has a couple scripts to analyze deadlocks in SQL Server:

Deadlock detection and analysis in SQL Server is important for the overall health of affected applications. This post shows how to set up an Extended Events session to capture deadlock details. It also provides a stored procedure to capture details from the Extended Events session. The stored procedure enables simplified permanent storage of those deadlock detection events. Next, we’ll set up a SQL Server Agent Job to run the stored procedure on an ongoing basis. Finally, we’ll see several examples of how to query the captured events. These queries support making the necessary changes to both the application and database design.

Click through for a description of what a deadlock is as well as scripts to help find and fix them.

Generating Scripts from SSMS

Jeff Mlakar shows how you can use Management Studio to generate scripts for database objects:

Sales.SalesOrderDetail looks like a good choice. Let’s generate a script for that table, all associated objects, and its data.

The safest way to create structure including all indexes, keys, defaults, constraints, dependencies, triggers, etc. is to use SSMS Generate Scripts.

I would also recommend becoming familiar with the PowerShell command to generate scripts and what dbatools has in store.

Stats IO Oddities

Josh Darnell collects a few cases where SET STATISTICS IO ON doesn’t behave quite as you might expect:

The first one comes from a post on Database Administrators Stack Exchange: STATISTICS IO for parallel index scan

To summarize the situation, the OP had a query that was scanning a clustered index. They were seeing significantly higher numbers reported in the logical reads portion of the STATISTICS IO output when the query ran in parallel vs. serially (with a MAXDOP 1 query hint). There is a demo of this behavior in the post, so I won’t reproduce it here.

There are several interesting cases in here, so check them out.
