
Curated SQL Posts

Trace Flag 834 And Columnstore Tables

Joe Obbish shows how trace flag 834 can solve a bottleneck when inserting into tables with clustered columnstore indexes:

In my experience, when we get into a situation with high memory waits caused by too much concurrent CCI activity all queries on the server that use a memory grant can be affected. For example, I’ve seen sp_whoisactive run for longer than 90 seconds.

It needs to be stated that not all CCIs will suffer from this scalability problem. I was able to achieve good scalability with some artificial tables, but all of the real target tables that I tested have excessive memory waits at high concurrency. Perhaps tables which require more CPU to compress naturally spread out their memory requests and the underlying OS is better able to keep up.

Read the whole thing, and also check out Lonny Niederstadt’s comment as it adds pertinent information about TF834.
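
For reference, trace flag 834 (large-page memory allocations) is a startup-only flag; you cannot enable it with DBCC TRACEON. A quick way to verify whether it is active on an instance:

    -- Add -T834 to the SQL Server startup parameters, restart, then
    -- confirm the flag is enabled instance-wide:
    DBCC TRACESTATUS(834, -1);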


Avoiding Direct View() Calls In R

John Mount notes that you should not assume that the View() function in R will work:

R tip: get out of the habit of calling View() directly.

View() only works correctly in interactive environments, not currently in RMarkdown contexts. It is better to call something that safely dispatches either to View() or to an alternative, depending on whether you are in an interactive or non-interactive session.

Click through for a script which is safe to run whether you’re in RStudio or using knitr to build a document.
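
As a rough sketch of the idea (the wrapper name safe_view is mine, not from John’s post), a dispatching function might look like this:

    # A minimal sketch: dispatch to View() only in interactive sessions,
    # and fall back to printing otherwise (e.g., inside knitr/RMarkdown).
    safe_view <- function(x, title = deparse(substitute(x))) {
      if (interactive()) {
        View(x, title = title)
      } else {
        print(head(x))
      }
    }

    safe_view(mtcars)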


Recommendations For Running Kafka On AWS

Prasad Alle has some recommendations if you decide to run Apache Kafka on AWS:

The network plays a very important role in a distributed system like Kafka. A fast and reliable network ensures that nodes can communicate with each other easily. The available network throughput controls the maximum amount of traffic that Kafka can handle. Network throughput, combined with disk storage, is often the governing factor for cluster sizing.

If you expect your cluster to receive high read/write traffic, select an instance type that offers 10-Gb/s performance.

In addition, choose an option that keeps interbroker network traffic on the private subnet while still allowing clients to connect to the brokers. Communication between brokers and clients uses the same network interface and port. For more details, see the documentation about IP addressing for EC2 instances.

If you are deploying in more than one AWS Region, you can connect the two VPCs in the two AWS Regions using cross-region VPC peering. However, be aware of the networking costs associated with cross-AZ deployments.

There’s some good advice here, as well as acknowledgement of various tradeoffs involved in architecting a solution.
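
To make the private-subnet advice concrete, a minimal server.properties sketch (the hostname is an illustrative assumption, not from Prasad’s post) might look like this:

    # One listener on the private subnet; brokers and clients share the
    # same network interface and port, as the post recommends.
    listeners=PLAINTEXT://0.0.0.0:9092
    advertised.listeners=PLAINTEXT://broker1.kafka.internal.example:9092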


Module Signing For Database Rights

Solomon Rutzky shows how to use module signing to grant granular permissions to users:

Scenario: We want to allow one or more Users and/or Database Roles to be able to truncate certain Tables, but not all Tables. We certainly do not want to allow anyone the ability to make structural changes to the Table.

Also, it is likely that, over time, at least one more Table will be added that the User(s) and/or Role(s) should be able to truncate, and less likely, though not impossible, that one or more Tables that they should be able to truncate now might be removed.

Truncation is a great example of the kind of right you’d want behind a signed stored procedure, as the permission needed to truncate a table is absurd: practically full control of the table. Module signing is something I wish more DBAs knew about and implemented.
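
In condensed form, the pattern looks something like this (object names, including the role, are illustrative; see Solomon’s post for the full, hardened version):

    -- Certificate whose private key signs the procedure
    CREATE CERTIFICATE TruncateSigningCert
        ENCRYPTION BY PASSWORD = 'StrongPassword!1'
        WITH SUBJECT = 'Signs procedures that truncate staging tables';
    GO
    -- Procedure encapsulating the privileged operation
    CREATE PROCEDURE dbo.TruncateStagingData
    AS
        TRUNCATE TABLE dbo.StagingData;
    GO
    ADD SIGNATURE TO dbo.TruncateStagingData
        BY CERTIFICATE TruncateSigningCert WITH PASSWORD = 'StrongPassword!1';
    GO
    -- The certificate-mapped user carries ALTER (the right TRUNCATE needs);
    -- callers only ever receive EXECUTE on the procedure.
    CREATE USER TruncateSigningUser FROM CERTIFICATE TruncateSigningCert;
    GRANT ALTER ON dbo.StagingData TO TruncateSigningUser;
    GRANT EXECUTE ON dbo.TruncateStagingData TO SomeAppRole;  -- stand-in role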


Executing R Scripts In SSRS

Tomaz Kastrun shows how to include R scripts (and visuals) in SQL Server Reporting Services:

Using the power of the R language to enrich your data, your statistical analysis, or your visualization is a simple way to get more out of your reports.

The best practice for embedding R code into an SSRS report is to create a stored procedure and output the results to the report. To demonstrate this, we will create two reports: one that takes two input parameters, and a second that demonstrates the usage of R visualization.

It’s handy to be able to use R to create nice visuals and then import them into your SSRS report, and Tomaz shows how.
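
The stored-procedure approach relies on SQL Server’s sp_execute_external_script (Machine Learning Services must be installed and enabled); a bare-bones sketch with assumed table and column names:

    -- Wrap R in a procedure, then point the SSRS dataset at the procedure.
    CREATE PROCEDURE dbo.GetGroupMean
        @GroupName NVARCHAR(50)
    AS
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            d <- InputDataSet[InputDataSet$GroupName == grp, ]
            OutputDataSet <- data.frame(MeanValue = mean(d$Value))',
        @input_data_1 = N'SELECT GroupName, Value FROM dbo.Measurements',
        @params = N'@grp NVARCHAR(50)',   -- exposed to R as the variable grp
        @grp = @GroupName
    WITH RESULT SETS ((MeanValue FLOAT));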


Explaining RANKX In DAX

Philip Seamark explains how the RANKX function works:

One of the first traps to encounter when using this function is that it can be used in calculations for calculated columns as well as calculated measures. The RANKX function will still do what it is asked. The trick is how you use the function in each scenario – and more importantly, what filters are going to be implicitly applied to the data the RANKX function actually uses.

This is a helpful post for explaining the ranking function.
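
To illustrate the two scenarios (table and column names here are assumptions, not Philip’s), compare a calculated column, where row context supplies the current value, with a measure, where filter context does:

    -- Calculated column on Sales: ALL(Sales) hands RANKX the whole table
    -- to rank the current row's Amount against.
    Amount Rank = RANKX ( ALL ( Sales ), Sales[Amount] )

    -- Measure: rank each product by its total; CALCULATE performs the
    -- context transition for each product being iterated.
    Product Rank = RANKX ( ALL ( Sales[Product] ), CALCULATE ( SUM ( Sales[Amount] ) ) )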


All Execution Plans Are Estimates

Grant Fritchey drops a bomb on us:

All these resources, yet, for any given query, all the plans will be identical (assuming no recompile at work). Why? Because they’re all the same plan. Each and every one of them is an estimated plan. Only an estimated plan. This is why the estimated costs stay the same between an estimated and actual plan, this despite any disparity between estimated and actual row counts.

I’ve blogged about this before, but it’s worth mentioning again. There are only a few minor differences between an estimated plan and an actual plan. It’s all about the data set. What’s going on is that an actual plan can capture query metrics, which are then appended to the estimated plan. At no point is any different plan generated during this process. It’s just a plan, an estimated plan, or, it’s a plan plus query metrics.

Read the whole thing.
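
You can see this for yourself in T-SQL: request the plan without executing, then execute with runtime statistics attached, and compare (the query is a stand-in):

    SET SHOWPLAN_XML ON;   -- returns the estimated plan; the query does not run
    GO
    SELECT COUNT(*) FROM dbo.Users;
    GO
    SET SHOWPLAN_XML OFF;
    GO
    SET STATISTICS XML ON; -- runs the query and appends runtime metrics to that same plan
    GO
    SELECT COUNT(*) FROM dbo.Users;
    GO
    SET STATISTICS XML OFF;
    GO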


Getting A Random Row

Brent Ozar shares four methods for getting a random row from a table:

Method 1, Bad: ORDER BY NEWID()

Easy to write, but it performs like hot, hot garbage because it scans the entire clustered index, calculating NEWID() on every row:

That took 6 seconds on my machine, going parallel across multiple threads, using tens of seconds of CPU for all that computing and sorting. (And the Users table isn’t even 1GB.)

Click through for the other three methods.  The really tricky part is when you want to get a random sample from the table, as TABLESAMPLE is an awful choice for that.
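
For reference, method 1 is the one-liner below, run against the Stack Overflow Users table Brent uses:

    -- Easy to write, but it sorts the whole table by a fresh GUID per row:
    SELECT TOP (1) *
    FROM dbo.Users
    ORDER BY NEWID();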


Order Of Operations With Logical Types

Thomas Rushton explains the order of operations, particularly around boolean operators:

The order in which calculations are done – not just reading from left to right, but remembering that things like multiplication and division happen before addition and subtraction. My son tells me that kids nowadays are being taught something called “BIDMAS” – which stands for “Brackets, Indices, Division, Multiplication, Addition, Subtraction”. Or it can be BODMAS – Brackets, Operations, Division… (Operation is a fancy new way of describing indices – i.e. x^y)

Unsurprisingly, there are similar rules for Boolean operators.

It’s a valuable lesson oft learned.
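
In T-SQL terms, AND binds more tightly than OR, so these two predicates are not equivalent:

    SELECT 'row' WHERE 1 = 1 OR 1 = 1 AND 1 = 0;   -- parsed as 1=1 OR (1=1 AND 1=0): returns a row
    SELECT 'row' WHERE (1 = 1 OR 1 = 1) AND 1 = 0; -- returns nothing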


Using Extended Properties For Documentation

Phil Factor shows us how we can use Extended Properties to build database documentation:

Once you’ve got into the habit of using Extended Properties to document your database, there are obvious benefits:

  • You can explain why you added that index or modified that constraint.
  • You can describe exactly what that rather obscure column does.
  • You can add a reasoned explanation to the use of a table.

You will often need these explanations because, sadly, DDL code isn’t ‘self-documenting’, and human memory is fallible. Extended Properties are easily searched because they are all exposed in one system view.

It is great to add explanations to lists of procedures, functions and views once the database becomes sizeable. Extended Properties are useful when exploring the metadata, but the requirement isn’t quite so essential because comments are preserved along with the source code. Tables, however, are a big problem because SQL Server throws away the script that produces the table, along with all the comments. The reason that this happens is that there are many ways you can alter parts of a table without scripting the entire table. How could one infallibly preserve all these ALTER statements in the preserved script? It’s tricky. Table scripts that you get from SSMS or script via SMO are therefore synthesised from the system tables but without those comments or even Extended Properties.

Extended Properties are useful, but I think the lack of tooling around them prevented widespread adoption.  Now that there are a few tools which support them (including SSMS’s data classification mechanism), I wonder if these will get a second look.
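
As a quick sketch of the mechanics (object names are illustrative), here is how you attach a description to a column and then search the single system view that exposes every property:

    EXEC sys.sp_addextendedproperty
        @name  = N'MS_Description',
        @value = N'Natural key from the legacy billing system; do not reuse.',
        @level0type = N'SCHEMA', @level0name = N'dbo',
        @level1type = N'TABLE',  @level1name = N'Invoices',
        @level2type = N'COLUMN', @level2name = N'LegacyInvoiceCode';
    GO
    -- Every Extended Property in the database is visible here:
    SELECT class_desc, major_id, minor_id, name, value
    FROM sys.extended_properties;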
