The Downside Of Oversizing

Aaron Bertrand shows why you might not want to oversize VARCHAR columns by too much:

Now, whether you go by the old standard or the new one, you do have to support the possibility that someone will use all the characters allowed. Which means you have to use 254 or 320 characters. But what I’ve seen people do is not bother researching the standard at all, and just assume that they need to support 1,000 characters, 4,000 characters, or even beyond.

So let’s take a look at what happens when we have tables with an e-mail address column of varying size, but storing the exact same data:

This is a good argument against automatically using VARCHAR(8000) (much less MAX) when creating columns.

Physical Versus Read-Ahead Reads

Kendra Little explains which SQL Server diagnostic tools include read-ahead reads versus “regular” physical reads:

SQL Server has more than one way to pull pages in from disk for your queries. SQL Server can do a physical read of an 8KB page, or an extent of 8  of those 8KB pages.

SQL Server can also use the “read-ahead” mechanism to pull even larger chunks of data in from disk when you have a query that wants to read a lot of data — because just plucking one 8KB page or even 64KB of pages into disk isn’t super fast when you need lotsa pages.

But these terms get a little confusing when you’re changing between different diagnostic tools in SQL Server, because some of these tools include read-ahead reads in physical reads, and some don’t!

There is some good information here, so read the whole thing.

Memory Grants Affecting Columnstore Load

Denzil Ribeiro explains how memory grant pressure can determine whether a columnstore bulk insert skips the deltastore or not:

We found that only at the beginning of the run, there was contention on memory grants (RESOURCE_SEMAPHORE waits), for a short period of time. After that and later into the process, we could see some latch contention on regular data pages, which we didn’t expect as each thread was supposed to insert into its own row group. You would also see this same data by querying sys.dm_exec_requests live, if you caught it within the first minute of execution, as displayed below.

This is useful in case you run into the issue, but also useful as a case study on effective troubleshooting.

Using OtterTune To Tune Databases

Dana Van Aken, Geoff Gordon, and Any Pavlo show off OtterTune, which uses machine learning techniques to tune database management systems like MySQL and Postgres:

OtterTune, a new tool that’s being developed by students and researchers in the Carnegie Mellon Database Group, can automatically find good settings for a DBMS’s configuration knobs. The goal is to make it easier for anyone to deploy a DBMS, even those without any expertise in database administration.

OtterTune differs from other DBMS configuration tools because it leverages knowledge gained from tuning previous DBMS deployments to tune new ones. This significantly reduces the amount of time and resources needed to tune a new DBMS deployment. To do this, OtterTune maintains a repository of tuning data collected from previous tuning sessions. It uses this data to build machine learning (ML) models that capture how the DBMS responds to different configurations. OtterTune uses these models to guide experimentation for new applications, recommending settings that improve a target objective (for example, reducing latency or improving throughput).

In this post, we discuss each of the components in OtterTune’s ML pipeline, and show how they interact with each other to tune a DBMS’s configuration. Then, we evaluate OtterTune’s tuning efficacy on MySQL and Postgres by comparing the performance of its best configuration with configurations selected by database administrators (DBAs) and other automatic tuning tools.

This is potentially a very interesting technology and is not the only one of its kind—we’ve seen Microsoft enter this space as well for SQL Server index and tuning recommendations.

Making Entity Framework Writes A Little Less Slow

Ilya Chumakov has some tips for making Entity Framework inserts and updates a lot faster:

When adding or modifying a large number of records (10³ and more), the Entity Framework performance is far from perfect. The reasons are architectural peculiarities of the framework, and non-optimality of the generated SQL. Leaping ahead, I can reveal that saving data through a bypass of the context significantly minimizes the execution time.

There’s some good advice in here, though not my favorite advice, which is don’t use Entity Framework.

Computed Column Performance

Paul White has a great article on when computed columns perform poorly:

A major cause of poor performance is a simple failure to use an indexed or persisted computed column value as expected. I have lost count of the number of questions I have had over the years asking why the optimizer would choose a terrible execution plan when an obviously better plan using an indexed or persisted computed column exists.

The precise cause in each case varies, but is almost always either a faulty cost-based decision (because scalars are assigned a low fixed cost); or a failure to match an expanded expression back to a persisted computed column or index.

The match-back failures are especially interesting to me, because they often involve complex interactions with orthogonal engine features. Equally often, the failure to “match back” leaves an expression (rather than a column) in a position in the internal query tree that prevents an important optimization rule from matching. In either case, the outcome is the same: a sub-optimal execution plan.

Definitely read the whole thing if you’re thinking about setting trace flag 176 on.

Automatic Plan Correction

Jovan Popovic introduces a new tuning feature in SQL Server 2017:

How to identify plans that should be corrected?

SQL Server 2017 provides a new system view called sys.dm_db_tuning_recommendations that shows all identified plan regressions. You can select data from this view, find the difference between last known good plan and regressed plan, and the script that you can execute to manually fix regression.

Automatic plan correction

As a next step, you can let SQL Server 2017 to automatically correct any plan that regressed.

I like it when the database engine gets smarter, but I get the feeling I’d like there to be some finer-grained options around what gets considered a regression and when a sub-optimal plan gets swapped out.

Optimizing Kafka

Yeva Byzek explains different tuning options available within Apache Kafka:

Without needing to make any changes to Kafka configuration parameters, you can setup a development Kafka environment and test basic functionality. Yet the fact that Kafka runs straight off the shelf does not mean you won’t want to do some tuning before you go into production. The reason to tune is that different use cases will have different sets of requirements that will drive different service goals. To optimize for those service goals, there are Kafka configuration parameters that you should change. In fact, the Kafka design itself provides configuration flexibility to users, and to make sure your Kafka deployment is optimized for your service goals, you absolutely should investigate tuning the settings of some configuration parameters and benchmarking in your own environment. Ideally, you should do that before you go to production, or at least before you scale out to a larger cluster size.

We have written a white paper to help you identify those service goals, configure your Kafka deployment to optimize for them, and ensure that you are achieving them through monitoring.

Read the whole thing, especially the part about throughput versus latency.

UNION ALL Ordering

Paul White shows how UNION ALL concatenation has changed since SQL Server 2008 R2:

The concatenation of two or more data sets is most commonly expressed in T-SQL using the UNION ALL clause. Given that the SQL Server optimizer can often reorder things like joins and aggregates to improve performance, it is quite reasonable to expect that SQL Server would also consider reordering concatenation inputs, where this would provide an advantage. For example, the optimizer could consider the benefits of rewriting A UNION ALL B as B UNION ALL A.

In fact, the SQL Server optimizer does not do this. More precisely, there was some limited support for concatenation input reordering in SQL Server releases up to 2008 R2, but this was removed in SQL Server 2012, and has not resurfaced since.

It’s an interesting article about an edge case.

Improving Performance For JSON In SQL Server

Bert Wagner shows how to use indexed, non-persisted columns to pre-parse JSON data in SQL Server:

This is basically a cheat code for indexing computed columns.

SQL will only compute the “Make” value on a row’s insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.

It’s great that the database engine is smart enough to do this, but I’m not really a big fan of storing data in JSON and parsing it within SQL Server, as that violates first normal form.  If you know you’re going to use Make as an attribute and query it in SQL, make it a real attribute instead of holding multiple values in a single attribute.


June 2017
« May