Press "Enter" to skip to content

Author: Kevin Feasel

Reducing TempDB Usage With Memory-Optimized Objects

Mark Wilkinson shows how to replace temp tables (or table variables) with memory-optimized table variables to reduce tempdb latching:

If all worked, you should now see that we have contention on the sysschobjs table. Earlier we discussed using sp_help to get index details on system tables, if we do that now and look at index 2, we will see the lead column is nsclass which is a tinyint field. Using a tinyint as a lead column is typically a terrible idea since there is little selectivity on such a narrow field, and this is no exception.

This isn’t the only case of contention you might see with system objects related to temporary tables. We ran into a few different contention scenarios with tempdb:

  • Contention on sysschobjs again, but on index 3. This index leads with the name of the temporary table and is fairly narrow so you can fit a lot of records on a single index page. Because of this, if you are running lots of concurrent procedures that create temporary tables with the same or similar names, it creates a hot spot on a single page, leading to more contention.

  • Temporary table auto-stats. Statistics objects for all tables (including temporary tables) are stored in the sys.sysobjvalues table. If you get enough auto-stats generations on temporary tables you can see contention here.

Mark’s post reads like a book chapter and he does a great job of summing up the problem and the solution.

Comments closed

Database Code Analysis

William Brewer has an interesting article on performing code analysis on database objects:

In general, code analysis is not just a help to the individual developer but can be useful to the entire team. This is because it makes the state and purpose of the code more visible, so that it allows everyone who is responsible for delivery to get a better idea of progress and can alert them much earlier to potential tasks and issues further down the line. It also makes everyone more aware of whatever coding standards are agreed, and what operational, security and compliance constraints there are.

Database Code analysis is a slightly more complicated topic than static code analysis as used in Agile application development. It is more complicated because you have the extra choice of dynamic code analysis to supplement static code analysis, but also because databases have several different types of code that have different conventions and considerations. There is DML (Data Manipulation Language), DDL (Data Definition Language), DCL (Data Control Language) and TCL (Transaction Control Language).  They each require rather different analysis.

William goes on to include a set of good resources, though I think database code analysis, like database testing, is a difficult job in an under-served area.

Comments closed

Calculating Relative Risk In T-SQL

Mala Mahadevan explains how to calculate relative risk using T-SQL:

In this post we will explore a common statistical term – Relative Risk, otherwise called Risk Factor. Relative Risk is a term that is important to understand when you are doing comparative studies of two groups that are different in some specific way. The most common usage of this is in drug testing – with one group that has been exposed to medication and one group that has not. Or , in comparison of two different medications with two groups with each exposed to a different one.

Read on for an example of a statistical formula calculation which might actually be easier in T-SQL than R.

Comments closed

Random Forests In scikit-learn

Mark Needham shows how easy it is to create a random forest model in Python using scikit-learn:

As I mentioned in a blog post a couple of weeks ago, I’ve been playing around with the Kaggle House Prices competition and the most recent thing I tried was training a random forest regressor.

Unfortunately, although it gave me better results locally it got a worse score on the unseen data, which I figured meant I’d overfitted the model.

I wasn’t really sure how to work out if that theory was true or not, but by chance, I was reading Chris Albon’s blog and found a post where he explains how to inspect the importance of every feature in a random forest. Just what I needed!

There’s a nagging voice in my head saying “Principal Component Analysis” as I read this post.

Comments closed

Comparing Spark Streaming, Flink, And Kafka Streams

Shivangi Gupta contrasts three streaming technologies:

Flink and Spark are in-memory databases that do not persist their data to storage. They can write their data to permanent storage, but the whole point of streaming is to keep it in memory, to analyze current data. All of this lets programmers write big data programs with streaming data. They can take data in whatever format it is in, join different sets, reduce it to key-value pairs (map), and then run calculations on adjacent pairs to produce some final calculated value. They also can plug these data items into machine learning algorithms to make some projection (predictive models) or discover patterns (classification models).

Streaming has become the product-level battleground in the Hadoop ecosystem, and it’s interesting to see the different approaches that different groups have taken.

Comments closed

Hints In Oracle Versus SQL Server

Kellyn Pot’Vin-Gorman shows an example of query hints in Oracle and in SQL Server:

Oracle hints were quite common during the infancy of the Oracle Cost Based Optimizer, (CBO).  It could be frustrating for a database administrator who was accustomed to the Rules Based Optimizer, (rules, people!  If there’s an index, use it!) to give up control of performance to a feature that simply wasn’t taking the shortest route to the results.  As time passed from Oracle 9i to 10g, we harnessed hints less, trusting the CBO and by Oracle 11g, it started to be frowned upon unless you had a very strong use case for hinting.  I was in the latter scenario, as my first Oracle 11g database environment required not just new data, but a new database weekly and a requirement for me to guarantee performance.  I knew pretty much every optimal plan for every SQL statement in the systems and it was my responsibility to make sure each new database chose the most optimal plan.  I had incorporated complex hints, (and then profiles as we upgraded…)

With the introduction of database version Oracle 12c, it became a sought after skill to use hints effectively again, as many new optimizer features, (often with the words “dynamic” or “automated” in them) started to impact performance beyond what was outside the allowable.

Read on for a nearly-equivalent query in the two database systems.

Comments closed

Switching In Identity Columns

Kenneth Fisher shows a way of working around the difficulty of adding an identity column to an existing table:

A friend had an interesting problem today. A really big table (multiple millions of rows) and no primary key. He then ran into an issue where he had to have one. The easiest thing is to create a new int column with an identity column right? Unfortunately in this case because of the size of the table, the log growth on adding an identity column was too much. So what to do?

Well, it would be nice if we could add an int column, populate it in chunks, then make it an identity column. Unfortunately, you can’t add identity to an existing column.

Read on for the answer.

Comments closed

Static Site Generation With Hugo

Steph Locke explains how to build a simple site using Hugo:

This site uses Hugo. Hugo is a “static site generator” which means you write a bunch of markdown and it generates html. This is great for building simple sites like company leafletware or blogs.

You can get Hugo across platforms and on Windows it’s just an executable you can put in your program files. You can then work with it like git in the command line.

Read on for a step-by-step process to get started.  Steph also links to blogdown, which is an interesting R-friendly extension.

Comments closed

Using Buffer Pool Extension

William Wolf talks about Buffer Pool Extension:

With this feature, SQL Server will extend the Buffer Pool Cache to non-volatile(ssd) storage.  This will alleviate the I/O contention of mechanical disks by augmenting memory.  The BPE uses the SSD as memory extension rather than disk.  This feature can be used with standard and enterprise, but would provide noticeable benefits for Standard Edition.  According to books online, the BPE size can be up to 32 times(Enterprise) or 4 times(Standard Edition) the value of max_server_memory, but the recommended ratio is 1:16 or less.

By utilizing this option, we can alleviate some memory pressure.  To demonstrate this for me was a litte difficult at first. My laptop, as most newer laptops, has a SSD. So I plugged in a SATA hard drive externally and moved my database there for testing. If the database files are already on SSD, adding BPE may not give much benefit as the memory from BPE would write to SSD as well.

Buffer Pool Extension did end up in the Hall of Shame, but scenarios like Wolf describes exist, and in those scenarios, BPE could be a viable third-best option.

Comments closed

A Filesystem For DMVs

Anthony Nocentino shows how Microsoft is embracing the Linux style by creating a view of DMVs as a filesystem:

Something isn’t right…as DBAs we think of things in rows and columns. So we’re going to count across the top and think the 7th column is going to yield the 7th column and it’s data for each row, right? Well, it will but data processed by awk is whitespace delimited by default and is processed row by row. So the 7th column in the second line isn’t the same as the output in the first line. This can be really frustrating if your row data has spaces in it…like you know…dates.
So let’s fix that…the output from the DMVs via dbfs is tab delimited. We can define our delimiter for awk with -F which will allow for whitespaces in our data. Breaking the data only on the tabs. Let’s hope there isn’t any tabs in our data!

I’m a little surprised that these metrics don’t end up in /proc, but I imagine there’s a reason for that.

Comments closed