Press "Enter" to skip to content

Curated SQL Posts

Analyzing Data Professional Salary Data

Ginger Grant has built a dashboard to analyze data professional salaries:

In the survey for 2018, the people who made the most money were from Hong Kong with an average salary of $263,289.  Before you start planning on moving, you will might want to look at the data a little closer.  There were 2 people who responded from Hong Kong.  One of them said he was making over 1.4 million dollars, the highest amount reported in the survey.  Given the fact that we only have two responses from Hong Kong, we will be unable to draw a definitive conclusion with 2 records. To be able to answer that question, more analysis will need to be done on the location and salary information and you will probably want to add market basket criteria because a dollar say in Hong Kong doesn’t go as far as the average apartment rental is $3,237 a month as it does say in Uganda where the rent is around $187 a month.

Click through to see the final product and grab a copy of her dashboard.

Comments closed

OLTP-Friendly Database Deployments

Michael Swart looks at one of the biggest problems when trying to do a zero-downtime deployment to an OLTP system:

There are two main kinds of SQL queries. SELECT/INSERT/UPDATE/DELETE statements are examples of Data Manipulation Language (DML). CREATE/ALTER/DROP statements are examples of Data Definition Language (DDL).

With schema changes – DDL – we have the added complexity of the SCH-M lock. It’s a kind of lock you don’t see with DML statements. DML statements take and hold schema stability locks (SCH-S) on the tables they need. This can cause interesting blocking chains between the two types where new queries can’t start until the schema change succeeds

Click through for suggestions with regard to schema locks, as well as a few tips for modifying large tables.

Comments closed

DBA Salary Calculations

Eugene Meidinger takes a whack at the data professional salary survey:

So I’m using something called a multiple linear regression to make a formula to predict your salary based on specific variables. Unfortunately, the highest Coefficient of Determination (or R2) I’ve been able to get is 0.37. Which means, as far as I understand it, that at most the model explains 37% of the variation.

Additionally the spread on the results isn’t great either. The standard deviation, a measure of spread, is about $25,000 on the original subset of data. Which means we’d expect 68% to be within +/- $25,000 of the average and 95% to be within +/- $50,000 of the average. So what happens when we apply our model?

Read on for Eugene’s early findings and a roadmap for additional posts.

Comments closed

Service Broker Security

Misha Capewell has a quick synopsis of the two flavors of service broker security:

There are 2 types of security in Service Broker: dialog and transport. Dialog security establishes a secure, authenticated connection between Service Broker Services or dialog endpoints. Transport security establishes an authenticated network connection between SQL Server instances or Service Broker endpoints. Clear as mud, right? Don’t worry, these are easily mixed up by both novice and experienced Service Broker admins. To illustrate, let’s go back to our taxes scenario. You’ve completed your forms, stamped your envelope and you’re ready to mail it in. You drop it in your nearest mailbox and what happens next? A postal worker will pick it up, it gets loaded into a truck and shipped between various sorting facilities (as you might have noticed I have no clue how the USPS works) until it is finally delivered to the IRS via yet another postal worker. Now, those postal workers all have the authority to transport your tax return from point to point. However, they do not have the authority to open up and read your return. That’s what transport security is. The IRS agent on the other end, though, he does have the authority to read your return. That’s dialog security.

Click through for more details, including a couple of scripts to help configure each form of security.

Comments closed

Using rquery To Speed Up Data Manipulations

John Mount shows off some rquery benchmarks versus dplyr and data.table:

Let’s take a look at rquery’s new “ad hoc” mode (made convenient through wrapr‘s new “wrapr_applicable” feature). This is where rquery works on in-memory data.frame data by sending it to a database, processing on the database, and then pulling the data back. We concede this is a strange way to process data, and not rquery’s primary purpose (the primary purpose being generation of safe high performance SQL for big data engines such as Spark and PostgreSQL). However, our experiments show that it is in fact a competitive technique.

We’ve summarized the results of several experiments (experiment details here) in the following graph (graphing code here). The benchmark task was hand implementing logistic regression scoring. This is an example query we have been using for some time.

There are some nice early results, so it’ll be interesting to watch as this develops.

Comments closed

Tidytext 0.1.6

Julia Silge announces a new version of tidytext:

I am pleased to announce that tidytext 0.1.6 is now on CRAN!

Most of this release, as well as the 0.1.5 release which I did not blog about, was for maintenance, updates to align with API changes from tidytext’s dependencies, and bugs. I just spent a good chunk of effort getting tidytext to pass R CMD check on older versions of R despite the fact that some of the packages in tidytext’s Suggests require recent versions of R. FUN TIMES. I was glad to get it working, though, because I know that we have users, some teaching on university campuses, etc, who are constrained to older versions of R in various environments.

There are some more interesting updates. For example, did you know about the new-ish stopwords package? This package provides access to stopword lists from multiple sources in multiple languages. If you would like to access these in a list data structure, go to the original package. But if you like your text tidy, I GOT YOU.

Read on for examples and grab the latest version.

Comments closed

Visual Principles

I have a post looking at three visual principles important to creating good dashboards:

In European languages, we read from left to right and from top to bottom.  In Middle Eastern languages like Hebrew and Arabic, we read from right to left and top to bottom.  In ancient Asian languages (particularly Chinese), we read from top to bottom and right to left, but in modern Chinese, we read left to right and top to bottom.  As far as Japanese goes, we read every which way because YOLO.  The way we read biases the way we look at things.

There has been quite a bit of research done on looking at where we look on a screen or on a page. I’m going to describe a few layouts, but focusing on research done on Europeans.  If you poll a group of Israeli or Saudi Arabian readers, flip the results.

Read the whole thing.  The second part of that comes out soon.

Comments closed

Welcome, CXCONSUMER

Erik Darling points out that CXCONSUMER is now a wait type in SQL Server:

According to Pedro’s slide, but not the ENTIRELY MISSING DOCUMENTATION, this wait is the “safe” type of parallelism wait.

It’s a good thing Pedro is a dutiful blogger, so we don’t have to pull our hair out while unfurling these mysteries.

Speaking of documentation, our new CXCONSUMER friend isn’t mentioned in Query Store Wait Stats, either.

This is a very useful addition.

Comments closed

Performance Testing Post-Updates

Joe Chang has some quick and dirty performance tests from SQL Server 2016 SP1 compared to SQL Server 2106 SP1 CU7 (the first post-Meltdown/Spectre release):

linear sum, SQL Server 2016 sp1 cu 7 bld 4466 and OS patches vs sp1 base
9% faster, 12% more CPU efficiently
individual queries range from 24% faster to 0.3% slower.
there probably is a penalty in the recent fixes, but fixes since SP1 also made improvements?

Click through for more details.  We’ll have to see a lot more testing to know, but that’s certainly not awful.

Comments closed