Press "Enter" to skip to content

Curated SQL Posts

Expanding LVM Drives

David Klee shows how to expand an LVM drive on Linux:

Next in our SQL Server on Linux series is one important question. On Windows, if you’re about to run out of space, you get your VM admin / storage admin to expand one or more of your drives, and you go to Disk Management and expand the drive with no downtime. How do we accomplish this same task on Linux?

First, SSH into your VM. Get your appropriate system engineer to expand the drive that needs to be expanded. You won’t be able to see it at first in Linux because, just like in Windows, it’ll need to rescan the storage to ‘see’ the extra space. Sometimes Windows does it automatically, and sometimes you have to initiate it manually. In Linux it only does this on system startup.

Let’s grow our data drive from 250GB to 300GB first.

Click through to see how to do that.

Measuring Progress With Power BI

Stacia Varga shows how to use Power BI to simplify data analysis, using the example of New Year’s resolution goals:

First, the actual data represents the accumulation of data by day from the beginning of the year, whereas the target data represents the final tally at the end of a defined period. Each goal has a different frequency: daily, quarterly, and weekly. Currently, the comparison between actual and target data makes it appear that I’m falling way short of my goals. However, even if I were making solid progress on my goals on a daily basis, the two values will never meet until the end of the defined period for any given goal. I need a way to prorate the target data so that I can more reasonably measure my progress.

Second, displaying the actual and target values in a table requires me to do mental math to determine how close (or not) I am to achieving my goals. Now, I’m pretty good at mental math, but a better way to see progress is to use data visualizations. I’m sure you’ve heard the saying… A picture is worth a thousand words.

This is a great post if you’re interested in getting started with Power BI.
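
To make the proration idea concrete, here’s a rough sketch of the arithmetic in PowerShell rather than DAX, with made-up numbers; it’s not Stacia’s measure, just the idea of scaling a period-end target by how much of the period has elapsed:

# Rough proration sketch (made-up numbers, not Stacia's DAX): scale the
# period-end target by the fraction of the period that has elapsed.
$periodStart = Get-Date '2018-01-01'
$periodEnd   = Get-Date '2018-03-31'
$targetAtEnd = 1200
$asOf        = Get-Date '2018-02-14'   # hypothetical "today"

$daysInPeriod = ($periodEnd - $periodStart).Days + 1
$daysElapsed  = [math]::Min((($asOf - $periodStart).Days + 1), $daysInPeriod)

$proratedTarget = [math]::Round($targetAtEnd * ($daysElapsed / $daysInPeriod))
"Prorated target: $proratedTarget of $targetAtEnd"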

Row Goals In SQL Server 2017

Erik Darling points out a new bonus when you upgrade to SQL Server 2017 CU3:

Don’t go looking in SSMS just yet. If you get an actual or estimated plan from a query in SSMS, it’s not in the XML.

However, if you get them from the plan cache later, you can see them in the XML.

According to People Much Smarter Than Me®, SSMS strips out XML that it doesn’t recognize, so we’ll have to wait for the next version to drop before we can access it easily.

Erik also has links to get more information.
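
If you want to poke around the plan cache yourself in the meantime, something like this sketch works; the attribute name EstimateRowsWithoutRowGoal, the SqlServer module, and the localhost instance are my assumptions rather than anything spelled out in the excerpt:

# Rough sketch (my assumptions, not Erik's code): search cached plans for the
# row-goal attribute using Invoke-Sqlcmd from the SqlServer module.
Import-Module SqlServer

$query = @"
SELECT TOP (10) cp.usecounts, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE CONVERT(nvarchar(max), qp.query_plan) LIKE N'%EstimateRowsWithoutRowGoal%';
"@

Invoke-Sqlcmd -ServerInstance 'localhost' -Query $query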

Fun With Tibbles

Theo Roe provides an introduction to tibbles in R:

Tibbles are a modern take on data frames, but crucially they are still data frames. Well, what’s the difference then? There’s a quote I found somewhere on the internet that describes the difference quite well:

“keeping what time has proven to be effective, and throwing out what is not”.

Basically, some clever people took the classic data.frame(), shook it until the ineffective parts fell out, then added some new, more appropriate features.

I probably don’t do enough with tibbles, but the upside is that in most cases, there’s a smooth transition.

The Importance Of A Data Computing Layer For Reporting

Buxing Jiang argues that there are reporting scenarios in which building a data computing layer is critical:

In previous articles, we mentioned that most reporting performance issues need to be addressed during the data preparation stage, but many scenarios can’t be handled within the data source. For example, parallel data retrieval should be performed outside of the data source because its purpose is to increase I/O performance. To achieve a controllable buffer, the buffer information needs to be written to an external storage device, which can’t be done within a data source. Asynchronous data buffering and loading data by random page number when building a list report can’t be handled by a data source either. Even an associative query over multiple datasets, which a data source can handle, needs to be done outside the data source when multiple databases or a non-database source is involved, or when the database load needs to be reduced. Obviously, scenarios that can’t be handled within a data source also can’t be handled by a reporting tool.

I would be concerned about implementation details overwhelming the general value of a data computing layer.

Flexible Active Directory Account Lookup In PowerShell

Jana Sattainathan builds a flexible AD lookup cmdlet in PowerShell:

Now, the problem is that I have to look up the AD User account for each of these users to add to the AD group “CrisisManagement_ReadOnly_Group”. Although I still use the AD module command Get-ADUser to look up names, I have to do so one name at a time like this:

Get-ADUser -Filter '(name -like "*David*") -and (name -like "*Smith*")'

The above method is tedious and time-consuming, especially if it is a long list of users. I would rather paste the list that the sender sent me as-is into a PowerShell command and auto-magically add the corresponding accounts to the AD group. If we are unable to find a user, report it as an error.

Click through for the code and more.
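
To give a flavor of the approach, here’s a rough sketch (not Jana’s cmdlet); it assumes the ActiveDirectory module and a pasted list of “First Last” names, and the names and group membership call below are placeholders:

# Rough sketch of the approach described above (not Jana's code).
# Assumes the ActiveDirectory module; the names below are placeholders.
Import-Module ActiveDirectory

$rawList = @'
David Smith
Jane Doe
'@

$names = $rawList -split '\r?\n' | Where-Object { $_.Trim() }

foreach ($name in $names) {
    $first, $last = $name.Trim() -split '\s+', 2
    $user = Get-ADUser -Filter "name -like '*$first*' -and name -like '*$last*'"
    if ($user) {
        Add-ADGroupMember -Identity 'CrisisManagement_ReadOnly_Group' -Members $user
    }
    else {
        Write-Error "No AD account found for '$name'"
    }
}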

PowerShell Core On Server Core

Max Trinidad shows how to install PowerShell Core on Windows Server Core:

In the following virtual machine scenario, I have one Server Core with Active Directory and one Windows 10 client joined to my new domain, both on Build 16299.
On my Windows client, I create a shared folder named “SharedFiles”, where I copy over the latest MSI version of PowerShell Core, “PowerShell-6.0.0-win-x64.msi”.

Then, on the Server Core, I’m going to create a mapped drive to my Windows client’s shared folder and then run the MSI installation from the Windows PowerShell console.

He also shows how to uninstall PowerShell Core, should you wish.
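
His steps boil down to something like the following sketch (not Max’s exact commands); the client machine name WIN10CLIENT is hypothetical, while the share and MSI names come from the excerpt:

# Rough sketch of the steps described above (not Max's exact commands).
# WIN10CLIENT is a hypothetical machine name; the share and MSI names are
# taken from the excerpt. -Persist makes the mapping visible to msiexec.
New-PSDrive -Name Z -PSProvider FileSystem -Persist `
    -Root '\\WIN10CLIENT\SharedFiles' -Credential (Get-Credential)

# Run the MSI quietly from the mapped drive and wait for it to finish.
Start-Process -FilePath msiexec.exe -Wait `
    -ArgumentList '/i', 'Z:\PowerShell-6.0.0-win-x64.msi', '/qn'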

Analyzing Data Professional Salary Data

Ginger Grant has built a dashboard to analyze data professional salaries:

In the survey for 2018, the people who made the most money were from Hong Kong, with an average salary of $263,289. Before you start planning on moving, you might want to look at the data a little closer. There were 2 people who responded from Hong Kong. One of them said he was making over 1.4 million dollars, the highest amount reported in the survey. With only two responses from Hong Kong, we will be unable to draw a definitive conclusion. To be able to answer that question, more analysis will need to be done on the location and salary information, and you will probably want to add market basket criteria, because a dollar doesn’t go as far in Hong Kong, where the average apartment rents for $3,237 a month, as it does in, say, Uganda, where rent is around $187 a month.

Click through to see the final product and grab a copy of her dashboard.
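
Before building anything fancy, you can eyeball that caveat straight from the raw file. This is a rough sketch, not Ginger’s dashboard, and the file name and column names are my assumptions about the survey download:

# Quick sanity check (file and column names are assumptions, not Ginger's
# dashboard): average salary per country means little without enough respondents.
Import-Csv .\salary_survey.csv |
    Group-Object Country |
    ForEach-Object {
        [pscustomobject]@{
            Country     = $_.Name
            Respondents = $_.Count
            AvgSalary   = [math]::Round(([double[]]$_.Group.SalaryUSD |
                              Measure-Object -Average).Average)
        }
    } |
    Sort-Object AvgSalary -Descending |
    Format-Table -AutoSize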

OLTP-Friendly Database Deployments

Michael Swart looks at one of the biggest problems when trying to do a zero-downtime deployment to an OLTP system:

There are two main kinds of SQL queries. SELECT/INSERT/UPDATE/DELETE statements are examples of Data Manipulation Language (DML). CREATE/ALTER/DROP statements are examples of Data Definition Language (DDL).

With schema changes – DDL – we have the added complexity of the SCH-M lock. It’s a kind of lock you don’t see with DML statements. DML statements take and hold schema stability locks (SCH-S) on the tables they need. This can cause interesting blocking chains between the two types, where new queries can’t start until the schema change succeeds.

Click through for suggestions with regard to schema locks, as well as a few tips for modifying large tables.
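
One general tactic for that SCH-M problem (not necessarily one of Michael’s specific tips) is to give the DDL a short lock timeout so it fails fast instead of parking at the head of a blocking chain. A minimal sketch, with hypothetical object names and assuming the SqlServer module:

# Minimal sketch (hypothetical table and column; not from the article): give the
# schema change a short lock timeout so it gives up quickly rather than holding
# up every query queued behind its SCH-M lock request.
$ddl = @"
SET LOCK_TIMEOUT 2000;   -- wait at most 2 seconds for the SCH-M lock
ALTER TABLE dbo.Orders ADD Notes nvarchar(400) NULL;  -- metadata-only change
"@

Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'Sales' -Query $ddl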

DBA Salary Calculations

Eugene Meidinger takes a whack at the data professional salary survey:

So I’m using something called a multiple linear regression to make a formula to predict your salary based on specific variables. Unfortunately, the highest Coefficient of Determination (or R²) I’ve been able to get is 0.37. Which means, as far as I understand it, that at most the model explains 37% of the variation.

Additionally, the spread on the results isn’t great either. The standard deviation, a measure of spread, is about $25,000 on the original subset of data. Which means we’d expect 68% to be within +/- $25,000 of the average and 95% to be within +/- $50,000 of the average. So what happens when we apply our model?

Read on for Eugene’s early findings and a roadmap for additional posts.
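
As a tiny illustration of that 68/95 reasoning, here’s a sketch with made-up salary numbers (not Eugene’s data or model):

# Tiny illustration of the 68/95 rule above, using made-up salary numbers.
$salaries = 52000, 61000, 75000, 88000, 95000, 110000, 123000

$mean  = ($salaries | Measure-Object -Average).Average
$sumSq = ($salaries | ForEach-Object { [math]::Pow($_ - $mean, 2) } |
              Measure-Object -Sum).Sum
$sd    = [math]::Sqrt($sumSq / ($salaries.Count - 1))

"Mean {0:N0}, standard deviation {1:N0}" -f $mean, $sd
"~68% expected between {0:N0} and {1:N0}" -f ($mean - $sd), ($mean + $sd)
"~95% expected between {0:N0} and {1:N0}" -f ($mean - 2 * $sd), ($mean + 2 * $sd)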
