Fun With Tibbles

Kevin Feasel

2018-01-15

R

Theo Roe provides an introduction to tibbles in R:

Tibbles are a modern take on data frames, but crucially they are still data frames. Well, what’s the difference then? There’s a quote I found somewhere on the internet that decribes the difference quite well;

“keeping what time has proven to be effective, and throwing out what is not”.

Basically, some clever people took the classic data.frame(), shook it til the ineffective parts fell out, then added some new, more appropriate features.

I probably don’t do enough with tibbles, but the upside is that in most cases, there’s a smooth transition.

The Importance Of A Data Computing Layer For Reporting

Buxing Jiang argues that there are reporting scenarios in which building a data computing layer is critical:

In previous articles, we mentioned that most reporting performance issues need to be addressed during the data preparation stage, but many scenarios can’t be handled within the data source. For example, parallel data retrieval should be performed outside of the data source because its purpose is to increase I/O performance. To achieve the controllable buffer, the buffer information needs to be written to an external storage device, which can’t be handled within a data source. The asynchronous data buffering and loading data by random page number in building a list report can’t be handled by a data source. Even for an associative query over multiple datasets that a data source can deal with, it would be necessary to get it done outside the data source when multiple databases or a non-database source is involved and when the database load needs to be reduced. Obviously, these scenarios that are not able to be handled within a data source also can’t be handled by a reporting tool.

I would be concerned about implementation details overwhelming the general value of a data computing layer.

Flexible Active Directory Account Lookup In Powershell

Jana Sattainathan builds a flexible AD lookup cmdlet in Powershell:

Now, the problem is that I have to lookup the AD User account for each of these users to add to the AD group “CrisisManagement_ReadOnly_Group”. Although I still use the AD module command Get-ADUser to lookup names, I have to do so one name at a time like this:

Get-ADUser -Filter ‘(name -like “*David*”) -and (name -like “*Smith*”)’

The above method is tedious and time consuming, especially if it is a long list of users. I would rather paste the list that the sender sent me as is into a PowerShell command and auto-magically add the corresponding accounts to the AD group. If we are unable to find an user, report it as an error.

Click through for the code and more.

Powershell Core On Server Core

Max Trinidad shows how to install Powershell Core on Windows Server Core:

In the following Virtual machine scenario, I got one Server Core with Active Directory (Build 16299) and Windows 10 (Build 16299) joined to my new domain. Both Build 16299.
On my Windows client I create a shared folder named “SharedFiles”, where I copy over the latest MSI version of PowerShell Core “PowerShell-6.0.0-win-x64.msi”.

Then, on the Server Core I’m going to create a map drive to my Windows client shared folder to then run the MSI installation from Windows PowerShell Console.

He also shows how to uninstall Powershell Core, should you wish.

Analyzing Data Professional Salary Data

Kevin Feasel

2018-01-15

Power BI, R

Ginger Grant has built a dashboard to analyze data professional salaries:

In the survey for 2018, the people who made the most money were from Hong Kong with an average salary of $263,289.  Before you start planning on moving, you will might want to look at the data a little closer.  There were 2 people who responded from Hong Kong.  One of them said he was making over 1.4 million dollars, the highest amount reported in the survey.  Given the fact that we only have two responses from Hong Kong, we will be unable to draw a definitive conclusion with 2 records. To be able to answer that question, more analysis will need to be done on the location and salary information and you will probably want to add market basket criteria because a dollar say in Hong Kong doesn’t go as far as the average apartment rental is $3,237 a month as it does say in Uganda where the rent is around $187 a month.

Click through to see the final product and grab a copy of her dashboard.

OLTP-Friendly Database Deployments

Michael Swart looks at one of the biggest problems when trying to do a zero-downtime deployment to an OLTP system:

There are two main kinds of SQL queries. SELECT/INSERT/UPDATE/DELETE statements are examples of Data Manipulation Language (DML). CREATE/ALTER/DROP statements are examples of Data Definition Language (DDL).

With schema changes – DDL – we have the added complexity of the SCH-M lock. It’s a kind of lock you don’t see with DML statements. DML statements take and hold schema stability locks (SCH-S) on the tables they need. This can cause interesting blocking chains between the two types where new queries can’t start until the schema change succeeds

Click through for suggestions with regard to schema locks, as well as a few tips for modifying large tables.

DBA Salary Calculations

Eugene Meidinger takes a whack at the data professional salary survey:

So I’m using something called a multiple linear regression to make a formula to predict your salary based on specific variables. Unfortunately, the highest Coefficient of Determination (or R2) I’ve been able to get is 0.37. Which means, as far as I understand it, that at most the model explains 37% of the variation.

Additionally the spread on the results isn’t great either. The standard deviation, a measure of spread, is about $25,000 on the original subset of data. Which means we’d expect 68% to be within +/- $25,000 of the average and 95% to be within +/- $50,000 of the average. So what happens when we apply our model?

Read on for Eugene’s early findings and a roadmap for additional posts.

Service Broker Security

Misha Capewell has a quick synopsis of the two flavors of service broker security:

There are 2 types of security in Service Broker: dialog and transport. Dialog security establishes a secure, authenticated connection between Service Broker Services or dialog endpoints. Transport security establishes an authenticated network connection between SQL Server instances or Service Broker endpoints. Clear as mud, right? Don’t worry, these are easily mixed up by both novice and experienced Service Broker admins. To illustrate, let’s go back to our taxes scenario. You’ve completed your forms, stamped your envelope and you’re ready to mail it in. You drop it in your nearest mailbox and what happens next? A postal worker will pick it up, it gets loaded into a truck and shipped between various sorting facilities (as you might have noticed I have no clue how the USPS works) until it is finally delivered to the IRS via yet another postal worker. Now, those postal workers all have the authority to transport your tax return from point to point. However, they do not have the authority to open up and read your return. That’s what transport security is. The IRS agent on the other end, though, he does have the authority to read your return. That’s dialog security.

Click through for more details, including a couple of scripts to help configure each form of security.

Categories

January 2018
MTWTFSS
« Dec Feb »
1234567
891011121314
15161718192021
22232425262728
293031