Category: Learning

Digging Into The Data Professional Survey

Published 2018-01-11 by Kevin Feasel

Melissa Connors looks at the 2018 Data Professionals Salary Survey:

This report is filtered to the United States, Private sector, full-time employees, Job Titles with more than 50 results, all primary databases, a salary between $15,000 and $200,000, and a survey year of 2018.

On the top are employees who said they work remotely 0 days per week, the middle is office employees who telecommute 1-4 days per week, and the bottom is the true remote employee who does this 5+ days per week.

The overall median salaries were $97,316 for office employees, $111,500 for part time telecommuters, and $114,163 for full time remote employees, which led to the click-bait title of this post. 🙂 It’s possible that this is because only more senior or highly-valued employees feel comfortable working from home, or are even allowed to, depending on the company culture.

Click through to see all of Melissa’s findings.

Comments closed

2018 Data Professional Survey Results

Published 2018-01-09 by Kevin Feasel

Brent Ozar has posted data for the 2018 Data Professionals Survey:

A few things to know about it:

The data is public domain. The license tab makes it clear that you can use this data for any purpose, and you don’t have to credit or mention anyone.
The spreadsheet includes both 2017 & 2018 results. For the new questions this year, the 2017 answers are populated with Not Asked.
The postal code field was totally optional, and may be wildly unreliable. Folks asked to be able to put in small portions of their zip code, like the leading numbers.

Looks like I’m going to add one more thing to the to-do list for this week…

Comments closed

How The DBA Role Is Changing

Published 2018-01-02 by Kevin Feasel

Tom Smith spoke to 22 executives from 21 companies about how the role of Database Administrator is changing:

While developers don’t think they need them, DBAs are still needed for governance to make it easier to analyze data.
DBAs have gone from managing databases tobeing data engineers across multiple systems. They focus on how data moves from one database to another, the consumption of data, tuning of the data, and management of the data process across the data landscape is critical until it is distributed and executed automatically.
DBAs have moved from being focused on individual products like SQLServer and Oracle to having to deal with bringing companies’ big data implementation to life.

There are a lot of points here. I agree with many, disagree with a few, and think that some of them are quite context-sensitive. But all are worth thinking about.

Comments closed

Customer Retention Analysis With SQL

Published 2017-12-22 by Kevin Feasel

Luba Belokon walks through some sample customer retention analysis queries written in SQL:

Customer retention curves are essential to any business looking to understand its clients and will go a long way towards explaining other things like sales figures or the impact of marketing initiatives. They are an easy way to visualize a key interaction between customers and the business, which is to say, whether or not customers return — and at what rate — after the first visit.

The first step to building a customer retention curve is to identify those who visited your business during the reference period, what I will call p1. It is important that the length of the period chosen is a reasonable one, and reflects the expected frequency of visits.

Different types of businesses are going to expect their customers to return at different rates:

A coffee shop may choose to use an expected frequency of visits of once a week.
A supermarket may choose a longer period, perhaps two weeks or one month.

In this case, I think the motivation portion is better than the queries themselves, but the article definitely works as an inspiration for building out good measures of frequency of occurrence.

Comments closed

T-SQL Tuesday Roundup

Published 2017-12-18 by Kevin Feasel

Mala Mahadevan has the roundup for this month’s T-SQL Tuesday:

Thank you to all of you for taking time to contribute. I like the suggestion made by Glenda Gable on being partners to help with accountability on our goals. If anyone feels up to this just leave a comment below, we can set up a slack channel or something similar to help each other.

There were quite a few entries this month.

Comments closed

Blockchain For Business Notes

Published 2017-12-13 by Kevin Feasel

Allison Tharp has some notes on an edX course entitled Blockchain for Business. This looks like it will be a multi-part series. Part one:

A distributed ledger is a data structure that is spread across multiple computers (which are usually spread across locations or regions). Distributed ledger technologies have three basic components:

A data model to capture the current state of the ledger

A language of transactions to track the changes in the ledger state

A protocol that builds consensus among participants around which the transactions can be accepted

In other words, we can think of a distributed ledgers as databases which are shared among peers and do not rely on any central authority or intermediary. Instead of having a central database, every participant has their own copy which stays in sync via the pre-established protocol. Each participant verifies transactions and speaks a common language to ensure universal agreement on the state of the ledger.

Part two:

Another consensus algorithm is called the Proof of Stake algorithm. With this algorithm, the nodes are known as validators and instead of mining the blockchain, they validate the transactions to earn a transaction fee. Instead of creating new coins (as is the case in Bitcoin), all of the coins exist from the very beginning. Another way to look at this is that the nodes are randomly selected to validate blocks. The likelihood of the random selection will depend on how many coins the node holds (this is known as the amount of stake they hold).

Blockchain has gone from wacky idea to interesting business concept over the course of about a decade. It’ll be interesting to see if it catches on to be a vital business concept in the next ten years.

Comments closed

Take The 2018 Data Professional Salary Survey

Published 2017-12-13 by Kevin Feasel

Brent Ozar has the 2018 edition of his Data Professional Salary Survey:

A few things to know:

It’s totally anonymous (we’re not getting your email, IP address, or anything like that.)
It’s open to all database platforms.
As with last year’s results, we’ll publish the raw data in Excel for anyone to analyze. If you want to set up your analysis ahead of time, here’s the incoming raw results as they happen, and we’ll share them in that exact same format.

Please take the survey, especially if you’re hitting Curated SQL for the analytics or Hadoop/Spark side of things rather than the SQL Server side. That way there’s a broader distribution of entries.

Comments closed

Thoughts On Reliability

Published 2017-12-06 by Kevin Feasel

Stuart Moore wants to rename Site Reliability Engineering:

The word “Site” in the IT domain typically refers to either a physical location (data center site) or an application (web site); however, the heart of the definition is sociotechnical, not strictly technology. From an undated (seriously, Google?) interview with Ben Traynor, the founder of the SRE movement: “… we have a bunch of rules of engagement, and principles for how SRE teams interact with their environment — not only the production environment, but also the development teams, the testing teams, the users, and so on.” While the previous paragraph of that interview specifically focuses on the type of work that’s being done by Google’s SRE team, these rules of engagement show that SRE’s should be concerned with the entire value stream of service delivery including not only operations, but development, testing, and ultimately the end user experience. In, other words. SRE’s are concerned with the reliability of the whole service, not just the technical parts.

And Brent Ozar reviews Database Reliability Engineering:

Jump to page 189, the Data Replication section of Chapter 10. Campbell & Majors explain the differences between:

Single-leader replication – like Microsoft SQL Server’s Always On Availability Groups, where only one server can accept writes for a given database

No-leader replication – like SQL Server’s peer-to-peer replication, where any node can accept writes

Multiple-leader replication – like a complex replication topology where only 2-3 nodes can accept writes, but the rest can accept reads

The single-leader replication discussion covers pages 190-202 and does a phenomenal job of explaining the pros & cons of a system like Availability Groups. Those 12 pages don’t teach you how to design, implement, or troubleshoot an AG. However, when you’ve finished those 12 pages, you’ll have a much better understanding of when you should recommend a solution like that, and what kinds of gotchas you should watch out for.

That’s what a Database Reliability Engineer does. They don’t just know how to work with one database – they also know when certain features should be used, when they shouldn’t, and from a big picture perspective, how they should build automation to avoid weaknesses.

I can also recommend the Database Reliability Engineering book. I’ve not seen the finished product yet (it’s buried in my reading list) but I do like it as a challenge for DBAs and developers to step up their games.

Comments closed

Design Concepts: Cognitive Load

Published 2017-11-28 by Kevin Feasel

Meagan Longoria explains the concept of cognitive load and how it relates to building reports:

Our first design concept is cognitive load, which comes from cognitive psychology and instructional design. Cognitive Load Theory says that when we present our audience with information, we are asking them to use brain power to process it. That brain power (aka working memory) is limited, so we need to be intentional about the information we present to them.

In order to commit information to memory and be able to recall it later, that information must go through sensory memory and working memory and then be encoded into long-term memory.

This concept drives a lot of good advice in dashboard and report design, particularly that if it does not directly help a person learn the important information you are trying to convey, it’s not worth having on the report or dashboard.

Comments closed

Defining Memory Sets

Published 2017-10-23 by Kevin Feasel

Kendra Little has concise definitions of a few terms that we see when working with memory:

Private Bytes for a process

For a process in Windows, this shows…

How much memory is allocated
This includes page file space allocated and standby list

Click through for the full set of definitions.

Comments closed