Press "Enter" to skip to content

Category: Learning

How The DBA Role Is Changing

Tom Smith spoke to 22 executives from 21 companies about how the role of Database Administrator is changing:

  • While developers don’t think they need them, DBAs are still needed for governance to make it easier to analyze data.

  • DBAs have gone from managing databases tobeing data engineers across multiple systems. They focus on how data moves from one database to another, the consumption of data, tuning of the data, and management of the data process across the data landscape is critical until it is distributed and executed automatically.

  • DBAs have moved from being focused on individual products like SQLServer and Oracle to having to deal with bringing companies’ big data implementation to life.

There are a lot of points here.  I agree with many, disagree with a few, and think that some of them are quite context-sensitive.  But all are worth thinking about.

Comments closed

Customer Retention Analysis With SQL

Luba Belokon walks through some sample customer retention analysis queries written in SQL:

Customer retention curves are essential to any business looking to understand its clients and will go a long way towards explaining other things like sales figures or the impact of marketing initiatives. They are an easy way to visualize a key interaction between customers and the business, which is to say, whether or not customers return — and at what rate — after the first visit.

The first step to building a customer retention curve is to identify those who visited your business during the reference period, what I will call p1. It is important that the length of the period chosen is a reasonable one, and reflects the expected frequency of visits.

Different types of businesses are going to expect their customers to return at different rates:

  • A coffee shop may choose to use an expected frequency of visits of once a week.

  • A supermarket may choose a longer period, perhaps two weeks or one month.

In this case, I think the motivation portion is better than the queries themselves, but the article definitely works as an inspiration for building out good measures of frequency of occurrence.

Comments closed

Blockchain For Business Notes

Allison Tharp has some notes on an edX course entitled Blockchain for Business.  This looks like it will be a multi-part series.  Part one:

distributed ledger is a data structure that is spread across multiple computers (which are usually spread across locations or regions).  Distributed ledger technologies have three basic components:

  • A data model to capture the current state of the ledger
  • A language of transactions to track the changes in the ledger state
  • A protocol that builds consensus among participants around which the transactions can be accepted

In other words, we can think of a distributed ledgers as databases which are shared among peers and do not rely on any central authority or intermediary.  Instead of having a central database, every participant has their own copy which stays in sync via the pre-established protocol.  Each participant verifies transactions and speaks a common language to ensure universal agreement on the state of the ledger.

Part two:

Another consensus algorithm is called the Proof of Stake algorithm.  With this algorithm, the nodes are known as validators and instead of mining the blockchain, they validate the transactions to earn a transaction fee.  Instead of creating new coins (as is the case in Bitcoin), all of the coins exist from the very beginning.  Another way to look at this is that the nodes are randomly selected to validate blocks.  The likelihood of the random selection will depend on how many coins the node holds (this is known as the amount of stake they hold).

Blockchain has gone from wacky idea to interesting business concept over the course of about a decade.  It’ll be interesting to see if it catches on to be a vital business concept in the next ten years.

Comments closed

Take The 2018 Data Professional Salary Survey

Brent Ozar has the 2018 edition of his Data Professional Salary Survey:

A few things to know:

  • It’s totally anonymous (we’re not getting your email, IP address, or anything like that.)

  • It’s open to all database platforms.

  • As with last year’s results, we’ll publish the raw data in Excel for anyone to analyze. If you want to set up your analysis ahead of time, here’s the incoming raw results as they happen, and we’ll share them in that exact same format.

Please take the survey, especially if you’re hitting Curated SQL for the analytics or Hadoop/Spark side of things rather than the SQL Server side.  That way there’s a broader distribution of entries.

Comments closed

Thoughts On Reliability

Stuart Moore wants to rename Site Reliability Engineering:

The word “Site” in the IT domain typically refers to either a physical location (data center site) or an application (web site); however, the heart of the definition is sociotechnical, not strictly technology. From an undated (seriously, Google?) interview with Ben Traynor, the founder of the SRE movement: “… we have a bunch of rules of engagement, and principles for how SRE teams interact with their environment — not only the production environment, but also the development teams, the testing teams, the users, and so on.” While the previous paragraph of that interview specifically focuses on the type of work that’s being done by Google’s SRE team, these rules of engagement show that SRE’s should be concerned with the entire value stream of service delivery including not only operations, but development, testing, and ultimately the end user experience.  In, other words. SRE’s are concerned with the reliability of the whole service, not just the technical parts.

And Brent Ozar reviews Database Reliability Engineering:

Jump to page 189, the Data Replication section of Chapter 10. Campbell & Majors explain the differences between:

  • Single-leader replication – like Microsoft SQL Server’s Always On Availability Groups, where only one server can accept writes for a given database
  • No-leader replication – like SQL Server’s peer-to-peer replication, where any node can accept writes
  • Multiple-leader replication – like a complex replication topology where only 2-3 nodes can accept writes, but the rest can accept reads

The single-leader replication discussion covers pages 190-202 and does a phenomenal job of explaining the pros & cons of a system like Availability Groups. Those 12 pages don’t teach you how to design, implement, or troubleshoot an AG. However, when you’ve finished those 12 pages, you’ll have a much better understanding of when you should recommend a solution like that, and what kinds of gotchas you should watch out for.

That’s what a Database Reliability Engineer does. They don’t just know how to work with one database – they also know when certain features should be used, when they shouldn’t, and from a big picture perspective, how they should build automation to avoid weaknesses.

I can also recommend the Database Reliability Engineering book.  I’ve not seen the finished product yet (it’s buried in my reading list) but I do like it as a challenge for DBAs and developers to step up their games.

Comments closed

Design Concepts: Cognitive Load

Meagan Longoria explains the concept of cognitive load and how it relates to building reports:

Our first design concept is cognitive load, which comes from cognitive psychology and instructional design. Cognitive Load Theory says that when we present our audience with information, we are asking them to use brain power to process it. That brain power (aka working memory) is limited, so we need to be intentional about the information we present to them.

In order to commit information to memory and be able to recall it later, that information must go through sensory memory and working memory and then be encoded into long-term memory.

This concept drives a lot of good advice in dashboard and report design, particularly that if it does not directly help a person learn the important information you are trying to convey, it’s not worth having on the report or dashboard.

Comments closed

Long Live The DBA

Kellyn Pot’vin-Gorman notes that the “Gone will be the DBA” trend has hit Oracle as well:

Any DBA who specializes in optimization knows that hardware offers around 15% overall opportunity for improvement.  My favorite quote from Cary Millsap, “You can’t hardware your way out of a software problem” is quite fitting, too.  A hardware upgrade can offer a quick increase in performance, only to find that the problem seemingly returns after a period of time.  As we’ve discussed in previous posts.  The natural life of a database is growth-  growth in data, growth in processing, growth in users.  This growth requires more resources and if the environment is not performing as optimally and efficiently as possible, more resources will always be required.

Someday I will write my “No, the DBA isn’t going anywhere” opus, but today is not that day.  Anyhow, this is a good post for anyone worried that automation will kill the DBA.

Comments closed

Getting Started With SQL Server For Free

Jeff Mlakar points out some free resources for getting started with SQL Server:

The basics of the “Learn SQL Server Starter Pack”:

  1. SQL Server 2016 DE
    1. You can get the Developer Edition (DE) for…wait for it…FREE!
    2. Out in the wild you’ll see mostly Standard Edition (SE) or Enterprise Edition (EE). The great thing about DE is that it is identical to EE (it has all the features) in every aspect except that it cannot be licensed on a production machine. It must only be used for TEST or DEV environments. For home lab purposes you can use it as your development environment and have access to all the features to learn on!
    3. Download it here – SQL Server Downloads
    4. While you are here get used to reading the release notes and what is new in the version. You don’t need to understand everything in here right away but get used to the jargon and how Microsoft describes their features.
  2. Windows Server 2016 Evaluation Edition
    1. Download Windows Server 2016 Evaluation Edition
    2. You can evaluate the software for 180 days then will need to activate it. Then you can try to register for another eval and try again
  3. Virtual Box
    1. Virtual Box is a free, simple, and reliable virtualization tool. You’ll be able to do a lot to get started and build up your virtualization knowledge with this.
    2. Download the latest version of Virtual Box
    3. You don’t need to know very much about hypervisors and such – Virtual Box is very easy to learn with good documentation.

Evaluation versions are good for learning because they force you to tear down and rebuild your environment!

Jeff then links to a number of free resources to help out with the learning experience.

Comments closed