Blockchain For Business Notes

Allison Tharp has some notes on an edX course entitled Blockchain for Business.  This looks like it will be a multi-part series.  Part one:

distributed ledger is a data structure that is spread across multiple computers (which are usually spread across locations or regions).  Distributed ledger technologies have three basic components:

  • A data model to capture the current state of the ledger
  • A language of transactions to track the changes in the ledger state
  • A protocol that builds consensus among participants around which the transactions can be accepted

In other words, we can think of a distributed ledgers as databases which are shared among peers and do not rely on any central authority or intermediary.  Instead of having a central database, every participant has their own copy which stays in sync via the pre-established protocol.  Each participant verifies transactions and speaks a common language to ensure universal agreement on the state of the ledger.

Part two:

Another consensus algorithm is called the Proof of Stake algorithm.  With this algorithm, the nodes are known as validators and instead of mining the blockchain, they validate the transactions to earn a transaction fee.  Instead of creating new coins (as is the case in Bitcoin), all of the coins exist from the very beginning.  Another way to look at this is that the nodes are randomly selected to validate blocks.  The likelihood of the random selection will depend on how many coins the node holds (this is known as the amount of stake they hold).

Blockchain has gone from wacky idea to interesting business concept over the course of about a decade.  It’ll be interesting to see if it catches on to be a vital business concept in the next ten years.

Take The 2018 Data Professional Salary Survey

Brent Ozar has the 2018 edition of his Data Professional Salary Survey:

A few things to know:

  • It’s totally anonymous (we’re not getting your email, IP address, or anything like that.)

  • It’s open to all database platforms.

  • As with last year’s results, we’ll publish the raw data in Excel for anyone to analyze. If you want to set up your analysis ahead of time, here’s the incoming raw results as they happen, and we’ll share them in that exact same format.

Please take the survey, especially if you’re hitting Curated SQL for the analytics or Hadoop/Spark side of things rather than the SQL Server side.  That way there’s a broader distribution of entries.

Thoughts On Reliability

Stuart Moore wants to rename Site Reliability Engineering:

The word “Site” in the IT domain typically refers to either a physical location (data center site) or an application (web site); however, the heart of the definition is sociotechnical, not strictly technology. From an undated (seriously, Google?) interview with Ben Traynor, the founder of the SRE movement: “… we have a bunch of rules of engagement, and principles for how SRE teams interact with their environment — not only the production environment, but also the development teams, the testing teams, the users, and so on.” While the previous paragraph of that interview specifically focuses on the type of work that’s being done by Google’s SRE team, these rules of engagement show that SRE’s should be concerned with the entire value stream of service delivery including not only operations, but development, testing, and ultimately the end user experience.  In, other words. SRE’s are concerned with the reliability of the whole service, not just the technical parts.

And Brent Ozar reviews Database Reliability Engineering:

Jump to page 189, the Data Replication section of Chapter 10. Campbell & Majors explain the differences between:

  • Single-leader replication – like Microsoft SQL Server’s Always On Availability Groups, where only one server can accept writes for a given database
  • No-leader replication – like SQL Server’s peer-to-peer replication, where any node can accept writes
  • Multiple-leader replication – like a complex replication topology where only 2-3 nodes can accept writes, but the rest can accept reads

The single-leader replication discussion covers pages 190-202 and does a phenomenal job of explaining the pros & cons of a system like Availability Groups. Those 12 pages don’t teach you how to design, implement, or troubleshoot an AG. However, when you’ve finished those 12 pages, you’ll have a much better understanding of when you should recommend a solution like that, and what kinds of gotchas you should watch out for.

That’s what a Database Reliability Engineer does. They don’t just know how to work with one database – they also know when certain features should be used, when they shouldn’t, and from a big picture perspective, how they should build automation to avoid weaknesses.

I can also recommend the Database Reliability Engineering book.  I’ve not seen the finished product yet (it’s buried in my reading list) but I do like it as a challenge for DBAs and developers to step up their games.

Design Concepts: Cognitive Load

Meagan Longoria explains the concept of cognitive load and how it relates to building reports:

Our first design concept is cognitive load, which comes from cognitive psychology and instructional design. Cognitive Load Theory says that when we present our audience with information, we are asking them to use brain power to process it. That brain power (aka working memory) is limited, so we need to be intentional about the information we present to them.

In order to commit information to memory and be able to recall it later, that information must go through sensory memory and working memory and then be encoded into long-term memory.

This concept drives a lot of good advice in dashboard and report design, particularly that if it does not directly help a person learn the important information you are trying to convey, it’s not worth having on the report or dashboard.

Defining Memory Sets

Kendra Little has concise definitions of a few terms that we see when working with memory:

Private Bytes for a process

For a process in Windows, this shows…

  • How much memory is allocated

  • This includes page file space allocated and standby list

Click through for the full set of definitions.

Long Live The DBA

Kellyn Pot’vin-Gorman notes that the “Gone will be the DBA” trend has hit Oracle as well:

Any DBA who specializes in optimization knows that hardware offers around 15% overall opportunity for improvement.  My favorite quote from Cary Millsap, “You can’t hardware your way out of a software problem” is quite fitting, too.  A hardware upgrade can offer a quick increase in performance, only to find that the problem seemingly returns after a period of time.  As we’ve discussed in previous posts.  The natural life of a database is growth-  growth in data, growth in processing, growth in users.  This growth requires more resources and if the environment is not performing as optimally and efficiently as possible, more resources will always be required.

Someday I will write my “No, the DBA isn’t going anywhere” opus, but today is not that day.  Anyhow, this is a good post for anyone worried that automation will kill the DBA.

Getting Started With SQL Server For Free

Jeff Mlakar points out some free resources for getting started with SQL Server:

The basics of the “Learn SQL Server Starter Pack”:

  1. SQL Server 2016 DE
    1. You can get the Developer Edition (DE) for…wait for it…FREE!
    2. Out in the wild you’ll see mostly Standard Edition (SE) or Enterprise Edition (EE). The great thing about DE is that it is identical to EE (it has all the features) in every aspect except that it cannot be licensed on a production machine. It must only be used for TEST or DEV environments. For home lab purposes you can use it as your development environment and have access to all the features to learn on!
    3. Download it here – SQL Server Downloads
    4. While you are here get used to reading the release notes and what is new in the version. You don’t need to understand everything in here right away but get used to the jargon and how Microsoft describes their features.
  2. Windows Server 2016 Evaluation Edition
    1. Download Windows Server 2016 Evaluation Edition
    2. You can evaluate the software for 180 days then will need to activate it. Then you can try to register for another eval and try again
  3. Virtual Box
    1. Virtual Box is a free, simple, and reliable virtualization tool. You’ll be able to do a lot to get started and build up your virtualization knowledge with this.
    2. Download the latest version of Virtual Box
    3. You don’t need to know very much about hypervisors and such – Virtual Box is very easy to learn with good documentation.

Evaluation versions are good for learning because they force you to tear down and rebuild your environment!

Jeff then links to a number of free resources to help out with the learning experience.

Clippy Lives: In Scala

Akhil Vijayan explains Scala Clippy:

Now you may be wondering how these errors are identified and we get advice related to it.

Simple, these are provided by the Scala community. If you visit their official website Scala Clippy where you can find a tab “Contribute”. Under that, we can post our own errors. These errors are parsed first, and when successful we can add our advice which will be reviewed and if accepted it will be added to their database which will, in turn, be beneficial to others.

Take a close look at the screenshots; I missed it at first, but there’s helpful advice above the error message.

A Game Written In SQL Server

Daniel Janik has built a game that runs entirely in SQL Server:

As it turns out SQL Server Management Studio (SSMS) can display many types of emoji and this is the key.

Keeping in mind that the whole thing was not written with performance or best practice in mind, I’d like to introduce the world to the very first action adventure game 100% written and played in SQL Server!

The goal here is to have a game which helps teach some basics of development practices.  Interesting concept.

Don’t Fear The Tidyverse

Kevin Feasel


Learning, R

David Robinson explains why he prefers to explain the tidyverse version of R first rather than base R:

I’d summarize the two “competing” curricula as follows:

  • Base R first: teach syntax such as $ and [[]], loops and conditionals, data types (numeric, character, data frame, matrix), and built-in functions like ave and tapply. Possibly follow up by introducing dplyr or data.table as alternatives.
  • Tidyverse first: Start from scratch with the dplyr package for manipulating a data frame, and introduce others like ggplot2, tidyr and purrr shortly afterwards. Introduce the %>% operator from magrittr immediately, but skip syntax like [[]] and $ or leave them for late in the course. Keep a single-minded focus on data frames.

I’ve come to strongly prefer the “tidyverse first” educational approach. This isn’t a trivial decision, and this post is my attempt to summarize my opinions and arguments for this position. Overall, they mirror my opinions about ggplot2: packages like dplyr and tidyr are not “advanced”; they’re suitable as a first introduction to R.

I think this is the better position of the two, particularly for people who already have some experience with languages like SQL.


December 2017
« Nov