Think Logically When Debugging

Kenneth Fisher explains his debugging technique:

Almost everything is made up of smaller pieces. When you run across a problem (well, after you check the stupid things), start breaking up what you are doing into smaller pieces. I’m going to give a simplified version of an actual example of something I did recently. No code examples (sorry), just discussion, and only most of the tests, so you can get the idea.

The problem
I’m running a job that runs a bat file that uses SQLCMD to run a stored procedure that references a linked server. I’m getting a double hop error.

There’s some good advice in here.  My slight modification to this is that if you have strong priors as to the probability distribution of the cause, hit the top one or two items (in terms of likelihood of being the cause of failure) first, and if that doesn’t fix it, then start from the beginning.  This assumes a reasonable probability distribution, which typically means that you have experienced the problem (or something much like it) before.

Order Of Operations With Logical Types

Thomas Rushton explains the order of operations, particularly around Boolean operators:

The order in which calculations are done – not just reading from left to right, but remembering that things like multiplication and division happen before addition and subtraction. My son tells me that kids nowadays are being taught something called “BIDMAS” – which stands for “Brackets, Indices, Division, Multiplication, Addition, Subtraction”. Or it can be BODMAS – Brackets, Operations, Division… (Operation is a fancy new way of describing indices – i.e. x^y)

Unsurprisingly, there are similar rules for Boolean operators.

It’s a valuable lesson oft learned.
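
The same trap exists in SQL, where AND binds more tightly than OR. As a minimal T-SQL illustration (dbo.Shirts and its columns are made up for the example):

    -- AND evaluates before OR, so without parentheses this returns red
    -- shirts of any size plus only the large blue shirts.
    SELECT ShirtID, Colour, Size
    FROM dbo.Shirts
    WHERE Colour = 'Red' OR Colour = 'Blue' AND Size = 'L';

    -- Parentheses make the intended grouping explicit: large shirts
    -- that are either red or blue.
    SELECT ShirtID, Colour, Size
    FROM dbo.Shirts
    WHERE (Colour = 'Red' OR Colour = 'Blue') AND Size = 'L';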

GDPR In The UK

Ed Elliott covers that lesser-known Sex Pistols track in a multi-part series.

Part 1 covers some of the official documentation around how the ICO interprets GDPR:

To read the articles and the actual requirements, I would start at page 32, which begins “HAVE ADOPTED THIS REGULATION:”; this lists each of the articles (requirements). You can go through each of these and make sure you are compliant with them.

The exciting bit, the fines

The exciting headline-grabbing parts of GDPR are the fines that can be enforced. We don’t yet know how the ICO will apply the fines; words like “maximum” are used, and the maximum possible fines are large. It is possible that the maximum fines will apply, but we will look in part 2 at previous ICO enforcement actions to see if the ICO’s past performance gives us any clues as to its possible future decisions.

Part 2 looks at a couple of prior cases and how the ICO handled them:

Talk Talk started mitigating the issue by writing to all of its customers telling them how to deal with scam calls. Talk Talk told the ICO what happened, and the ICO responded with its own investigation and a £100,000 fine. The reasons were:

  • The system failed to have adequate controls over who could access which records, i.e. anyone could access any record, not just the cases they were working on
  • The exports allowed all fields, not just the ones required for the regulatory reports
  • Wipro were able to make wildcard searches
  • The issue was long-running: Wipro were given access in 2004, and it continued until 2014

One of the mitigating factors was that there was no evidence that this was even the source of the scam calls, and there was no evidence that anyone suffered any damage or distress as a result of this incident.

Part 3 looks at a couple more cases, too.  And Ed promises part 4.

Digging Into The Data Professional Survey

Melissa Connors looks at the 2018 Data Professionals Salary Survey:

This report is filtered to the United States, Private sector, full-time employees, Job Titles with more than 50 results, all primary databases, a salary between $15,000 and $200,000, and a survey year of 2018.

On the top are employees who said they work remotely 0 days per week, the middle is office employees who telecommute 1-4 days per week, and the bottom is the true remote employee who does this 5+ days per week.

The overall median salaries were $97,316 for office employees, $111,500 for part-time telecommuters, and $114,163 for full-time remote employees, which led to the click-bait title of this post. 🙂 It’s possible that this is because only more senior or highly-valued employees feel comfortable working from home, or are even allowed to, depending on the company culture.
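
For reference, medians like these fall out of PERCENTILE_CONT once the raw survey data is in a table. A minimal sketch, where dbo.SalarySurvey and its column names are assumptions about how you imported the spreadsheet:

    WITH s AS
    (
        SELECT SalaryUSD,
               CASE WHEN TelecommuteDaysPerWeek = 0 THEN 'Office (0 days)'
                    WHEN TelecommuteDaysPerWeek BETWEEN 1 AND 4 THEN 'Telecommuter (1-4 days)'
                    ELSE 'Remote (5+ days)'
               END AS WorkStyle
        FROM dbo.SalarySurvey  -- hypothetical import of the raw data
        WHERE SurveyYear = 2018
          AND Country = 'United States'
          AND SalaryUSD BETWEEN 15000 AND 200000
    )
    SELECT DISTINCT WorkStyle,
           -- median salary within each work style
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY SalaryUSD)
               OVER (PARTITION BY WorkStyle) AS MedianSalary
    FROM s;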

Click through to see all of Melissa’s findings.

2018 Data Professional Survey Results

Brent Ozar has posted data for the 2018 Data Professionals Survey:

A few things to know about it:

  • The data is public domain. The license tab makes it clear that you can use this data for any purpose, and you don’t have to credit or mention anyone.

  • The spreadsheet includes both 2017 & 2018 results. For the new questions this year, the 2017 answers are populated with Not Asked.

  • The postal code field was totally optional, and may be wildly unreliable. Folks asked to be able to put in small portions of their zip code, like the leading numbers.

Looks like I’m going to add one more thing to the to-do list for this week…
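
If you do pull the spreadsheet into a table, the main thing to watch for is that “Not Asked” placeholder when comparing years. A minimal sketch, with dbo.SalarySurvey and the Certifications column as assumed names:

    -- Compare responses across years for one of the newer questions,
    -- skipping the 'Not Asked' placeholder that fills the 2017 rows.
    SELECT SurveyYear, Certifications, COUNT(*) AS Respondents
    FROM dbo.SalarySurvey
    WHERE Certifications <> 'Not Asked'
    GROUP BY SurveyYear, Certifications
    ORDER BY SurveyYear, Respondents DESC;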

How The DBA Role Is Changing

Tom Smith spoke to 22 executives from 21 companies about how the role of Database Administrator is changing:

  • While developers don’t think they need them, DBAs are still needed for governance to make it easier to analyze data.

  • DBAs have gone from managing databases to being data engineers across multiple systems. They focus on how data moves from one database to another, the consumption of data, and the tuning of data; management of the data process across the data landscape is critical until it is distributed and executed automatically.

  • DBAs have moved from being focused on individual products like SQL Server and Oracle to having to deal with bringing companies’ big data implementations to life.

There are a lot of points here.  I agree with many, disagree with a few, and think that some of them are quite context-sensitive.  But all are worth thinking about.

Customer Retention Analysis With SQL

Luba Belokon walks through some sample customer retention analysis queries written in SQL:

Customer retention curves are essential to any business looking to understand its clients and will go a long way towards explaining other things like sales figures or the impact of marketing initiatives. They are an easy way to visualize a key interaction between customers and the business, which is to say, whether or not customers return — and at what rate — after the first visit.

The first step to building a customer retention curve is to identify those who visited your business during the reference period, what I will call p1. It is important that the length of the period chosen is a reasonable one, and reflects the expected frequency of visits.

Different types of businesses are going to expect their customers to return at different rates:

  • A coffee shop may choose to use an expected frequency of visits of once a week.

  • A supermarket may choose a longer period, perhaps two weeks or one month.

In this case, I think the motivation portion is better than the queries themselves, but the article definitely works as an inspiration for building out good measures of frequency of occurrence.
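
For a sense of the shape such a query takes, here is a minimal sketch of a weekly retention curve in T-SQL, assuming a hypothetical dbo.Visits(CustomerID, VisitDate) table and a one-week reference period:

    -- Share of the p1 cohort (first week of January) that returns in
    -- each subsequent week.
    WITH p1 AS
    (
        SELECT DISTINCT CustomerID
        FROM dbo.Visits
        WHERE VisitDate >= '20180101' AND VisitDate < '20180108'
    )
    SELECT DATEDIFF(DAY, '20180101', v.VisitDate) / 7 AS WeeksSinceP1,
           100.0 * COUNT(DISTINCT v.CustomerID)
                 / (SELECT COUNT(*) FROM p1) AS PctReturned
    FROM dbo.Visits v
        INNER JOIN p1 ON p1.CustomerID = v.CustomerID
    WHERE v.VisitDate >= '20180108'
    GROUP BY DATEDIFF(DAY, '20180101', v.VisitDate) / 7
    ORDER BY WeeksSinceP1;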

Blockchain For Business Notes

Allison Tharp has some notes on an edX course entitled Blockchain for Business.  This looks like it will be a multi-part series.  Part one:

A distributed ledger is a data structure that is spread across multiple computers (which are usually spread across locations or regions).  Distributed ledger technologies have three basic components:

  • A data model to capture the current state of the ledger
  • A language of transactions to track the changes in the ledger state
  • A protocol that builds consensus among participants around which transactions can be accepted

In other words, we can think of distributed ledgers as databases which are shared among peers and do not rely on any central authority or intermediary.  Instead of having a central database, every participant has their own copy which stays in sync via the pre-established protocol.  Each participant verifies transactions and speaks a common language to ensure universal agreement on the state of the ledger.

Part two:

Another consensus algorithm is called the Proof of Stake algorithm.  With this algorithm, the nodes are known as validators and instead of mining the blockchain, they validate the transactions to earn a transaction fee.  Instead of creating new coins (as is the case in Bitcoin), all of the coins exist from the very beginning.  Another way to look at this is that the nodes are randomly selected to validate blocks.  The likelihood of the random selection will depend on how many coins the node holds (this is known as the amount of stake they hold).
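
Since this is a SQL-focused site, here is a toy T-SQL sketch of that stake-weighted selection: each validator occupies a slice of a cumulative range proportional to its stake, so the bigger the stake, the likelier the pick. The #Validators table and its contents are invented for the example.

    -- Probability of being picked is proportional to stake, so NodeC
    -- should win roughly 60% of the time.
    CREATE TABLE #Validators (Validator varchar(10), Stake int);
    INSERT INTO #Validators VALUES ('NodeA', 10), ('NodeB', 30), ('NodeC', 60);

    DECLARE @Pick float = RAND() * (SELECT SUM(Stake) FROM #Validators);

    SELECT TOP (1) Validator
    FROM
    (
        SELECT Validator,
               SUM(Stake) OVER (ORDER BY Validator
                                ROWS UNBOUNDED PRECEDING) AS CumulativeStake
        FROM #Validators
    ) AS s
    WHERE CumulativeStake >= @Pick
    ORDER BY CumulativeStake;

    DROP TABLE #Validators;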

Blockchain has gone from wacky idea to interesting business concept over the course of about a decade.  It’ll be interesting to see if it catches on as a vital business concept in the next ten years.

Take The 2018 Data Professional Salary Survey

Brent Ozar has the 2018 edition of his Data Professional Salary Survey:

A few things to know:

  • It’s totally anonymous (we’re not getting your email, IP address, or anything like that).

  • It’s open to all database platforms.

  • As with last year’s results, we’ll publish the raw data in Excel for anyone to analyze. If you want to set up your analysis ahead of time, here are the incoming raw results as they happen, and we’ll share them in that exact same format.

Please take the survey, especially if you’re hitting Curated SQL for the analytics or Hadoop/Spark side of things rather than the SQL Server side.  That way there’s a broader distribution of entries.
