

Understanding Binary Trees

Robert Maclean has a couple of posts on binary trees.  In the first post, he explains the basics of a binary tree:

As a binary tree has some flexibility in it, a number of classifications have come up to have a consistent way to discuss a binary tree. Common classifications are:

  • Full binary tree: Each node in a binary tree can have zero, one, or two child nodes. In a full binary tree, each node can only have zero or two child nodes.
  • Perfect binary tree: This is a full binary tree with the additional condition that all leaf nodes (i.e. nodes with no children) are at the same level/depth.
  • Complete binary tree: A complete binary tree is one where every level is filled except possibly the last, and the nodes on the last level are as far left as possible.
  • Balanced binary tree: A balanced binary tree is a tree where the height of the tree is kept as small as possible.

Then, he looks at binary search trees:

So, why should we care about a BST? We should care because searching it is really performant: each time you move down a level, you eliminate approximately 50% of the potential nodes.

So, for example, if we wanted to find the item in our example with the key 66, we could start at the root (50) and move right. At that point, we have eliminated 8 possible nodes immediately. The next move is to the left from the node with the value 70 (12 possible nodes removed in total). Next is to the right of the node with the value 65, and then to 66, to the left of 67. So we found the node in 5 steps.

Going to Big O notation, this means we achieved a performance of close to O(log n). It is possible to have a worst case of O(n) when the tree is not optimal, i.e. unbalanced.

Binary search trees are an easy-to-understand, reasonably efficient model for searching for data.  Even when there are better options, this is an easy algorithm to implement and can often be good enough to solve the problem.
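
To make the walkthrough concrete, here is a minimal T-SQL sketch of that search, assuming a hypothetical adjacency-style node table.  The table and column names are invented for illustration, and only the nodes along the quoted path (50, 70, 65, 67, 66) are populated:

    -- Hypothetical structure for this sketch: each row stores a node's key and the keys of its children.
    CREATE TABLE #TreeNode
    (
        NodeKey       INT PRIMARY KEY,
        LeftChildKey  INT NULL,
        RightChildKey INT NULL
    );

    -- Only the nodes along the search path described above are included here.
    INSERT INTO #TreeNode (NodeKey, LeftChildKey, RightChildKey)
    VALUES (50, NULL, 70), (70, 65, NULL), (65, NULL, 67), (67, 66, NULL), (66, NULL, NULL);

    DECLARE @SearchKey  INT = 66,
            @CurrentKey INT = 50,   -- start at the root
            @Moves      INT = 0;

    WHILE @CurrentKey IS NOT NULL AND @CurrentKey <> @SearchKey
    BEGIN
        -- Move left if the search key is smaller than the current node, right otherwise;
        -- each move discards roughly half of the remaining nodes.
        SELECT @CurrentKey = CASE WHEN @SearchKey < NodeKey
                                  THEN LeftChildKey
                                  ELSE RightChildKey END
        FROM #TreeNode
        WHERE NodeKey = @CurrentKey;

        SET @Moves += 1;
    END;

    SELECT @CurrentKey AS FoundKey, @Moves AS MovesFromRoot;

    DROP TABLE #TreeNode;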


Technical Debt, T-SQL Business Rules Edition

Paul Turley tells a story of technical debt:

They’ve been writing reports using some pretty complicated SQL queries embedded in SSRS paginated reports.  Every time a user wants a new report, a request is sent to the IT group.  A developer picks up the request, writes some gnarly T-SQL query with pre-calculated columns and business rules.  Complex reports might take days or weeks of development time.  I needed to update a dimension table in the data model and needed a calculated column to differentiate case types.  Turns out that it wasn’t a simple addition and his response was “I’ll just send you the SQL for that…you can just paste it”.  The dilemma here is that all the complicated business rules had already been resolved using layers of T-SQL common table expressions (CTEs), nested subqueries and CASE statements.  It was very well-written SQL and it would take considerable effort to re-engineer the logic into a dimensional tabular model to support general-use reporting.  After beginning to nod-off while reading through the layers of SQL script, my initial reaction was to just paste the code and be done with it.  After all, someone had already solved this problem, right?

It’s the persistent battle between “don’t fix what isn’t broken” and “the process is broken, even if the code isn’t.”


Inside SQL Server 6.5

Brent Ozar reviews a blast from the past:

I picked up half a dozen used books about SQL Server 6.5, then spent a delightful weekend reading them. Seriously delightful – lemme tell you just how into it I was. Erika and I eat all weekend meals out at restaurants, but she saw me so happily curled up in my chair reading that she insisted on going out and getting tacos for us just so I wouldn’t have to get up. I was having that good of a time READING BOOKS ABOUT SQL SERVER 6.5. (Also, Erika is amazing. Moving on.)

To bring you that same fun, I wanna share with you a few pages from Inside SQL Server 6.5 by Ron Soukup, one of the fathers of SQL Server.

It’s a great read.  My contribution to the Old But Good oeuvre is the Handbook of Relational Database Design by Candace Fleming and Barbara von Halle.  For my money, it has what I still consider the best primer on database normalization out there.  It also has a bunch of stuff that we should be glad we don’t do anymore, like figuring out specific file layouts for non-clustered indexes to minimize the number of disk rotations needed to retrieve a record of data.


Test Your Restore Process

Clive Strong tells a tale about a mental flub while restoring a backup:

Our automated restore process works really nicely. We take full backups on Saturday and differential backups through the week. We also take log backups through the day, but we were not going to be restoring those for this task. We have a number of internal platforms we restore to in full (or in part, following a cut-down process), which gives us good validation of our backup files on a regular basis. We also have regular test restores from tape just for good measure.

However, a while ago I was asked to build a new server and restore the databases up to a specific date. We didn’t need a point-in-time restore, just a restore to a specific day, so I pulled the full and differentials and wrote the script to do the restore for me. The script restored the full backup and the differential backups for Sunday, Monday, Tuesday, Wednesday and Thursday. I gave it the once over and executed the script. A while later I came back and, unexpectedly, it was still running. I eventually left the office and noted that it finished in the early hours, having run for many hours longer than I had anticipated.

Read on for Clive’s more detailed explanation of the whoopsie moment.
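
For reference, a restore of that general shape looks something like the sketch below; the database name and backup file paths are hypothetical.  Differential backups are cumulative, so a restore to a given day normally needs only the full backup plus the most recent differential taken before the target point:

    -- Hypothetical database name and backup paths, for illustration only.
    -- Restore the weekend full backup, leaving the database ready for more restores.
    RESTORE DATABASE SalesDB
        FROM DISK = N'\\BackupShare\SalesDB\SalesDB_Full_Saturday.bak'
        WITH NORECOVERY, REPLACE;

    -- Restore the latest differential before the target day and bring the database online.
    RESTORE DATABASE SalesDB
        FROM DISK = N'\\BackupShare\SalesDB\SalesDB_Diff_Thursday.bak'
        WITH RECOVERY;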


The Joys Of Boolean Logic

Jen McCown explains boolean logic via truth tables:

It’s really fine if all the circuitry and algebra stuff makes no sense to you. We’re going to use a tried-and-true method for figuring out how these things will come out.

For a truth table, you just put in every combination of input types – meaning, inputs that will evaluate to true, and those that evaluate to false – and work out how the clause will evaluate overall.

What do I mean? Well, we have two inputs for the questions above: StatusID and UserForeignID, which I’ll shorten to ID and ForeignID to save characters. Logically speaking:

  • ID can either be equal to 1, or to a value other than 1.

  • ForeignID can either be equal to TD75R, or to a value other than TD75R.

A logic course is particularly helpful in these cases, but start by reading the whole thing.
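
If you want to play along in T-SQL, here is a small sketch of the truth-table idea using the two inputs above.  The literal values standing in for "a value other than 1" and "a value other than TD75R" are invented for illustration:

    -- One row per combination of inputs; the values 9 and 'ZZZZZ' are arbitrary non-matching stand-ins.
    -- The two CASE expressions show how an AND clause and an OR clause evaluate for each combination.
    SELECT v.StatusID,
           v.UserForeignID,
           CASE WHEN v.StatusID = 1 AND v.UserForeignID = 'TD75R' THEN 'TRUE' ELSE 'FALSE' END AS AndResult,
           CASE WHEN v.StatusID = 1 OR  v.UserForeignID = 'TD75R' THEN 'TRUE' ELSE 'FALSE' END AS OrResult
    FROM (VALUES (1, 'TD75R'), (1, 'ZZZZZ'), (9, 'TD75R'), (9, 'ZZZZZ')) AS v (StatusID, UserForeignID);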


Meidinger’s Law

Eugene Meidinger shares his thoughts on the future:

Since we are prognosticating, I want to take a guess at one of the constraints limiting the future.  I present you with Meidinger’s law:

An industry’s growth is constrained by how much your junior dev can learn in two years.

Let me explain. On my team, one of our developers just left for a different company. We also have a college student who will be going full-time in May, upon graduation. How long do you think it’s going to take the new guy to get up to speed?

And how long do you think he’s going to stay?

I think this is a useful dictum, one which explains a pretty good amount of industry movement.


Think Logically When Debugging

Kenneth Fisher explains his debugging technique:

Almost everything is made up of smaller pieces. When you run across a problem (well, after you check the stupid things), start breaking up what you are doing into smaller pieces. I’m going to give a simplified version of an actual example of something I did recently. No code examples (sorry), just discussion, and only most of the tests, so you can get the idea.

The problem
I’m running a job that runs a bat file that uses SQLCMD to run a stored procedure that references a linked server. I’m getting a double hop error.

There’s some good advice in here.  My slight modification to this is: if you have strong priors as to the probability distribution of the cause, hit the top one or two items (in terms of likelihood of being the cause of failure) first; if that doesn’t fix it, then go from the beginning.  This assumes a reasonable probability distribution, which typically means that you have experienced the problem (or something much like it) before.


Order Of Operations With Logical Types

Thomas Rushton explains the order of operations, particularly around boolean operators:

The order in which calculations are done – not just reading from left to right, but remembering that things like multiplication and division happen before addition and subtraction. My son tells me that kids nowadays are being taught something called “BIDMAS” – which stands for “Brackets, Indices, Division, Multiplication, Addition, Subtraction”. Or it can be BODMAS – Brackets, Orders, Division… (Order is a fancy new way of describing indices – i.e. x^y)

Unsurprisingly, there are similar rules for Boolean operators.

It’s a valuable lesson oft learned.
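
As a quick T-SQL illustration of the point: AND is evaluated before OR, so adding parentheses can change the result.  The variable values below are arbitrary, chosen so the two forms disagree:

    -- Arbitrary values, picked so that the parenthesized and unparenthesized forms differ.
    DECLARE @a INT = 1, @b INT = 5, @c INT = 3;

    SELECT CASE WHEN @a = 1 OR @b = 2 AND @c = 0 THEN 'TRUE' ELSE 'FALSE' END AS WithoutParens,
           -- AND binds first, so this is read as: @a = 1 OR (@b = 2 AND @c = 0)  => TRUE
           CASE WHEN (@a = 1 OR @b = 2) AND @c = 0 THEN 'TRUE' ELSE 'FALSE' END AS WithParens;
           -- the parentheses force the OR to be evaluated first                  => FALSE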


GDPR In The UK

Ed Elliott covers that lesser-known Sex Pistols track in a multi-part series.

Part 1 covers some of the official documentation around how the ICO interprets GDPR:

To read the articles and the actual requirements, I would start at page 32, which begins “HAVE ADOPTED THIS REGULATION:”. This lists each of the articles (requirements). You can go through each of these and make sure you are compliant with them.

The exciting bit, the fines

The exciting, headline-grabbing parts of GDPR are the fines that can be enforced. We don’t yet know how the ICO will apply the fines; words like maximum are used, and the maximum possible fines are large. It is possible that the maximum fines will apply, but we will look in part 2 at previous ICO enforcement actions to see if the ICO’s past performance gives us any clues as to its possible future decisions.

Part 2 looks at a couple of prior cases and how the ICO handled them:

Talk Talk started mitigating the issue by writing to all of its customers telling them how to deal with scam calls. Talk Talk told the ICO what happened and they responded with their own investigation and a £100,000 fine. The reasons were:

– The system failed to have adequate controls over who could access which records, i.e. anyone could access any record, not just the cases they were working on
– The exports allowed all fields, not just the ones required for the regulatory reports
– Wipro were able to make wildcard searches
– The issue was long-running, from 2004, when Wipro were given access, until 2014

One of the mitigating factors was that there was no evidence that this was even the source of the scam calls, plus there was no evidence anyone suffered any damage or distress as a result of this incident.

Part 3 looks at a couple more cases, too.  And Ed promises part 4.


Digging Into The Data Professional Survey

Melissa Connors looks at the 2018 Data Professionals Salary Survey:

This report is filtered to the United States, Private sector, full-time employees, Job Titles with more than 50 results, all primary databases, a salary between $15,000 and $200,000, and a survey year of 2018.

On the top are employees who said they work remotely 0 days per week, the middle is office employees who telecommute 1-4 days per week, and the bottom is the true remote employee who does this 5+ days per week.

The overall median salaries were $97,316 for office employees, $111,500 for part-time telecommuters, and $114,163 for full-time remote employees, which led to the click-bait title of this post. 🙂 It’s possible that this is because only more senior or highly-valued employees feel comfortable working from home, or are even allowed to, depending on the company culture.

Click through to see all of Melissa’s findings.
