I still have vivid memories of that night. I’d ordered pizza so that I could stay back at my hotel room and finish my punch list of things before go-live the next day. It was after 2am, and I was sitting at the kitchen counter of the Residence Inn in Kalamazoo, MI, the pizza box still open next to me as I worked my way through a large pepperoni.
I got to the item on my punch list for “delete all test appointments.” The logic here was pretty simple: All the test appointments were for the same imaginary test patient. Just find all of that person’s appointments, and delete them. I decided I would do this one doctor at a time to make sure I didn’t mess it up too badly.
It’s a harrowing story with a happy ending.
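For anyone facing a similar punch-list item, here is a minimal sketch of the "one doctor at a time" idea with a guard rail. It is not the author's script: the `appointments` table, its columns, the test patient id 999, and both function names are assumptions made up for illustration, and SQLite stands in for whatever database was actually involved. The delete runs inside a transaction and is rolled back if it touches more rows than expected.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE appointments ("
        "id INTEGER PRIMARY KEY, patient_id INTEGER, doctor_id INTEGER)"
    )
    # Patient 999 plays the imaginary test patient; 42 and 43 are real patients.
    rows = [(999, 1), (999, 1), (42, 1), (999, 2), (43, 2)]
    conn.executemany(
        "INSERT INTO appointments (patient_id, doctor_id) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def delete_test_appointments(conn, test_patient_id, doctor_id, max_expected):
    """Delete one doctor's appointments for the test patient.

    Rolls back and raises if more than max_expected rows would be removed.
    """
    cur = conn.execute(
        "DELETE FROM appointments WHERE patient_id = ? AND doctor_id = ?",
        (test_patient_id, doctor_id),
    )
    if cur.rowcount > max_expected:
        conn.rollback()  # undo the delete before anything is committed
        raise RuntimeError(
            f"expected at most {max_expected} rows, matched {cur.rowcount}"
        )
    conn.commit()
    return cur.rowcount
```

Calling `delete_test_appointments(conn, 999, doctor_id, max_expected)` once per doctor keeps each delete small and makes a surprising row count fail loudly instead of silently.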
How much do database administrators, analysts, architects, developers, and data scientists make? We asked, and 882 of you from 46 countries answered this year. Y’all make a total of $84,114,940 USD per year! Hot diggety. (And at first glance, it looks like on average, y’all got raises this year.)
Download the 2019, 2018, & 2017 results in Excel.
Read on for some notes about the data and start playing around.
You’ll know an expert programmer by the quality of the code that they write. Experts have good communication skills, both sharing their own knowledge and soliciting input from others. They are self-aware, understanding the kinds of mistakes they can make, and reflective. They are also fast (but not at the expense of quality).
Experience should be measured not just on its quantity (i.e., number of years in the role), but on its quality: for example, working on a variety of different code bases, shipping significant amounts of code to production, and working on shared code bases. The knowledge of an expert is T-shaped, with depth in the programming language and domain at hand, and a broad knowledge of algorithms, data structures, and programming paradigms.
Click through for the full review.
The Hortonworks and Cloudera announcement about their merger is certainly an interesting one for the Big Data landscape. These two are thought to be the leaders in the Hadoop industry.
Undeniably, a lot of people have seen what these two Big Data giants have delivered over the years within the Hadoop ecosystem.
With this merger they are aiming to use their combined expertise to deliver an enterprise data cloud. We’ve already seen what Hadoop-based cloud offerings like HDInsight are capable of, so the potential here is huge.
Certainly, there’s potential for this to have massive implications in the Big Data industry. And this merger could also encourage even more Data Platform offerings to emerge.
Read on for Kevin’s thoughts on five major stories this year.
I’m naturally an introvert. If you and I have a conversation, it’s like a little taxi meter starts running. I may deeply, deeply enjoy the conversation and find it incredibly exciting, but it still taxes my energy levels. Small talk even more so. Imagine that every time someone chatted about the weather, you had to pay the same price as a Lyft ride to go 4 blocks. That’s how I feel about small talk.
That being said, we are still social creatures, and even introverts need human interaction. Especially so when you need to think through new situations, new problems. One of the things I realized attending PASS Summit is that I need social interaction to thrive. So now I spend a lot more time on Twitter and am part of a peer group of authors. I work down at the library whenever I have the chance.
When I did the work-from-home full-time thing, I sought out user groups to build up some technical skills and, more importantly, to get out of the house and talk to a group of people a couple times a week. That paid off really well in the long run.
Speaking of paying off in the long run, check out Eugene’s BI newsletter.
It’s time for our annual salary survey to find out what data professionals make. You fill out the data, we open source the whole thing, and you can analyze the data to spot trends and do a better job of negotiating your own salary.
The anonymous survey closes Sunday, January 6, 2019. The results will be completely open source, and shared with the community for your analysis.
I like this survey so much that I delivered a talk at PASS Summit making heavy use of it.
I know this sounds somewhat silly. But, when thinking through the steps that I take to solve a business problem, I realized that I do employ a strategy. The backbone of that strategy is based on the principles of solving a word problem. Yes, that’s right. Does anyone else remember staring at those first complex word problems as a kid and not quite knowing where to start? I do! However, when my teacher provided us with strategies to break down the problem into less intimidating, actionable steps, everything became rather doable. The steps: circle the question, highlight the important information and cross out unnecessary information. Do these steps and all of a sudden the problem is simplified and much less scary. What a relief! By employing the same basic strategy, we too can feel that sense of calm when working on a business problem.
It sounds blasé, but it comes down to paying attention to what people are saying (or writing) rather than hearing a few words and assuming the rest.
Why why-provenance doesn’t work
Relational databases have why-provenance, which sounds on the surface exactly like what we’re looking for.
Given a relational database, a query issued against the database, and a tuple in the output of the query, why-provenance explains why the output tuple was produced. That is, why-provenance produces the input tuples that, if passed through the relational operators of the query, would produce the output tuple in question.
One reason that won’t work in our distributed systems setting is that the state of the system is not relational, and the operations can be much more complex and arbitrary than the well-defined set of relational operators why-provenance works with.
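To make the definition of why-provenance concrete, here is a minimal sketch for a single join-and-project query. It is my own illustration under stated assumptions, not code from the linked post: the two toy relations, their tuple ids, and the function name `cities_with_staff` are all hypothetical. Each output tuple is mapped to its witnesses, where a witness is one set of input tuples that is enough to produce it.

```python
from collections import defaultdict

# Hypothetical relations. The first field of each tuple is an id so that
# provenance can point back at specific input tuples.
employees = [
    ("e1", "Ann", "sales"),
    ("e2", "Bob", "sales"),
    ("e3", "Cy", "ops"),
]
departments = [
    ("d1", "sales", "Lyon"),
    ("d2", "ops", "Oslo"),
]


def cities_with_staff(employees, departments):
    """Evaluate, with witnesses attached, the query

        SELECT DISTINCT d.city
        FROM employees e JOIN departments d ON e.dept = d.name

    Returns {output_tuple: set of witnesses}; each witness is a frozenset
    of the input tuple ids that together yield the output tuple.
    """
    why = defaultdict(set)
    for e_id, _name, dept in employees:
        for d_id, d_name, city in departments:
            if dept == d_name:  # the join condition
                why[(city,)].add(frozenset({e_id, d_id}))
    return dict(why)


provenance = cities_with_staff(employees, departments)
for row, witnesses in sorted(provenance.items()):
    print(row, [sorted(w) for w in sorted(witnesses, key=sorted)])
```

The output row for Lyon has two independent witnesses, one per sales employee. This only works because the query is built from a small, well-defined set of relational operators, which is exactly the property the quote says distributed systems lack.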
Read the whole thing.
So the first issue was that the software was built in-house by another company in the same industry. Imagine, for example, if a large bakery had created an ERP system and another large bakery wanted to move to that system. Sounds great, right? Well, you run into two issues in that scenario.
First, a bakery is not an independent software vendor. Programming, by definition, is not their core competency. Which means that you may run into fragility or issues that you wouldn’t run into with a commercial piece of software. It also means that there isn’t going to be any documentation on migrating to the software or implementing it. Why would there be? If you built software for one company, why would you create scaffolding to move other companies onto it?
Second, not every business is the same. A lot of the fundamentals are the same, but you will run into many edge cases. We do invoices this way. They do work orders this way. We handle purchase orders this way. They handle inventory that way.
The way that I think about it is like a sea shell. It’s this intricate curve that’s grown over time, organically, to fit that creature. If you just try to fit a different snail or mollusk in that shell, it may not work out.
Read the whole thing.
SANs have become a bit like the printer industry: you don’t pay a lot for the enclosure, the device itself (i.e., the SAN box and software), but you pay through the nose for ‘refills’, i.e., the drives that your SAN vendor gods deem worthy of their enclosure.
It’s frighteningly accurate. Ask your storage admin what it costs to add a single drive (or pair of drives, if you’re using something with built-in redundancy) to your SAN. Then compare that cost with the same exact drive off the retail market. It’s highway robbery. And we’re letting them get away with it because we can’t evolve fast enough to take advantage of storage virtualization tech (S2D, SOFS, RDMA) that effectively makes servers with locally attached SSDs a superior architecture. (As long as they’re not using a horribly outdated interface like SAS!)
Nate also includes several more interesting lessons. SQL Saturdays are great for picking up useful knowledge.