Here is an ongoing list of articles that I consider to be along these lines – either promoting best practices or eradicating bad habits; not all are explicitly framed as a “bad habit,” but they do all represent in some way things I wish I observed less often. Some of my opinions are controversial, and many have evoked very passionate comment threads – so I recommend scrolling down for those, too.
It’s a pretty long list.
If you’re like most aspiring data scientists, you’ll try to learn this code by using the copy-and-paste method. You’ll take this code from a blog post like this, copy it into RStudio and run it.
Most aspiring data scientists do the exact same thing with online courses. They’ll watch a few videos, open the course’s sample code, and then copy-and-paste the code.
Watching videos, reading books, and copy-and-pasting code do help you learn, at least a little. If you watch a video about ggplot2, you’ll probably learn how it works pretty quickly. And if you copy-and-paste some ggplot2 code, you’ll probably learn a little bit about how the code works.
Here’s the problem: if you learn code like this, you’ll probably forget it within a day or two.
This is a thought-provoking article that applies to all disciplines, not just data science.
It’s tempting to think, when p ≥ α, that you’ve found the opposite thing from the p < α case: that you get to conclude that there is no statistically significant difference between the two averages. Don’t do that!
Simple statistical tests like the t-test only tell you when averages are different; they can’t tell you when they’re the same. When they fail to find a difference, there are two possible explanations: either there is no difference or you haven’t collected enough data yet. So when a test fails, it could be your fault: if you had run a slightly larger experiment with a slightly larger N, the test might have successfully found the difference. It’s always wrong to conclude that the difference does not exist.
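A quick simulation makes the point concrete. This is only a sketch and is not taken from the linked article: it assumes two normally distributed groups whose true means differ by 0.3 standard deviations, and it uses 1.96 as an approximate critical value for α = 0.05 instead of an exact t quantile. The function names and the numbers are made up for illustration.

```python
import math
import random


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    mean_a, var_a = mean_and_var(a)
    mean_b, var_b = mean_and_var(b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


def detection_rate(n, true_diff=0.3, trials=1000, critical=1.96, seed=0):
    """Fraction of simulated experiments where |t| exceeds the critical value.

    The difference between the groups is real in every trial, so each
    trial that does not exceed the critical value is a missed detection.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        if abs(welch_t(a, b)) > critical:
            hits += 1
    return hits / trials


small = detection_rate(n=20)
large = detection_rate(n=400)
print(f"N=20 per group:  difference detected in {small:.0%} of experiments")
print(f"N=400 per group: difference detected in {large:.0%} of experiments")
```

The difference exists in every simulated experiment, yet the small-N runs miss it most of the time. A non-significant result there says more about the sample size than about the averages.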
It’s an interesting read. H/T Emmanuelle Rieuf.
A few months back, Microsoft started the Microsoft Professional Program for Data Science (note the program name change from Microsoft Professional Degree to Microsoft Professional Program, or MPP). This is online learning via edX.org as a way to learn the skills and get the hands-on experience that a data science role requires. You may audit any courses, including the associated hands-on labs, for free. However, to receive credit towards completing the data science track in the Microsoft Professional Program, you must obtain a verified certificate for a small fee for each of the ten courses you successfully complete in the curriculum. The course schedule is presented in a suggested order, to guide you as you build your skills, but this order is only a suggestion. If you prefer, you may take them in a different order. You may also take them simultaneously or one at a time, so long as each course is completed within its specified session dates.
Look for it sometime next year.
I enjoyed reading this article about devops at Etsy. One of the really key things about this article is – there is no devops organization at Etsy. It’s about how developers and operations people work productively together! Also, it was a slow incremental migration towards different practices. They did not wake up one day and become devops. I think this is the first talk that used the term ‘devops’?
It’s also not about “everyone is a software developer” – one of the authors of this book, Katherine Daniels, is a senior operations engineer at Etsy. I don’t know any of the details of her job, but my impression is that she has a lot of expertise in operations. It’s not like “make operations so easy that nobody has to be an expert at it”. Of course you need people who know a ton about operations! Probably those people write software as part of their job?
One of the scariest realizations that I’m slowly coming to (other than “Information Technology is people!”) is the sheer number of overlapping dependencies in the tech world. A bit earlier in my career, I felt like I could be “a SQL Server guy” and focus on that while not caring too much about the outside world. It seems like saying that you want to be “just an X” has become more difficult at the margin, and DevOps is just one example of this: keeping an edge means going broader about more things while still trying to dig deeper in relevant areas. That’s a tough balancing act.
If that’s not the academic version of a controversial headline, I don’t know what is…
I’ve been a C# developer since year 2000. I want to move to be a DBA. I’ve started getting involved at user groups and SQL Saturdays but nobody wants to hire me as a DBA.
I have been trying to move to other companies but my resume is strongly inclined to show my C#, front end experience. I know for a fact that I’m really good at SQL as I keep solving problems in every other project but no one seems to really pay attention to the DB. I have noticed that when applying for positions I get called for my C# experience but not when applying only to SQL jobs.
Should I find a Junior DBA position and take a pay cut?
That transition can be difficult, but I think Kendra’s answer is a good one.
On the opposite side, Daniel Janik looks at developers who shouldn’t go down that track:
I recently helped out with a .NET MVC project running on SQL Server 2016 where I found some pretty interesting stored procedures. I’ve seen a lot of really creative SQL but these were completely puzzling.
The database included many-to-many tables for customers who have addresses and phone numbers. A “mapping” table was created for the tables so they could map to a customer.
Normally you’d think a simple JOIN would suffice to get a list of addresses or phone numbers for a customer. This was done in a way that I’ve never seen before.
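For reference, the conventional JOIN through a mapping table might look roughly like the sketch below. Everything in it is an assumption for illustration: the table and column names (`Customer`, `Address`, `CustomerAddress`) are hypothetical, the data is invented, and it runs against an in-memory SQLite database through Python's `sqlite3` module, not the SQL Server 2016 instance from the post.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Address (AddressID INTEGER PRIMARY KEY, Street TEXT);
-- Hypothetical mapping table for the many-to-many relationship.
CREATE TABLE CustomerAddress (
    CustomerID INTEGER REFERENCES Customer(CustomerID),
    AddressID INTEGER REFERENCES Address(AddressID),
    PRIMARY KEY (CustomerID, AddressID)
);
INSERT INTO Customer VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO Address VALUES (10, '1 Main St'), (11, '2 Oak Ave'), (12, '3 Elm Rd');
INSERT INTO CustomerAddress VALUES (1, 10), (1, 11), (2, 12);
""")


def addresses_for(customer_id):
    """Return the streets mapped to one customer via a plain JOIN."""
    rows = conn.execute(
        """
        SELECT a.Street
        FROM CustomerAddress AS ca
        JOIN Address AS a ON a.AddressID = ca.AddressID
        WHERE ca.CustomerID = ?
        ORDER BY a.AddressID
        """,
        (customer_id,),
    ).fetchall()
    return [street for (street,) in rows]


print(addresses_for(1))  # ['1 Main St', '2 Oak Ave']
```

The same shape would apply to phone numbers with a second mapping table.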