Even more links
A paper someone recommended (by Efron): Bootstrap Methods: Another Look at the Jackknife
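Efron’s core idea is simple to sketch: approximate the sampling distribution of a statistic by resampling the observed data with replacement. A minimal Python illustration with made-up numbers (the data and resample count are not from the paper):

```python
import random
import statistics

random.seed(42)

# Observed sample (made-up data for illustration)
data = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.8, 3.9, 5.2]

def bootstrap_se(sample, stat=statistics.mean, n_resamples=2000):
    """Estimate the standard error of `stat` by resampling with replacement."""
    estimates = []
    for _ in range(n_resamples):
        resample = random.choices(sample, k=len(sample))
        estimates.append(stat(resample))
    return statistics.stdev(estimates)

print(f"mean = {statistics.mean(data):.3f}")
print(f"bootstrap SE of the mean = {bootstrap_se(data):.3f}")
```

Swap in median or any other statistic for `stat` and the same machinery works — that generality is what made the method such a big deal.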
OpenIntro has some free statistics books
There are a lot of good links in Julia’s post. I should also mention that Andrew Gelman and Deborah Nolan have a new book coming out in July. Gelman’s Bayesian approach suits me well, so I’m pre-ordering the book.
Today I am very excited to announce that I have (finally!) launched my email course covering the basics of SQL Server Security.
It has been a lot of work to get a new system in place that makes the learning experience a little different. It is like a normal email course, but at the same time it isn’t.
I have been waiting for this for months ever since hearing Chris first talk about it.
What this has meant is that innovation – in particular in the Azure Public Cloud, ISVs, new data services/products, and new data-related infrastructure – has accelerated dramatically and changed the very definition of what was previously accepted as comprising the “Data Platform”.
Nowadays when I talk to customers about the “Data Platform” it encompasses a range of services across a mix of IaaS, PaaS and SaaS. The decision of which data service to deploy now comes down to matching the business case’s technical requirements with the capability of a purpose-built cloud service – as opposed to (in the past) trying to fit an obvious NoSQL use case into a traditional RDBMS platform.
I now see the “New Data Platform” as much broader than ever before and includes many other “non-traditional” data services…
The developer in me thinks this is nuts. Run the same few lines of code twice, with no changes in between, and get different outputs? Madness!
Here’s another example. Nothing too complex here: I connect to an instance of SQL Server, run SELECT CURRENT_TIMESTAMP, and show the returned value in the output window. (There’s a fixable issue here that I would go on to discover later. But hold that thought for now.)
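The original post is about SQL Server, but the round-trip it describes can be sketched with Python’s built-in sqlite3 module, whose SQL dialect also understands CURRENT_TIMESTAMP; this is a stand-in for the pattern, not the post’s actual code:

```python
import sqlite3

# In-memory database stands in for a real SQL Server connection;
# the round-trip pattern (connect, query, read the scalar) is the same.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("SELECT CURRENT_TIMESTAMP")
(server_time,) = cursor.fetchone()
print(server_time)  # a 'YYYY-MM-DD HH:MM:SS' string that changes between runs
conn.close()
```

Run it twice and the value changes — the same few lines of code, with no changes in between, producing different outputs.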
Even when you’re conceptually familiar with a language, getting into the particular foibles of that language can expose all sorts of behavior which is strange to newcomers.
@djharshany I’ve found Pocket (https://getpocket.com/) really useful for saving items for later. I’m on a schedule as well – I save a lot of articles and then pore over them when I’m on an airplane or waiting in line somewhere. #productivityhack
I think this furious pace of technological development has made me much more aware 1) of the amount of noise out in the world that I’m safe ignoring and 2) of how we need to stay vigilant in producing content that cuts through the noise.
Given that these are people who specialize in the fastest-moving part of the Microsoft data platform, it’s worth getting their thoughts on the rapid pace of change.
If you’re learning Azure and you research things using a search engine, then I strongly recommend limiting your searches to the last year. Otherwise, you may be getting incomplete or incorrect information. At this precise moment, I’d say you need to limit your searches to Google (although I honestly hate recommending one of these tools over the other – let’s keep the competition fierce) because I was able to get the correct information within a couple of mouse clicks.
Grant’s post makes sense, and so does the search engine behavior: in Grant’s case, those older cmdlet documentation links have been around longer and older resources tend to have a larger number of relevant linkbacks and clicks. That’s also visible in SQL Server documentation, where sometimes you’ll land on the 2008R2 or 2012 version of documentation rather than 2016 or vNext.
Here are a whole set of links to kick start your learning of Microsoft Azure services.
Changes to computer thinking – Stephen Fry explains cloud computing
That’s a good set of starting links.
There have been a lot of questions, posts, answers, guesses and such floating around the SQL blogs lately…most of which seem to suggest that the DBA is going away.
The DBA position is not going away. Ever. Or at least not before I retire to Utah to spend my days mountain biking 😉
That said, Kevin does point out that you shouldn’t rest on your laurels.
One fun anecdote I have about database administration: I recall some marketing for some NoSQL product about how, by adopting their software, you can get rid of those stodgy database administrators. Within a couple of years, said product’s parent company was offering developer training on “advanced” techniques, which included taking backups, tuning queries, implementing disaster recovery, and creating good indexes to help with performance. But hey, at least they don’t have DBAs!
Here is an ongoing list of articles that I consider to be along these lines – either promoting best practices or eradicating bad habits; not all are explicitly framed as a “bad habit,” but they do all represent in some way things I wish I observed less often. Some of my opinions are controversial, and many have evoked very passionate comment threads – so I recommend scrolling down for those, too.
It’s a pretty long list.
If you’re like most aspiring data scientists, you’ll try to learn this code by using the copy-and-paste method. You’ll take this code from a blog post like this, copy it into RStudio and run it.
Most aspiring data scientists do the exact same thing with online courses. They’ll watch a few videos, open the course’s sample code, and then copy-and-paste the code.
Watching videos, reading books, and copy-and-pasting code do help you learn, at least a little. If you watch a video about ggplot2, you’ll probably learn how it works pretty quickly. And if you copy-and-paste some ggplot2 code, you’ll probably learn a little bit about how the code works.
Here’s the problem: if you learn code like this, you’ll probably forget it within a day or two.
This is a thought-provoking article that applies to all disciplines, not just data science.
It’s tempting to think, when p ≥ α, that you’ve found the opposite of the p < α case: that you get to conclude that there is no statistically significant difference between the two averages. Don’t do that!
Simple statistical tests like the t-test only tell you when averages are different; they can’t tell you when they’re the same. When they fail to find a difference, there are two possible explanations: either there is no difference, or you haven’t collected enough data yet. So when a test fails, it could be your fault: if you had run a slightly larger experiment with a slightly larger N, the test might have successfully found the difference. It’s always wrong to conclude that the difference does not exist.
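The “not enough data” failure mode is easy to see in simulation. Here’s a sketch (using a permutation test rather than the t-test from the quote, so it runs with only the standard library) on made-up data where the true difference in means is the same in both cases:

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_perm=2000):
    """Approximate two-sided permutation test for a difference in means.

    Returns the fraction of random label shufflings whose mean difference
    is at least as extreme as the observed one (an approximate p-value).
    """
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return hits / n_perm

# Simulated data: the SAME true difference in means (0.5 standard
# deviations), once with N=10 per group and once with N=200 per group.
small_a = [random.gauss(0.0, 1.0) for _ in range(10)]
small_b = [random.gauss(0.5, 1.0) for _ in range(10)]
large_a = [random.gauss(0.0, 1.0) for _ in range(200)]
large_b = [random.gauss(0.5, 1.0) for _ in range(200)]

p_small = permutation_test(small_a, small_b)
p_large = permutation_test(large_a, large_b)
print(f"p-value with N=10 per group:  {p_small:.3f}")
print(f"p-value with N=200 per group: {p_large:.3f}")
# With N=10 the test may well fail to reject even though the difference
# is real; with N=200 the same difference is detected easily.
```

A large p-value at N=10 here clearly doesn’t mean the groups are the same — we built them to be different — which is exactly the article’s point.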
It’s an interesting read. H/T Emmanuelle Rieuf.