Press "Enter" to skip to content

Category: Learning

Data Science Resources

Steph Locke has some resources if you are interested in getting started with data science:

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data is written by Hadley Wickham and Garett Grolemund. You can buy it and you can also access it online.

If you’re interested in learning to actually start doing data science as a practitioner, this book is a very accessible introduction to programming.

Starting gently, this book doesn’t teach you much about the use of R from a general programming perspective. It takes a very task oriented approach and teaches you R as you go along.

This book doesn’t cover the breadth and depth of data science in R, but it gives you a strong foundation in the coding skills you need and gives you a sense of the of the process you’ll go through.

It’s a good starting set of links.

Comments closed

Statistics For Programmers

Julia Evans shares some good resources for developers interested in statistics:

There are a lot of good links in Julia’s post.  I should also mention that Andrew Gelman and Deborah Nolan have a new book coming out in July.  Gelman’s Bayesian approach suits me well, so I’m pre-ordering the book.

Comments closed

Learn SQL Server Security Via E-mails

Chris Bell has announced a free e-mail course for learning the basics of SQL Server security:

Today I am very excited to announce that I have (finally!) launched my email course covering the basics of SQL Server Security.

This has been a lot of work to get a new system in place to make the learning experience a little different. It is like a normal email course, but at the same time it isn’t.

I have been waiting for this for months ever since hearing Chris first talk about it.

Comments closed

What Is The Data Platform?

Rolf Tesmer has weighed in with his thoughts on the “Data Platform”:

What this has meant is that innovation – in particular in the Azure Public Cloud, ISV’s, new data services/products, and new data related infrastructure – has accelerated dramatically and changed the very definitions of what was previously accepted as comprising the “Data Platform”.

Nowadays when I talk to customers about the “Data Platform” it encompasses a range of services across a mix of IaaS, PaaS and SaaS.  The decision of which data service to deploy now comes down to matching the business case technical requirements with the capability of a purpose built cloud service – as opposed to (in the past) trying to fit an obvious NoSQL use case into a traditional RDBMS platform.

I now see the “New Data Platform” as much broader than ever before and includes many other “non-traditional” data services…

Cf. Eugene Meidinger (who started this) and me (who exacerbated this).  This is an area ripe for consideration.

1 Comment

Powershell Difficulties

Dave Mason shares some difficulties he has had grokking Powershell:

The developer in me thinks this is nuts. Run the same few lines of code twice, with no changes in between, and get different outputs? Madness!

Here’s another example. Nothing too complex here: I connect to an instance of SQL, SELECT CURRENT_TIMESTAMP, and show the returned value in the output window. (There’s a fixable issue here that I would go on to discover later. But hold that thought for now.)

Even when you’re conceptually familiar with a language, getting into the particular foibles of that language can expose all sorts of behavior which is strange to newcomers.

Comments closed

Managing The Pace Of Change

Kellan Danielson and the rest of the Power Pivot Pro team discuss the pace of change in the data platform:

@djharshany I’ve found Pocket (https://getpocket.com/) really useful for saving items for later. I’m on a schedule as well – I save a lot of articles and then pour through them when I’m on an airplane or waiting in line somewhere. #productivityhack

I think this furious pace of technological development has made me much more aware 1) of the amount of noise out in the world that I’m safe ignoring and 2) of how we need to stay vigilant in producing content that cuts through the noise.

Given that these are people who specialize in the fastest-moving part of the Microsoft data platform, it’s worth getting their thoughts on the rapid pace of change.

Comments closed

Learning Azure

Grant Fritchey notes that web searches won’t always take you to the latest version of documentation:

If you’re learning Azure and you research things using a search engine, then I strongly recommend you use the ability to limit your searches to the last year. Otherwise, you may be getting incomplete or incorrect data. At this precise moment, I’d say you need to limit your searches to Google (although I honestly hate recommending one of these tools over the other, let’s keep the competition fierce) because I was able to easily get the correct information within a couple of mouse clicks.

Grant’s post makes sense, and so does the search engine behavior:  in Grant’s case, those older cmdlet documentation links have been around longer and older resources tend to have a larger number of relevant linkbacks and clicks.  That’s also visible in SQL Server documentation, where sometimes you’ll land on the 2008R2 or 2012 version of documentation rather than 2016 or vNext.

Meanwhile, Victoria Holt has a bunch of resources for the Azure curious:

Here are a whole set of links to kick start your learning of Microsoft Azure services.

Introduction video

Changes to computer thinking – Stephen Fry explains cloud computing

That’s a good set of starting links.

Comments closed

What Will The DBAs Do?

Kevin Hill predicts that database administration isn’t going anywhere anytime soon:

There have been a lot of questions, posts, answers, guesses and such floating around the SQL blogs lately…most of which seem to suggest that the DBA is going away.

Hogwash.

The DBA position is not going away.  Ever.  Or at least not before I retire to Utah to spend my days mountain biking 😉

That said, Kevin does point out that you shouldn’t rest on your laurels.

One fun anecdote I have about database administration:  I recall some marketing for some NoSQL product about how, by adopting their software, you can get rid of those stodgy database administrators.  Within a couple of years, said product’s parent company was offering developer training on “advanced” techniques, which included taking backups, tuning queries, implementing disaster recovery, and creating good indexes to help with performance.  But hey, at least they don’t have DBAs!

Comments closed

Bad Habits: A Full Listing

Aaron Bertrand has provided an index to his bad habits series:

Here is an ongoing list of articles that I consider to be along these lines – either promoting best practices or eradicating bad habits; not all are explicitly framed as a “bad habit,” but they do all represent in some way things I wish I observed less often. Some of my opinions are controversial, and many have evoked very passionate comment threads – so I recommend scrolling down for those, too.

It’s a pretty long list.

Comments closed

Learning Versus Remembering

Via R-Bloggers, a discussion on learning versus remembering with respect to data science:

If you’re like most aspiring data scientists, you’ll try to learn this code by using the copy-and-paste method. You’ll take this code from a blog post like this, copy it into RStudio and run it.

Most aspiring data scientists do the exact same thing with online courses. They’ll watch a few videos, open the course’s sample code, and then copy-and-paste the code.

Watching videos, reading books, and copy-and-pasting code do help you learn, at least a little. If you watch a video about ggplot2, you’ll probably learn how it works pretty quickly. And if you copy-and-paste some ggplot2 code, you’ll probably learn a little bit about how the code works.

Here’s the problem: if you learn code like this, you’ll probably forget it within a day or two.

This is a thought-provoking article that applies to all disciplines, not just data science.

Comments closed