Category: Learning

Keeping Up With Analytics

Jen Underwood discusses the need to stay relevant in analytics and shares some tips on how to do so:

Although most analytics applications today still leverage older data warehouse and OLAP technologies on-premises, the pace of the cloud shift is significantly increasing. Infrastructure is getting better and is almost invisible in mature markets. Cloud fears are subsiding as more organizations witness the triumphs of early adopters. Instant, easy cloud solutions continue to win the hearts and minds of non-technical users. Cloud also accelerates time to market, allowing for innovation at faster speeds than ever before. As data and analytics professionals, be sure to make time to learn a variety of cloud and hybrid analytics tools.

Exploring novel technologies across various ecosystems in the cloud world is usually as simple as spinning up a cloud image or service to get started. There are literally zillions of free and low cost resources for learning. As you dive into a new world of data, you will find common analytics architectures, design patterns, and types of technologies (hybrid connectivity, storage, compute, microservices, IoT, streaming, orchestration, database, big data, visualization, artificial intelligence, etc.) being used to solve problems.

It’s worth reading the whole thing.

Comments closed

More Advice For Data Scientists

Charles Parker provides more Dijkstra-style wisdom for budding data scientists:

Raise your standards as high as you can live with, avoid wasting your time on routine problems, and always try to work as closely as possible at the boundary of your abilities. Do this because it is the only way of discovering how that boundary should be moved forward.

Readers of this blog post are just as likely as anyone to fall victim to the classic maxim, “When all you have is a hammer, everything is a nail.” I remember a job interview where my interrogator appeared disinterested in talking further after I wasn’t able to solve a certain optimization using Lagrange multipliers. The mindset isn’t uncommon: “I have my toolbox.  It’s worked in the past, so everything else must be irrelevant.”

There’s some good advice in here.

Advice For A Budding Data Scientist

Charles Parker riffs off of an Edsger Dijkstra note:

It’s still early days for machine learning. The bounds and guidelines about what is possible or likely are still unknown in a lot of places, and bigger projects that test more of those limitations are more likely to fail. As a fledgling data engineer, especially in the industry, it’s almost certainly the more prudent course to go for the “low-hanging fruit” — easy-to-find optimizations that have real world impact for your organization. This is the way to build trust among skeptical colleagues and also the way to figure out where those boundaries are, both for the field and for yourself.

As a personal example, I was once on a project where we worked with failure data from large machines with many components. The obvious and difficult problem was to use regression analysis to predict the time to failure for a given part. I had some success with this, but nothing that ever made it to production. However, a simple clustering analysis that grouped machines by the frequency of replacement for all parts had some lasting impact; this enabled the organization to “red flag” machines that fell into a “high replacement” group where the users may have been misusing the machines and bring these users in for training.
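
To make the clustering idea in that quote concrete, here is a minimal sketch of what "grouping machines by replacement frequency" could look like. It is not Parker's actual analysis: the machine IDs, the replacement counts, the choice of two clusters, and the helper names `kmeans_1d` and `flag_high_replacement` are all assumptions made up for illustration, and the clustering is a bare-bones one-dimensional k-means rather than anything from a library.

```python
from statistics import mean


def kmeans_1d(values, k=2, iterations=20):
    """Basic 1-D k-means (k >= 2); returns the final centroids."""
    ordered = sorted(values)
    # Seed the centroids with evenly spaced points from the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def flag_high_replacement(rates):
    """Return the machines that sit closer to the higher of two centroids."""
    centroids = kmeans_1d(list(rates.values()), k=2)
    low, high = min(centroids), max(centroids)
    return sorted(m for m, r in rates.items() if abs(r - high) < abs(r - low))


# Hypothetical data: part replacements per machine per year.
replacements_per_year = {
    "M-01": 2, "M-02": 3, "M-03": 1, "M-04": 14,
    "M-05": 2, "M-06": 17, "M-07": 3,
}

flagged = flag_high_replacement(replacements_per_year)
print(flagged)  # machines to "red flag" for follow-up
```

With this made-up data the two heavy-replacement machines separate cleanly from the rest, which is the kind of simple, explainable result the quote describes. Real data would likely need more features and some thought about how many groups to use.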

There’s some good advice.  Also read the linked Dijkstra note; even in bullet point form, he was a brilliant guy.

Data Science Resources

Steph Locke has some resources if you are interested in getting started with data science:

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data is written by Hadley Wickham and Garrett Grolemund. You can buy it and you can also access it online.

If you’re interested in learning to actually start doing data science as a practitioner, this book is a very accessible introduction to programming.

Starting gently, this book doesn’t teach you much about the use of R from a general programming perspective. It takes a very task oriented approach and teaches you R as you go along.

This book doesn’t cover the breadth and depth of data science in R, but it gives you a strong foundation in the coding skills you need and gives you a sense of the process you’ll go through.

It’s a good starting set of links.

Statistics For Programmers

Julia Evans shares some good resources for developers interested in statistics:

There are a lot of good links in Julia’s post.  I should also mention that Andrew Gelman and Deborah Nolan have a new book coming out in July.  Gelman’s Bayesian approach suits me well, so I’m pre-ordering the book.

Learn SQL Server Security Via E-mails

Chris Bell has announced a free e-mail course for learning the basics of SQL Server security:

Today I am very excited to announce that I have (finally!) launched my email course covering the basics of SQL Server Security.

This has been a lot of work to get a new system in place to make the learning experience a little different. It is like a normal email course, but at the same time it isn’t.

I have been waiting for this for months ever since hearing Chris first talk about it.

What Is The Data Platform?

Rolf Tesmer has weighed in with his thoughts on the “Data Platform”:

What this has meant is that innovation – in particular in the Azure Public Cloud, ISV’s, new data services/products, and new data related infrastructure – has accelerated dramatically and changed the very definitions of what was previously accepted as comprising the “Data Platform”.

Nowadays when I talk to customers about the “Data Platform” it encompasses a range of services across a mix of IaaS, PaaS and SaaS.  The decision of which data service to deploy now comes down to matching the business case technical requirements with the capability of a purpose built cloud service – as opposed to (in the past) trying to fit an obvious NoSQL use case into a traditional RDBMS platform.

I now see the “New Data Platform” as much broader than ever before, and it includes many other “non-traditional” data services…

Cf. Eugene Meidinger (who started this) and me (who exacerbated this).  This is an area ripe for consideration.

1 Comment

PowerShell Difficulties

Dave Mason shares some difficulties he has had grokking PowerShell:

The developer in me thinks this is nuts. Run the same few lines of code twice, with no changes in between, and get different outputs? Madness!

Here’s another example. Nothing too complex here: I connect to an instance of SQL, SELECT CURRENT_TIMESTAMP, and show the returned value in the output window. (There’s a fixable issue here that I would go on to discover later. But hold that thought for now.)

Even when you’re conceptually familiar with a language, getting into the particular foibles of that language can expose all sorts of behavior which is strange to newcomers.

Managing The Pace Of Change

Kellan Danielson and the rest of the Power Pivot Pro team discuss the pace of change in the data platform:

@djharshany I’ve found Pocket (https://getpocket.com/) really useful for saving items for later. I’m on a schedule as well – I save a lot of articles and then pore over them when I’m on an airplane or waiting in line somewhere. #productivityhack

I think this furious pace of technological development has made me much more aware 1) of the amount of noise out in the world that I’m safe ignoring and 2) of how we need to stay vigilant in producing content that cuts through the noise.

Given that these are people who specialize in the fastest-moving part of the Microsoft data platform, it’s worth getting their thoughts on the rapid pace of change.

Learning Azure

Grant Fritchey notes that web searches won’t always take you to the latest version of documentation:

If you’re learning Azure and you research things using a search engine, then I strongly recommend you use the ability to limit your searches to the last year. Otherwise, you may be getting incomplete or incorrect data. At this precise moment, I’d say you need to limit your searches to Google (although I honestly hate recommending one of these tools over the other, let’s keep the competition fierce) because I was able to easily get the correct information within a couple of mouse clicks.

Grant’s post makes sense, and so does the search engine behavior: in Grant’s case, those older cmdlet documentation links have been around longer, and older resources tend to have a larger number of relevant linkbacks and clicks. That’s also visible in SQL Server documentation, where sometimes you’ll land on the 2008 R2 or 2012 version of documentation rather than 2016 or vNext.

Meanwhile, Victoria Holt has a bunch of resources for the Azure curious:

Here is a whole set of links to kick start your learning of Microsoft Azure services.

Introduction video

Changes to computer thinking – Stephen Fry explains cloud computing

That’s a good set of starting links.
