I have three blog posts on installing and using R in SQL Server.
First, installing SQL Server R Services:
I’m excited that CTP 3 of SQL Server 2016 is publicly available, in no small part because it is our first look at SQL Server R Services. In this post, I’m going to walk through installing Don’t-Call-It-SSRS on a machine.
Getting a Linux machine to talk to a SQL Server instance is harder than it should be. Yes, Microsoft has a Linux ODBC driver and some easy setup instructions…if you’re using Red Hat or SuSE. Hopefully this helps you get connected.
If you’re using RStudio on Windows, it’s a lot easier: create a DSN using your ODBC Data Sources.
Finally, using SQL Server R Services:
So, what’s the major use of SQL Server R Services? Early on, I see batch processing as the main driver here. The whole point of getting involved with Revolution R is to create server-quality R, so imagine a SQL Agent job which runs this procedure once a night against some raw data set. The R job could build a model, process that data, and return a result set. You take that result set and feed it into a table for reporting purposes. I’d like to see more uses, but this is probably the first one we’ll see in the wild.
It’s a preview of a V1 product. Keep that in mind.
The first and third posts are for CTP 3, so beware the time-sensitive material warnings.
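To make the nightly batch pattern from the third post concrete, here is a rough sketch of its shape: read raw data, build a model, and write the result set to a reporting table. It is not SQL Server R Services code. Python stands in for R, an in-memory SQLite database stands in for SQL Server, and the table names (raw_sales, sales_forecast) and the simple least-squares model are assumptions made up for illustration.

```python
import sqlite3


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nightly_job(conn):
    """Build a model from the raw table and store one forecast row."""
    # Hypothetical raw table: one row per day with that day's sales.
    raw = conn.execute(
        "SELECT day, sales FROM raw_sales ORDER BY day"
    ).fetchall()
    slope, intercept = fit_line(raw)
    next_day = raw[-1][0] + 1
    forecast = slope * next_day + intercept
    # Feed the result set into a reporting table.
    with conn:
        conn.execute(
            "INSERT INTO sales_forecast (day, predicted_sales) VALUES (?, ?)",
            (next_day, forecast),
        )
    return next_day, forecast


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (day INTEGER, sales REAL)")
conn.execute("CREATE TABLE sales_forecast (day INTEGER, predicted_sales REAL)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)",
    [(1, 10.0), (2, 12.0), (3, 14.0)],
)
result = nightly_job(conn)
```

In the scenario the post describes, a SQL Agent job would play the role of the final call, and the model step would run as R inside the database engine rather than in a separate script.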
Mickey Stuewe hosted T-SQL Tuesday this month. Her topic: data modeling gone wrong. A few choice posts on the topic follow.
One of the problems I’ve seen with careless use of surrogate keys is the duplication of natural keys. Quite often it’s overlooked that the natural key still needs to have a unique constraint. Without it, the reporting team ends up having to use MAX or DISTINCT to get the latest instance of the natural key, or SSIS packages are needed to clean up the duplicates. This can be compounded with many-to-many tables.
Surrogate keys are not replacements for natural keys; they are physical implementation mechanisms to make your life easier.
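As a minimal sketch of that point, the example below keeps a surrogate key as the primary key while still putting a unique constraint on the natural key. It uses Python with an in-memory SQLite database rather than SQL Server, and the customer table, its columns, and the add_customer helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- surrogate key
        email       TEXT NOT NULL UNIQUE,  -- natural key, still constrained
        full_name   TEXT NOT NULL
    )
    """
)


def add_customer(email, full_name):
    """Insert a customer; return False if the natural key already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO customer (email, full_name) VALUES (?, ?)",
                (email, full_name),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a duplicate natural key.
        return False


first = add_customer("pat@example.com", "Pat Smith")
second = add_customer("pat@example.com", "Patricia Smith")
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Without the UNIQUE constraint, the second insert would succeed and receive its own surrogate key, which is exactly the duplication that later forces MAX or DISTINCT workarounds in reports.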
Rob Farley wants you to think about design and whether your warehouse is built in a way that helps the business:
Many data professionals look at a data warehouse as a platform for reporting, built according to the available data sources. I disagree with this.
The models within a data warehouse should describe the business. If they don’t, it’s a data model gone wrong.
What is the central thing that your business does? What is the main interest point? What do you need to look after? For me, this forms the core of the warehouse.
Thomas Rushton says name your stuff right. Picking the right name can be difficult. “Field1” probably isn’t the right name, though.
The tools for securely backing up computers, Web sites, data, and even entire hard drives have never been more affordable and ubiquitous. So there is zero excuse for not developing and sticking with a good backup strategy, whether you’re a home user or a Web site administrator.
PC World last year published a decent guide for Windows users who wish to take advantage of the OS’s built-in backup capabilities. I’ve personally used Acronis and Macrium products, and find both do a good job making it easy to back up your rig. The main thing is to get into a habit of doing regular backups.
There are good guides all over the Internet showing users how to securely back up Linux systems (here’s one). Other tutorials are more OS-specific. For example, here’s a sensible backup approach for Debian servers. I’d like to hear from readers about their backup strategies and what works, particularly from those who maintain Linux-based Web servers like Apache and Nginx.
This article doesn’t directly relate to SQL Server, but it does act as a nice reminder: go make sure you have good backups. Of everything.