Day: February 24, 2017

Improving Agent Alerts

Published 2017-02-24 by Kevin Feasel

Chris Bell has a way of making SQL Agent error messages a lot better:

Not very helpful. Sure, I know the job failed, and what step it failed on, but now I have to connect to the agent and look up the history to determine if this is something I have to worry about.

It would be nice to receive the details seen in the history of the job showing up in the email alert received.

Recently I have been working with systems that had all the alert on failure configured so we knew when things failed and could jump on re-running them if needed. We even had them showing up into a data team slack channel, so we had a history as well as notification to everyone on the team at the same time. The problem is that there were not any details in the alerts we received so we had to be able to connect and figure out what to do next or hope that our paid monitoring service would act on something after reading the details of the failure.

Chris has provided a script and gives some recommendations on job configuration which might reduce the number of alerts you get.

Comments closed

Handwriting Character Recognition

Published 2017-02-24 by Kevin Feasel

Tomaz Kastrun compares a few different libraries in terms of handwritten numeric character recognition:

Recently, I did a session at local user group in Ljubljana, Slovenija, where I introduced the new algorithms that are available with MicrosoftML package for Microsoft R Server 9.0.3.

For dataset, I have used two from (still currently) running sessions from Kaggle. In the last part, I did image detection and prediction of MNIST dataset and compared the performance and accuracy between.

MNIST Handwritten digit database is available here.

Tomaz has all of the code available as well.

Comments closed

Fuzzy Searches In SQL Server

Published 2017-02-24 by Kevin Feasel

Phil Factor wants fuzzy searches done inside the relational database:

Many times in the past, I’ve had arguments with members of the development teams who, when we are discussing fuzzy searches, draw themselves up to their full height, look dignified, and say that a relational database is no place to be doing fuzzy searches or spell-checking. It should, they say, be done within the application layer. This is nonsense, and we can prove it with a stopwatch.

We are dealing with data. Relational databases do this well, but it just has to be done right. This implies searching on well-indexed fields such as the primary key, and not being ashamed of having quite large working tables. It means dealing with the majority of cases as rapidly as possible. It implies learning from failures to find a match. It means, most of all, a re-think from a procedural strategy.

This is a very interesting article, as Phil’s tend to be. I enjoy these types of solutions where it requires almost an inversion of mindset: instead of writing code which understands the data you intended, writing simpler code which looks at intention-laden data.

Comments closed

Removing Time Series Auto-Correlation

Published 2017-02-24 by Kevin Feasel

Vincent Granville shows a simple technique for removing auto-correlation from time series data:

A deeper investigation consists in isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise, or not. If departure from white noise is found, then it means that the time series in question exhibits unusual patterns not explained by trends, seasonality or auto correlations. This can be useful knowledge in some contexts such as high frequency trading, random number generation, cryptography or cyber-security. The analysis of decorrelated residuals can also help identify change points and instances of slope changes in time series.

Dealing with serial correlation is a big issue in econometrics; if you don’t deal with it in an Ordinary Least Squares regression, your regression will appear to have more explanatory power than it really does.

Comments closed

Checking Backups In A File

Published 2017-02-24 by Kevin Feasel

Kenneth Fisher shows how to see which backups you have in a particular .bak file:

Restoring a backup file is pretty easy right?

1

RESTORE DATABASE [Test] FROM DISK = 'C:\backups\backup.bak';

Ok, but what if more than one database backup is stored in that single backup file? Didn’t know you could do that?
Yep. You can.

Read on for a couple good points regarding those backup files.

Comments closed

The Cloud DBA

Published 2017-02-24 by Kevin Feasel

Kendra Little thinks about the evolution of the DBA role:

Lots of things have been reported to kill the DBA over the years

SQL Server 2005 was said to be “self-tuning”! Who needs a DBA when the instance tunes itself? (Apparently everyone.)

Outsourcing: All the DBA jobs are going to X location, then Y location, then Z location. Then back to X. DBA jobs have become more global, but “outsourcing” hasn’t gotten rid of DBA jobs in the United States. It has been part of the trend to make working remotely more normal and easy, which is generally good for DBAs.

DevOps! All the developers will manage everything. And somehow know to do so. I love Dev Ops, and I have seen it wipe out some QA departments, but I haven’t seen it wipe out DBAs. I think it’s fun to be a DBA working with a Dev Ops team.

Consider this in contrast to Dave Mason’s concern. My perspective is a lot closer to Kendra’s, but both posts make the good point that IT roles are ever-shifting.

Comments closed

Analyzing Flight Data With Sparklyr

Published 2017-02-24 by Kevin Feasel

Aki Ariga continues his sparklyr series with some analysis of US flight data:

In this post, we will show you a visualization and build a predictive model of US flights with sparklyr. Flight visualization code is based on this article.

This post assumes you already have the following tables:

Airlines data as airlines_bi_pq. It is assumed to be on S3, but you can put it into HDFS. See also the Ibis project.

Airports data converted into Parquet format as airports_new_pq. See also 2009 ASA Data Expo.

You should make these tables available through Apache Hive or Apache Impala (incubating) with Hue.

There’s some setup work to get this going, but getting a handle on sparklyr looks to be a good idea if you’re in the analytics space.

Comments closed

Azure Container Service Supports Kubernetes

Published 2017-02-24 by Kevin Feasel

Serdar Yegulalp reports that Azure Container Service now supports the Kubernetes container management system:

Microsoft emphasized “choice” when it originally introduced Azure Container Service. Although it launched without Kubernetes, Azure initially supported Mesosphere DC/OS and Docker Swarm because the majority of Microsoft’s customers used them and the company believed they would be well served by the support.

Since then, Kubernetes has emerged as a clear leader among container orchestration solutions. It is used as an underpinning for deep learning frameworks and the basis for an open source serverless/“lambda” app framework, as well as offered as a managed on-premise service by one company.

Kubernetes on Azure is strictly focused on running Kubernetes within Azure, not providing it as a service elsewhere. But the GA release includes additions meant to appeal to a broad audience of both Linux and Windows Server users, such as support for the latest version of DC/OS (1.8.8).

It’s an interesting world out there.

Comments closed

RTVS RC1

Published 2017-02-24 by Kevin Feasel

David Smith alerts us that R Tools for Visual Studio Release Candidate 1 is available:

We’ll cover the features in detail with the general availability release of RTVS 1.0, but in summary the new features include:

Remote Execution: type R code in your local RTVS instance, but have the computations performed on a remote R server. You can also switch between local and remote workspaces at will.
SQL Server Integration: work with database connections and SQL queries, and create stored procedures with embedded R code.
Enhanced R Graphics Support: multiple floating and dockable plot windows, each with plot history.

I’ve been using RTVS more frequently lately and it’s definitely growing on me.

Comments closed

Bulk Loading Into Azure SQL Database

Published 2017-02-24 by Kevin Feasel

Jovan Popovic shows that you can easily bulk load data into Azure SQL Database:

Azure SQL Database enables you to directly load files stored on Azure Blob Storage using the BULK INSERT T-SQL command and OPENROWSET function.

Loading content of files form Azure Blob Storage account into a table in SQL Database is now single command

Click through for the details.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28