Month: October 2016

Stretch Database Authentication Failures

Published 2016-10-19 by Kevin Feasel

Jack Li walks through a bug in Stretch Database:

The message provided enough directions. It says either you have a bad login or the firewall setting on the Azure DB Server side is not configured correctly. The very first thing is to ensure the firewall was configured correctly. We even tried 0.0.0.0 to 255.255.255.255. But it didn’t resolve the issue.

Next we created a brand new database on the same server and tried on that one. It worked. But the customer just couldn’t get the old database to work, even though she made sure that she could use the login/password to log in using SSMS on the same server to the Azure DB server.

On the same server, a brand new database worked but the old database didn’t. So that made me wonder what happens if I manually cause a failure and later retry.

Read on for the repro and solution.

Comments closed

Parallel PoshRSJob Template

Published 2016-10-19 by Kevin Feasel

Cody Konior walks through using PoshRSJob with a custom function:

Recently I migrated from my own runspace module to Boe Prox’s PoshRSJob which is pretty much perfect. But today I wanted to share how to integrate PoshRSJob cleanly into your functions through a default -Parallel parameter and using a template.

You can very easily modify this for your own purposes; however, it’s even more awesome as-is if you run parallelised tests for one major input (like a computer name) but where additional information might also be passed in through object properties on a pipeline (I’ll explain why you’d want to do that later in the post). Here’s what it looks like:

Read on for code and explanation. PowerShell parallelism is something that I’ve never been good at, so hopefully this makes it easier for me…


Linear Models

Published 2016-10-19 by Kevin Feasel

Andrea Spano, et al., are starting a new book:

This chapter is an introduction to the first section of the book, Linear Models, and contains some theoretical explanation and lots of examples. At the end of the chapter you will find two summary tables with Linear model formulae and functions in R and Common R functions for inference.

The book is just getting started, but you can get it from the Quantide website. In the meantime, there are two other books on learning R and developing in R. These books are licensed Creative Commons, so they’re free to read and share.


Debugging Biml

Published 2016-10-19 by Kevin Feasel

Bill Fellows shows how to write out your intermediate Biml for debugging purposes:

Using tooling is always a trade-off between time/frustration and monetary cost. BIDS Helper/BimlExpress are free so you’re prioritizing cost over all others. And that’s ok, there’s no judgement here. I know what it’s like to be in places where you can’t buy the tools you really need.

One of the hard parts about debugging the expanded Biml from BimlScript is you can’t see the intermediate or flat Biml. You’ve got your Metadata, Biml and BimlScript and a lot of imagination to think through how the code is being generated and where it might be going wrong. That’s tough.

Even at this point where I’ve been working with it for four years, I can still spend hours trying to track down just where the heck things went wrong. SPOILER ALERT It’s the metadata, it’s always the metadata (except when it’s not). I end up with NULLs where I don’t expect it or some goofball put the wrong values in a field.

But how can you get to a place where you can see the result? That’s what this post is about.

It’s a trivial bit of code but it’s important. You need to add a single Biml file to your project and whenever you want to see the expanded Biml, prior to it being translated into SSIS packages, right click on the file and you’ll get all that Biml dumped to a file. This recipe calls for N steps.

This is a good tip and has helped me a few times in the past.


Compression On Temporal Tables

Published 2016-10-19 by Kevin Feasel

Daniel Janik notes that system-generated temporal tables automatically use page-level compression:

At first I was a bit puzzled. I noticed that the system-generated table was consistently smaller than my user-created table. Not only was it smaller, it was half the size!

I did some further testing on my Surface this weekend and here’s what I found:

-- Side note: I use Person.Address a lot in demos, so I decided to create a new table to test with in hopes of not breaking any other demos I do regularly.

I think this is a good decision for a default, but if you are unable to support page-level compression for some reason, there’s a workaround: create your history table beforehand.


Who Is Active Update

Published 2016-10-19 by Kevin Feasel

Adam Machanic has an update to sp_whoisactive:

Four and a half years have flown by since I released sp_whoisactive version 11.11.

It’s been a pretty solid and stable release, but a few bug reports and requests have trickled in. I’ve been thinking about sp_whoisactive v.Next — a version that will take advantage of some newer SQL Server DMVs and maybe programmability features, but in the meantime I decided to clear out the backlog on the current version.

Given that I have three keyboard shortcuts dedicated to sp_whoisactive, you know I’m excited. Adam also has a new domain for the product.


Azure Data Lake Updates

Published 2016-10-18 by Kevin Feasel

Michael Rys has the October updates for Azure Data Lake:

We seem to be just cranking out new stuff :). Here are the October 2016 Updates for Azure Data Lake U-SQL!

The main takeaway is that the October refresh has now removed the old deprecated syntax of the items we have announced over the last couple of release notes!

Thanks to those who volunteered to test the new version of the more scalable file set. Please contact us if you want to try it and help us validate it.

Click through for the release notes.


Machine Learning Algorithms In R

Published 2016-10-18 by Kevin Feasel

Ginger Grant has a list of machine learning algorithms and their implementations in R:

Oftentimes determining which algorithm to use can take a while. Here is a pretty good flowchart for determining which algorithm should be used given some examples of what the desired outcomes and data contain. The diagram lists the algorithms, which are implemented in Azure ML. The same algorithms can be implemented in R. In R there are libraries to help with nearly every task. Here’s a list of libraries and their accompanying links which can be used in Machine Learning. This list is by no means comprehensive as there are libraries and functions other than the ones listed here, but if you are trying to write a Machine Learning Experiment in R, and are looking at the flowchart, these R functions and Libraries will provide the tools to do the types of Machine Learning Analysis listed.

I think algorithm determination is one of the most difficult parts of machine learning. Even if you don’t mean to go there, the garden of forking paths is dangerous.


Custom Sorts

Published 2016-10-18 by Kevin Feasel

Rob Farley looks at ways of sorting data more efficiently:

Another option, which is more longwinded (some might suggest that would suit me – and if you thought that: Oi! Don’t be so rude!), and uses more reads, is to consider what we’d do in real life if we needed to do this.

If I had a pile of 73,595 orders, sorted by Salesperson order, and I needed to return them with a particular Salesperson first, I wouldn’t disregard the order they were in and simply sort them all, I’d start by diving in and finding the ones for Salesperson 7 – keeping them in the order they were in. Then I’d find the ones that weren’t Salesperson 7 – putting them next, and again keeping them in the order they were already in.

My first inclination is to think that this is a fragile solution—what about parameterization? Will that affect the execution plan in unexpected ways? I like the approach, however, and will have to add it to the toolbox for those cases in which it makes sense.
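The idea in the quoted passage can be illustrated outside the database. This is a minimal Python sketch of the concept only, not Rob's T-SQL solution: the `Order` type, its field names, and the `preferred_first` helper are all hypothetical. It takes rows that are already in Salesperson order, pulls out the preferred Salesperson's rows first, then appends everyone else, and never re-sorts either group.

```python
from typing import List, NamedTuple


class Order(NamedTuple):
    # Hypothetical shape for illustration; not the schema from the post.
    salesperson_id: int
    order_id: int


def preferred_first(orders: List[Order], preferred_id: int) -> List[Order]:
    """Put one salesperson's orders first without re-sorting anything.

    Both passes walk the input in its existing order, so rows keep their
    relative position within each group.
    """
    first = [o for o in orders if o.salesperson_id == preferred_id]
    rest = [o for o in orders if o.salesperson_id != preferred_id]
    return first + rest


# Input is already ordered by salesperson, as in the quoted scenario.
orders = [
    Order(3, 101),
    Order(3, 102),
    Order(7, 103),
    Order(7, 104),
    Order(9, 105),
]
result = preferred_first(orders, 7)
```

In the database, the two passes would correspond to two ordered reads against an index rather than list comprehensions, which is where the extra reads mentioned in the quote come from.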


Kafka Consumer

Published 2016-10-18 by Kevin Feasel

I build a consumer and aggregator of Kafka data:

From here, I hook into the OnMessage event just like before, and like before we decode the Kafka payload and turn it into a string. Unlike before, however, I call Newtonsoft’s DeserializeObject method and return a Flight type, which I’ve defined above. This is the same definition as in the Producer, so in a production-quality environment, I’d pull that out to a single location rather than duplicating it.

Going back to the main function, I call the consumer.Start() method and let ‘er rip. When I’m ready to aggregate, I’ll hit the enter key and that’ll call consumer.Stop(). When that happens, I’m going to have up to 7 million records in a list called flights. Out of all of this information, I only need two attributes: the destination state and the arrival delay in minutes. I get those by using the map function on my sequence of flights, taking advantage of F#’s match syntax to get all relevant scenarios safely and put the result into a tuple. The resulting sequence of tuples is called flightTuple. I pass that into the delaysByState function.

By the time I give this presentation, I’m going to change the way I aggregate just a little bit to cut down on the gigs of RAM necessary to do this operation. But hey, at least it works…
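For readers who do not use F#, the flow described above (decode the payload, deserialize it into a flight record, map each flight to a state/delay tuple, then aggregate by state) might look roughly like this in Python. This is a sketch under several assumptions: the JSON field names `DestState` and `ArrDelay`, the fallback values for missing data, and summing delays as the aggregate are all guesses for illustration, and no Kafka client is involved here, only raw byte payloads.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Flight:
    # Hypothetical fields; the real Flight type has more attributes.
    dest_state: Optional[str]
    arr_delay: Optional[int]


def decode_flight(payload: bytes) -> Flight:
    """Decode a raw message payload to text, then parse it as JSON."""
    record = json.loads(payload.decode("utf-8"))
    return Flight(record.get("DestState"), record.get("ArrDelay"))


def to_tuple(flight: Flight) -> Tuple[str, int]:
    """Keep only the two attributes needed, handling missing values.

    Assumption: a missing state becomes "Unknown" and a missing delay 0.
    """
    state = flight.dest_state if flight.dest_state else "Unknown"
    delay = flight.arr_delay if flight.arr_delay is not None else 0
    return (state, delay)


def delays_by_state(flight_tuples: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum arrival delay in minutes per destination state."""
    totals: Dict[str, int] = defaultdict(int)
    for state, delay in flight_tuples:
        totals[state] += delay
    return dict(totals)


payloads = [
    b'{"DestState": "OH", "ArrDelay": 12}',
    b'{"DestState": "OH", "ArrDelay": 30}',
    b'{"DestState": "NC", "ArrDelay": null}',
    b'{"ArrDelay": 5}',
]

# The generator feeds one record at a time into the aggregate, so no
# intermediate list of flights is held in memory.
totals = delays_by_state(to_tuple(decode_flight(p)) for p in payloads)
```

Aggregating as records arrive, rather than collecting every flight into a list first, is one possible way to reduce memory use; it is not necessarily the change planned for the presentation.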

