Press "Enter" to skip to content

Month: February 2017

That 53rd Week

Jens Vestergaard notes that you can sometimes have a 53rd week in the year:

There are a lot of great examples out there on how to build your own custom Time Intelligence into Analysis Services (MD). Just have a look at this, this, this, this and this. All good sources for solid Time Intelligence in SSAS.
One thing they have in common, though, is that they all make the assumption that there are, and always will be, 52 weeks in a year. The data set I am currently working with is built on the ISO 8601 standard. In short, this means that there is a recurring 53rd full week, as opposed to only 52 in the Gregorian version, which is defined by: 1 Gregorian calendar year = 52 weeks + 1 day (2 days in a leap year).

The 53rd week occurs approximately every five to six years, though this is not always the case. The last few times we saw 53 weeks in a year were in 1995, 2000, 2006, 2012 and 2015. The next time will be in 2020. This gives you enough time to either forget about the hacks and hard-coded fixes in place to mitigate the issue, or bring your code into a good state, ready for the next time.

Dates and currency are hard problems.
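
If you want to see which years carry that 53rd week, SQL Server's DATEPART makes it easy to check; here's a minimal sketch (my own, not from Jens's post) that relies on December 28th always falling within the last ISO week of its year:

-- December 28th always falls in the last ISO week of its year,
-- so its ISO_WEEK value tells us how many weeks the ISO year has.
SELECT y AS TheYear,
       DATEPART(ISO_WEEK, DATEFROMPARTS(y, 12, 28)) AS WeeksInIsoYear
FROM (VALUES (2014), (2015), (2016), (2017), (2018), (2019), (2020)) AS t(y);

This returns 53 for 2015 and 2020 and 52 for the years in between.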


Gloom Indexes

David Smith points out an interesting use of R:

Radiohead is known for having some fairly maudlin songs, but of all of their tracks, which is the most depressing? Data scientist and R enthusiast Charlie Thompson ranked all of their tracks according to a “gloom index” and created the following chart of gloominess for each of the band’s nine studio albums. (Click for the interactive version, created with the highcharter package for R, which allows you to explore individual tracks.)

Do click through for Charlie’s explanation, including where the numbers come from.


Natural Earth In SQL Server

Jeff Pries shows how to use the Natural Earth data set in SQL Server:

But what about when you need more flexibility in your geographic display?  Some examples of this may be wanting to display something that you can’t find a shape file for (maybe all the states and provinces in North America), or maybe you want to dynamically draw the geography based on some property of the dataset.  Geospatial data queries to the rescue!  Using SQL Server’s native geospatial support, a geospatial query can be created to return something as simple as a point or rectangle, or as complex as the geography of an entire country and all of its rivers.

Getting all of the latitude and longitude coordinates to create a useful geospatial query could potentially be an enormous amount of work.  Fortunately, that work has already been done in a freely available resource, thanks to Natural Earth and Laurent Dupuis.  SQL Server 2012 or greater is recommended for this process.

Click through for a walkthrough as well as some introductory queries to get you started with using the data set.
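
To give a flavor of what those geospatial queries look like, here's a minimal sketch (the shapes and names are mine, not from Jeff's post) that defines a polygon and checks whether a point falls inside it:

-- Define a rough bounding box as a geography polygon (ring wound
-- counter-clockwise, as the geography type requires) and test a point.
DECLARE @region geography = geography::STGeomFromText(
    'POLYGON((-80.0 35.0, -75.0 35.0, -75.0 40.0, -80.0 40.0, -80.0 35.0))', 4326);
DECLARE @city geography = geography::Point(37.5, -77.4, 4326);  -- latitude, longitude, SRID

SELECT @region.STContains(@city) AS CityInRegion,   -- 1 if the point is inside
       @region.STArea() AS AreaInSquareMeters;

The same building blocks scale up to country outlines and rivers once the Natural Earth data is loaded.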


Thoughts On Cost-Based Optimizers

Joe Chang has the makings of an academic paper on the shortcomings of the current SQL Server cost optimizer model:

It might seem that given the pace of hardware change, such an old model cannot possibly be valid, resulting in horrible execution plans. Around 1995 or so, the high-performance HDD was 7200RPM with a sequential bandwidth of 4-5MB/s. The mean rotational latency for 7200RPM is 4ms. An average seek time of 8.5ms seems reasonable, though I have not kept any documentation from this period. This would correspond to 80 IOPS per disk at queue depth 1 per HDD. So, it seems curious that the SQL Server cost model is based on the random IOPS of 4 disks, but the sequential IO of 2 HDDs.

Performance HDDs progressed first to 10K RPM around 1996, and then to 15K around 2000, with corresponding rotational latencies of 3 and 2ms respectively. Average seek time was reduced over time to 3ms with developments in powerful rare-earth magnets. The 10K HDD could support 125 IOPS, and the 15K HDD 200 IOPS. But no further progress was made on HDD rotational speed. In the same time period, hard disk sequential IO grew phenomenally quickly, exceeding 100MB/s in the fourth-generation 15K HDD around 2005.

In other words, the SQL Server cost model is based on a 1350/320 = 4.2 ratio. But 15K HDDs in the mid-2000s were 100MB/s × 128 pages/MB = 12,800 pages/sec sequential to 200 IOPS random, for a ratio of 64:1. It turns out that achieving nearly the sequential IO capability of HDDs from SQL Server required a special layout strategy, as outlined in the Fast Track Data Warehouse Reference Architecture papers, which few people followed. This was due to the fixed, inflexible IO pattern of SQL Server, which required the disk configuration to match that of SQL Server instead of being able to adjust the IO pattern to match that of the storage configuration.

It’s worth taking the time to read.  I like Glenn Berry’s idea in the comments of building relative CPU/IO/memory measures and applying them, rather than using the same values that were good twenty years ago.
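
For context, the 1350 and 320 figures trace back to the commonly cited (though undocumented, so treat them as an assumption) cost model constants of roughly 0.000740741 per sequential page and 0.003125 per random I/O. A quick sketch of the arithmetic:

-- Commonly cited (undocumented) SQL Server cost model I/O constants.
DECLARE @random_io float = 0.003125;     -- cost units per random I/O (= 1/320)
DECLARE @seq_page  float = 0.000740741; -- cost units per sequential page (= 1/1350)

SELECT 1.0 / @random_io AS RandomIOsPerSec,  -- 320, the model's random IOPS
       1.0 / @seq_page  AS SeqPagesPerSec,   -- 1350, the model's sequential pages/sec
       @random_io / @seq_page AS CostRatio;  -- ~4.2, the ratio Joe cites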


Remote Docker Administration On Windows Server 2016

Andrew Pruski shows how to perform remote Docker administration:

Continuing my series on working with Docker on Windows, I noticed that I always open up a remote PowerShell window when working with Docker on servers. Nothing wrong with this; if you want to know how to do that, you can follow my instructions here.

However, what if we want to connect to the Docker engine remotely? There’s got to be a way to do that, right? Well, it’s not quite so straightforward, but there is a way to do it involving a custom image downloaded from the Docker Hub (built by Stefan Scherer [g|t]) which creates TLS certs to allow remote connections.

Let’s go through the steps.

Read on for the steps.


Waffle Charts

Devin Knight continues his Power BI custom visuals series with the waffle chart:

In this module you will learn how to use the Waffle Chart Power BI Custom Visual.  The Waffle Chart visual is most useful for presenting a percentage of data. This chart is a great option to choose over other visuals like Pie Charts, which are not great at showing proportions of data.

Waffle charts are infographic-friendly visuals; they’re easy to read and, as long as you don’t have too many categories, easy to compare.


Gaps And Islands: Call Center Edition

Kathi Kellenberger solves an issue with tracking contiguous calls within a call center:

To come up with a solution, I first thought about how to solve it with a traditional cursor. I won’t provide a cursor solution here, but it would involve looping through the rows in order and finding the first and last employee rows. When the last EmpID doesn’t match the next EmpID, then the previous row is the last call for the previous EmpID and the current row is the first call for the next EmpID.

NOTE: The solution to this problem assumes that there are at least two calls per shift.

In SQL Server 2012, four T-SQL window functions became available that let you look at expressions from different rows: LAG, LEAD, FIRST_VALUE, and LAST_VALUE. For this puzzle, LAG and LEAD seemed to be the answer.  In this case, I used the optional default parameter to replace any NULLs with -1.

This is a particular solution for the Gaps and Islands problem; here’s Itzik Ben-Gan on the topic.
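
As a rough sketch of the LAG/LEAD approach (the table and column names here are hypothetical, not Kathi's exact schema), you can flag island boundaries by comparing each row's EmpID to its neighbors:

-- Hypothetical call log: one row per call, ordered by call time.
-- A change in EmpID between neighboring rows marks an island boundary;
-- the -1 default replaces the NULLs at the ends, per Kathi's note.
SELECT CallID,
       EmpID,
       CallTime,
       CASE WHEN LAG(EmpID, 1, -1)  OVER (ORDER BY CallTime) <> EmpID
            THEN 1 ELSE 0 END AS IsFirstCallOfShift,
       CASE WHEN LEAD(EmpID, 1, -1) OVER (ORDER BY CallTime) <> EmpID
            THEN 1 ELSE 0 END AS IsLastCallOfShift
FROM dbo.CallLog;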


Missing Costs

Kenneth Fisher has a post about missing important costs:

Growing up, my mother used this phrase quite a bit: penny wise, pound stupid. (In case you didn’t know, the pound is the British equivalent of the dollar.) Basically, it means paying attention to the small stuff at the expense of the big stuff. My favorite example of this was a few jobs ago. Without any need to go into details, our coffee area was stocked with both 8oz and 16oz styrofoam cups. One day, one of the accountants decided that there was no point to the extra cups and got rid of the more expensive 16oz cups. Not really a big deal, but our morale was already disastrously low, so it had more impact than it might have otherwise. The most interesting part, though, was a memo that a co-worker sent out. Unfortunately, I don’t have it anymore, so I’m going to have to do my best to re-create it.

The lesson is to think through the ramifications of decisions, as there tend to be unintended consequences due to an incomplete understanding of costs.


Dynamic DNS And Powershell

Drew Furgiuele explains dynamic DNS and shows how to use Powershell to keep your Google Dynamic DNS record up to date:

Google’s Dynamic DNS API is really little more than an HTTPS POST to a specific URL with some parameters packed into the post. What makes it work, though, is that for each dynamic DNS host name you provide, you get an auto-generated user name and password that needs to be part of the POST. PowerShell is really naturally suited for this type of automation, thanks to Invoke-WebRequest (MSDN Link). It’s built for this, and even supports HTTPS. Heck, it even supports the implementation of username:password in the request via a PSCredential object.

All I really needed to do was to wrap the code in a function and add some parameters that can be passed in, invoke the web request, and parse the results.

Click through for more information, including a couple of PowerShell scripts.


Market Basket Analysis Basics

Leila Etaati has an introduction to market basket analysis with R:

For instance, imagine we have the below transaction items from a shopping store for the last few hours:

Customer 1: salt, pepper, blue cheese

Customer 2: blue cheese, pasta, pepper, tomato sauce

Customer 3: salt, blue cheese, pepperoni, bacon, egg

Customer 4: water, pepper, egg, salt

We want to know how many times customers purchase pepper and salt together. The support will be: out of our four transactions (4 customers), 2 of them purchased salt and pepper together, so the support will be 2 divided by 4 (the total number of transactions).

Basket analysis is one way of building a recommendation engine:  if you’re buying butter, cream, and eggs, do you also want to buy sugar?
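
The support arithmetic is also easy to express in T-SQL if the transactions live in SQL Server; here's a sketch against a hypothetical one-row-per-transaction-item table:

-- Hypothetical schema: one row per (TransactionID, Item).
-- Support(salt, pepper) = transactions containing both / all transactions.
SELECT COUNT(DISTINCT s.TransactionID) * 1.0 /
       (SELECT COUNT(DISTINCT TransactionID) FROM dbo.TransactionItems) AS Support
FROM dbo.TransactionItems AS s
JOIN dbo.TransactionItems AS p
  ON p.TransactionID = s.TransactionID
WHERE s.Item = 'salt'
  AND p.Item = 'pepper';

With the four customers above, this returns 0.5.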
