Press "Enter" to skip to content

Month: May 2016

Custom Power BI Visuals

Rob Farley looks into custom Power BI visuals:

I hadn’t explored much in the way of custom visuals in Power BI until a while back, even though I was very much aware of the competition that was held in September. It had been on my list to explore some of what was possible. And this month, the T-SQL Tuesday topic (hosted by Wendy Pastrick@wendy_dance) was to learn something new and to blog about it. So it seemed a good idea to learn how to make my own custom visualisation!

Now, creativity isn’t exactly my thing. I find it really hard to write songs, for example. I know how to do it – but I quickly become self-critical and get stuck. Writing is easier, because it feels less ‘creative’, and appeals more to the teacher / preacher in me (and I know that takes creativity, especially if you’ve ever seen me present, but it’s different). So sitting down and coming up with a new way of visualising data wasn’t something I was going to do.

For more info, check out more info on creating visuals and using R to visualize data in Power BI.

Comments closed

DB File Space Usage

Andy Mallon looks into sys.dm_db_file_space_usage:

SQL Server 2016 is coming out next month, and it’s chock full of new features that I can learn. Row-level security is interesting. Dynamic data masking sounds fun.

If you’re a regular reader of my blog, you probably know I try to approach questions from a unique angle. Instead of blogging about something cutting edge or sexy, I decided to scroll through the list of system views until I found one I didn’t recognize.

sys.dm_db_file_space_usage

The name is pretty self-explanatory, but I never noticed this existed until now. Seems like the type of DMV that I should have known about, but I didn’t. Quick look at BOL, and I got the verbose description from Microsoft:

Andy goes on to compare the outputs from this DMV to methods he’s historically used.

Comments closed

Trending And Smoothing

Matt Allington looks at trending and smoothing data in Power BI:

You can download the workbook here if you want to take a look, or simply look at the embedded version I have pinned at the bottom of the post.

Notice the spikes in sales in different months in the chart above?  These spikes are very common in sales data, and in my experience they can be even more prevalent in weekly sales data.  These spikes make it difficult to analyse trends in the data.  You could put a trend line into the chart (thanks to the April update do Power BI), but a standard linear trend line is too simplistic to really see what is happening in your data, particularly if there are seasonal changes.

One good way to look at the trends in your data is to add an Average Monthly Sales Rolling Quarter trend line to the chart.  You simply take the total sales of the last 3 months and then divide by 3.  If you were doing a weekly trend, take the last 13 weeks and divide by 13.  When you overlay this Avg Monthly Sales RQ line on the original chart, it looks like this.

This is a fairly advanced topic, but it’s also the kind of thing which separates good reporting from great reporting.

Comments closed

SQL Server 2016 Editions For BI

John White discusses which version of SQL Server 2016 you should use for business intelligence work:

SQL Server 2016 removes the BI Edition as an option, leaving us with a choice between only Standard and Enterprise. The biggest news in my opinion from a licensing perspective with 2016 is that Tabular Mode will now be supported in Standard Edition. This puts the tabular model within the reach of all organizations, and closes the licensing gap in the BI stack. This is fantastic news.

There are of course limitations with Standard mode. Tabular in Standard Mode is restricted to 16 GB of RAM, which may seem like a lot, but keep in mind that tabular is an in-memory technology. It’s possible to bump into this limit fairly quickly, but it’s a limit that serves the small/medium business space rather well.

It’s surprising (in a good way) that John recommends Standard Edition, at least for small and medium businesses.

Comments closed

Bike Rental Demand Estimation

The Revolution Analytics blog has a Microsoft-driven article on estimating bike rental demand with Microsoft R Server:

In addition to the original features in the raw data, we add number of bikes rented in each of the previous 12 hours as features to provide better predictive power. We create acomputeLagFeatures() helper function to compute the 12 lag features and use it as the transformation function in rxDataStep().

Note that rxDataStep() processes data chunk by chunk and lag feature computation requires data from previous rows. In computLagFeatures(), we use the internal function .rxSet() to save the last n rows of a chunk to a variable lagData. When processing the next chunk, we use another internal function .rxGet() to retrieve the values stored in lagData and compute the lag features.

This is a great article for anybody wanting to dig into analytics, because they show their work.

Comments closed

Distributed Unit Testing

Cloudera shows off their distributed unit testing framework:

This distributed testing infrastructure started out as a Cloudera hackathon project in 2014. Todd Lipcon and I worked on a shared backend for running test tasks on a cluster, with Todd focusing on onboarding the Apache Kudu (incubating) tests, and myself on Apache Hadoop. Our prototype implementation reduced the runtime of the 1,700+ Hadoop unit tests from 8.5 hours to 15 minutes.

Since then, we’ve spent time improving the infrastructure and on-boarding additional projects. Besides Kudu and Hadoop, our distributed testing infrastructure is also being used by our Apache Hive and Apache HBase teams. We can now run all the Hadoop unit tests in less than 10 minutes!

Finally, we’re happy to announce that both our infrastructure and code are public! You can browse the webUI at http://dist-test.cloudera.org and see all the source code (ASLv2 licensed) at the cloudera/dist_test github repository. This infrastructure is already being used at upstream Apache to run the Kudu pre-commit tests.

This is an interesting look at how to scale out unit tests.  It’s a bit of a long read (especially with all the videos) but worth your time.

Comments closed

The Code Behind Power BI Parameters

Chris Webb shows us how to get to the M code used in query parameters:

From this you can see that the value returned by the parameter query is just a single piece of text – it’s the value “Monday” that is set as the Current Value, that’s to say the value returned by the parameter itself. The interesting stuff is all in the metadata record associated with the value. I blogged about metadata here, so you may want to read that post before going any further; it’s pretty clear that the fields in the metadata record correspond to the values set in the UI. All of the fields in the metadata record can be edited in the Advanced Editor if you want.

When the parameter is used in another query it is referenced like any other query value. For example, if you load the DimDate table from the Adventure Works DW sample database and use the parameter above to filter the EnglishDayNameOfWeek column then the code generated in the UI looks like this:

I’m sure that by next month, there will be a half-dozen new things added to this alone, given how fast the Power BI team can push features…

Comments closed

Deny Everything

Kenneth Fisher goes over grant, revoke, and deny for permissions:

This means that MyUser can not run a SELECT statement against any table, view or table valued function in the database.

That probably doesn’t sound like you are applying a permission does it? And that is probably where a lot of the confusion comes in. If, however, we take a look at the system views where the data resides then we can see proof that both commands, GRANT and DENY, add a permission.

Particularly interesting is exactly how the deny permission works—and that “deny” is in fact a “permission” in that you modify a permissions list.

Comments closed

Key Components For A Successful Project

Ginger Grant lists five key components for a successful data analysis project:

Security is an obvious consideration which needs to be addressed up front. Data is a very valuable commodity and only people with appropriate access should be allowed to see it. What steps are going to be employed to ensure that happens? How much administration is going to be required to implement it? These questions need to be answered up front.

I want to extend special thanks to Ginger for putting security as the top item on the list.  Also, this seems like a pretty good set of criteria for most projects, so definitely check it out.

Comments closed

Notes From A Biml User Group

There’s a Biml user group in Amsterdam and Koos van Strien took notes:

  • Historically, Varigence has always given away lots of their work for free, and they’ll continue to do so. There are few (maybe no) companies giving this percentage of their work away for free, without having the barrier set at “if you want to start working really, you need our paid product”)

  • When features are introduced as free, they will stay free forever. Sometimes this means the introduction of features in the free product needs to be postponed to see the complete impact.

  • According to Scott, this is shown in the release of Biml Express: they could’ve easily dropped some features and move it into the paid versions of Biml, but they didn’t. Only added new features.

  • The “free while in beta” announcement on Biml Online is mainly a lawyer thing – you can expect BimlOnline to remain free too.

  • If a good SaaS-model is developed, the tools will all be free. But we’re not there yet…

This sounds like it was a pretty long discussion with Scott Currie and I’m insanely jealous that there’s a Biml user group but it’s nowhere near me…

Comments closed