Graphing With Microsoft R Open

David Smith points out a free e-book on creating effective graphs with Microsoft R Open:

The examples were done using Microsoft R Open, but since it’s 100% compatible with R the code works with any relatively recent R version.

Naomi and Joyce presented several examples from their e-book in a recent webinar (presented by Microsoft), and fielded lots of interesting questions from the audience. If you’d like to see the recorded webinar and also receive a copy of the slides and the e-book, follow the link below to register to receive the materials via email.

The book is free, the code is available on GitHub.  What more could you ask for?

Power BI Desktop Or Power Pivot

Bill Anton discusses when to use Power BI Desktop and when to use Power Pivot:

In the whitepaper, Strategic Prototyping is defined as the process of leveraging Power BI to explicitly seek out feedback from users during a requirements discovery session. The general idea is to use a prototyping tool to quickly slap together a model and mock up some reports while working closely with 1 or more business users. This helps ensure all reporting capabilities are flushed out and accounted for. After several iterations, the Power BI model becomes the blueprint upon which an enterprise solution can be based.

Prior to the emergence of Power BI, the tool of choice for strategic prototyping (at least in Microsoft shops) was Power Pivot. And even though the reporting side of Power Pivot is nowhere near as sexy as Power BI, there is one really awesome feature that does not (yet?) exist with Power BI… and that’s the “Import from PowerPivot” option in visual studio…

Bill does a good job of explaining the alternatives and, importantly, explaining that whichever you pick, there will be follow-up work.

Installing Apache Falcon

Awanish at Edureka shows how to install Apache Falcon on your Hadoop cluster:

Apache Falcon is a framework for managing data life cycle in Hadoop clusters. It establishes relationship between various data and processing elements on a Hadoop environment, and also provides feed management services such as feed retention, replications across clusters, archival etc.

Let us first discuss how to setup Apache Falcon. Run the below given command to download git repository of Falcon:

Command: git clone https://git-wip-us.apache.org/repos/asf/falcon.git falcon

Falcon comes as part of the Hortonworks Data Platform; Cloudera has its own alternative.

Restoring Azure SQL Databases With Powershell

Mike Fal shows us how to restore an Azure SQL Database database using Powershell:

The most fundamental form of disaster recovery is database backups and restores. Typically setting up backups is a lot of work. DBAs need to make sure there’s enough storage available for backups, create schedules that accommodate business operations and support RTOs and RPOs, and implement jobs that execute backups according to those schedules. On top of that, there is all the work that has to be done when backups fail and making sure disk capacity is always large enough. There is a huge investment that must be made, but it is a necessary one, as losing a database can spell death for a company.

This is one of the HUGE strengths of Azure SQL Database. Since it a service offering, Microsoft has already built out the backup infrastructure for you. All that stuff we talked about in the previous paragraph? If you use Azure SQL Database, you do not have to do any of it. At all.

What DBAs still need to manage is being able to restore databases if something happens. This is where Powershell comes into play. While we can definitely perform these actions using the portal, it involves a lot of clicking and navigation. I would much rather run a single command to run my restore.

The Powershell cmdlets are easy to use, so spin up an instance and give it a try.

Power BI Takeover

Devin Knight has a Q&A on Power BI:

Q: what is the difference between the Query editor and Data Modeler? What can and can’t do in each case ?

To summarize the Query Editor is mainly for Data Extraction actions.  So providing source information, applying rules to the incoming data, etc… The Data Modeling areas are focused on creating relationships between tables you’ve important and creating calculations you might need in your report.  This of this as the last step to prepare you data for reports.

Check out Devin’s webinar as well.  It’s a lot longer than a coffee break, but worth your time.

RID Lookup Or Key Lookup?

Aaron Bertrand asks which is faster, RID lookups or key lookups?

I’ve seen multiple people state that a heap can be better than a clustered index for certain scenarios. I cannot disagree with that. One of the interesting reasons I’ve seen stated, though, is that a RID Lookup is faster than a Key Lookup. I’m a big fan of clustered indexes and not a huge fan of heaps, so I felt this needed some testing.

So, let’s test it!

I thought it would be good to create a database with two tables, identical except that one had a clustered primary key, and the other had a non-clustered primary key. I would time loading some rows into the table, updating a bunch of rows in a loop, and selecting from an index (forcing either a Key or RID Lookup).

It looks like RID lookups are slightly faster than key lookups.  But check out the comments:  this is a best-case scenario.

Connecting To SQL Data Warehouse

Robert Sheldon looks at ways to connect to Azure SQL Data Warehouse:

Unlike SSMS, Microsoft does support connecting to SQL Data Warehouse from Visual Studio, via the database engine features in SSDT. When you get into the Visual Studio/SSDT environment, open SQL Server Object Explorer, which is similar to Object Explorer in SSMS. From there, click the Add SQL Server button.

When the Connect dialog box appears, provide the server name, select SQL Server Authentication, and then specify the login name and password, as shown in the following figure.

It is a bit surprising that you can’t easily connect via SSMS 2014.  Maybe that’s changed with SSMS 2016?

Custom Power BI Visuals

Rob Farley looks into custom Power BI visuals:

I hadn’t explored much in the way of custom visuals in Power BI until a while back, even though I was very much aware of the competition that was held in September. It had been on my list to explore some of what was possible. And this month, the T-SQL Tuesday topic (hosted by Wendy Pastrick@wendy_dance) was to learn something new and to blog about it. So it seemed a good idea to learn how to make my own custom visualisation!

Now, creativity isn’t exactly my thing. I find it really hard to write songs, for example. I know how to do it – but I quickly become self-critical and get stuck. Writing is easier, because it feels less ‘creative’, and appeals more to the teacher / preacher in me (and I know that takes creativity, especially if you’ve ever seen me present, but it’s different). So sitting down and coming up with a new way of visualising data wasn’t something I was going to do.

For more info, check out more info on creating visuals and using R to visualize data in Power BI.

DB File Space Usage

Andy Mallon looks into sys.dm_db_file_space_usage:

SQL Server 2016 is coming out next month, and it’s chock full of new features that I can learn. Row-level security is interesting. Dynamic data masking sounds fun.

If you’re a regular reader of my blog, you probably know I try to approach questions from a unique angle. Instead of blogging about something cutting edge or sexy, I decided to scroll through the list of system views until I found one I didn’t recognize.

sys.dm_db_file_space_usage

The name is pretty self-explanatory, but I never noticed this existed until now. Seems like the type of DMV that I should have known about, but I didn’t. Quick look at BOL, and I got the verbose description from Microsoft:

Andy goes on to compare the outputs from this DMV to methods he’s historically used.

Trending And Smoothing

Matt Allington looks at trending and smoothing data in Power BI:

You can download the workbook here if you want to take a look, or simply look at the embedded version I have pinned at the bottom of the post.

Notice the spikes in sales in different months in the chart above?  These spikes are very common in sales data, and in my experience they can be even more prevalent in weekly sales data.  These spikes make it difficult to analyse trends in the data.  You could put a trend line into the chart (thanks to the April update do Power BI), but a standard linear trend line is too simplistic to really see what is happening in your data, particularly if there are seasonal changes.

One good way to look at the trends in your data is to add an Average Monthly Sales Rolling Quarter trend line to the chart.  You simply take the total sales of the last 3 months and then divide by 3.  If you were doing a weekly trend, take the last 13 weeks and divide by 13.  When you overlay this Avg Monthly Sales RQ line on the original chart, it looks like this.

This is a fairly advanced topic, but it’s also the kind of thing which separates good reporting from great reporting.

Categories

August 2017
MTWTFSS
« Jul  
 123456
78910111213
14151617181920
21222324252627
28293031