
Day: August 27, 2018

More On Radix Sorting In R

Inaki Ucar explains some of the nuance behind sorting in R:

The latest R tip in Win-Vector Blog encourages you to Use Radix Sort based on a simple benchmark showing a x35 speedup compared to the default method, but with no further explanation. In my opinion, though, the complete tip would be, instead, use radix sort… if you know what you are doing, because a quick benchmark shouldn’t spare you the effort of actually reading the docs. And here is a spoiler: you are already using it.

One may wonder why R’s default sorting algorithm is so bad, and why it was even chosen. The thing is that there is a trick here, and to understand it, first we must understand the benchmark’s data and then read the docs.

Read the whole thing.


Your R Code Should Be In Source Control Too

Lindsay Carr explains the importance of storing your R code in source control:

But wait, I would need to learn an additional tool?

Yes, but don’t panic! Git is a tool with various commands that you can use to help track your changes. Luckily, you don’t need to know too many commands in Git to use the basic functionality. As an added bonus, using Git with RStudio takes away some of the burden of knowing Git commands by including buttons for common actions.

As with any tool that you pick up to help your scientific workflows, there is some upfront work before you can start seeing the benefits. Don’t let that deter you. Git can be very easy once you get the gist. Think about the benefits of being able to track changes: you can make some changes, have a record of that change and who made it, and you can tie that change to a specific problem that was reported or feature request that was noted.

It’s still code, and you gain a lot by keeping code in source control.


The Power Of Resilient Distributed Datasets

Ramandeep Kaur explains just how powerful Resilient Distributed Datasets are:

A fault-tolerant collection of elements that can be operated on in parallel:  “Resilient Distributed Dataset” a.k.a. RDD

RDD (Resilient Distributed Dataset) is the fundamental data structure of Apache Spark: an immutable collection of objects that is computed on different nodes of the cluster. Each dataset in a Spark RDD is logically partitioned across many servers so that the partitions can be computed on different nodes of the cluster.

RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state.

Coarse-grained transformations are those that are applied over an entire dataset. A fine-grained transaction, on the other hand, is one applied to a smaller set, maybe a single row. With fine-grained transactions you have to save the updates, which can be costlier, but they are more flexible than coarse-grained ones.
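To make the coarse-grained idea a little more concrete, here is a minimal sketch in plain Python. It is not Spark's API: `MiniRDD` and `rebuild_partition` are hypothetical names invented for this illustration. It assumes that each transformation is applied to every element, that each result is a new immutable dataset, and that the recorded list of transformations (the lineage) is enough to recompute a lost partition from its source data.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


class MiniRDD:
    """Hypothetical toy stand-in for an RDD; this is NOT Spark's API."""

    def __init__(self, partitions: Iterable[Iterable[int]],
                 lineage: Sequence[Callable[[int], int]] = ()):
        # Tuples keep the partitions immutable once the dataset is built.
        self._partitions: Tuple[Tuple[int, ...], ...] = tuple(
            tuple(p) for p in partitions
        )
        # The lineage records every whole-dataset function applied so far.
        self.lineage: Tuple[Callable[[int], int], ...] = tuple(lineage)

    def map(self, fn: Callable[[int], int]) -> "MiniRDD":
        # Coarse-grained: fn runs over every element of every partition,
        # and the result is a new dataset rather than an in-place update.
        new_parts = [[fn(x) for x in part] for part in self._partitions]
        return MiniRDD(new_parts, self.lineage + (fn,))

    def collect(self) -> List[int]:
        # Flatten all partitions into a single list, in partition order.
        return [x for part in self._partitions for x in part]


def rebuild_partition(source_partition: Iterable[int],
                      lineage: Sequence[Callable[[int], int]]) -> List[int]:
    # Replay the recorded transformations on the original partition data.
    out = list(source_partition)
    for fn in lineage:
        out = [fn(x) for x in out]
    return out


base = MiniRDD([[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
result = doubled.map(lambda x: x + 1)

# Pretend the second partition of `result` was lost: replay its lineage
# against the matching source partition instead of restoring saved updates.
recovered = rebuild_partition([3, 4], result.lineage)
```

Because only two functions were recorded for the whole dataset, there is no per-row update log to store, which is the trade-off the excerpt describes. Real Spark tracks lineage per partition and with far more machinery than this.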

Read on for more about the fundamental data structure in Spark.


Quick Tips For Working With Extended Events

Tibor Karaszi argues that it’s never too late to get into Extended Events:

I know, I know. New habits are hard to learn. Many of us have been using SQL trace and the Profiler GUI for a very long time. And we know that we are supposed to move over to Extended Events (XE), but we postpone it for some later time. And then we give XE a try, and something doesn’t work as we want. So we go back to more familiar territory.

But XE has really grown on me over the last few years. I’d like to share the things that I initially didn’t like about XE, and how I overcame them, along with some of my other tips to make it easier to be productive with XE. I will deliberately minimize showing T-SQL and queries against the XE dynamic management views here. As you use XE more and more, you will probably use T-SQL to a higher degree. But this blog post is for those of you who want to “get into” XE and I find using a GUI is great as a starting point. Assuming the GUI is any good, that is. And I think the SSMS GUI is, for the most part.

There are a lot of tips here, so check out Tibor’s advice.


Reading Power BI Log Files

Kellyn Pot’vin-Gorman shows us where we can find Power BI logs and what they look like when we load them into Power BI:

The second one is inspecting the Reporting Server Portal log (RSPortal**.log), which resides in C:\Program Files\Microsoft Power BI Report Server\PBIRS\LogFiles

We again load this log file via Get Data –> Text/CSV and then choose to view all files, as it won’t see the .log extension otherwise.  Choose the file and click on Edit.

The M query displays the changes I performed to format the data into something that can easily be worked with. Because of the stagnated output of the data lines, this will format the error and warning messages; the rest of the rows will have only the Information Message filled in, and the remaining columns will be null:

Read the whole thing.


Changing Connection Strings In VertiPaq Analyzer

Shabnam Watson shows us how to change the connection string in VertiPaq Analyzer, a plugin for Excel:

While trying to set up VertiPaq Analyzer on a new computer, I ran into a problem where Excel was not letting me change the SSAS connection that was built into the workbook. It turns out I had missed one of the steps in the instructions in the workbook. As a result, when I got to Connection Properties, everything was grayed out and this message was at the bottom:

Some properties cannot be changed because this connection was modified using PowerPivot Add-in.

Read on to see how to fix this.  And check out VertiPaq Analyzer if you’re working heavily with Analysis Services Tabular or Power BI.


Power Platform Licensing And Pricing

Wolfgang Strasser explains how you can get started with the Microsoft Power Platform:

This blog post is part of my Power Platform blog series.

Maybe you’ve already heard about the Microsoft Power Platform (which consists of three tools: Power BI, PowerApps and Microsoft Flow) and now is the time to start testing it?

The first questions that arise are: What do I need? Do I need to pay if I only want to try it out?

Licensing can get tricky, so it’s good to get a clear explanation of pricing and what you can do with the products.


Reserved Capacity With Azure SQL Database

Chris Seferlis explains the concept of Azure SQL Database Reserved Capacity:

Last week I posted about Azure Reserved VM Instances, where you could save some money in Azure. Another similar way to save is with Azure SQL Database Reserved Capacity. With this you can save 33% compared to license-included pricing by pre-buying SQL Database vCores for a 1- or 3-year term.

This can be applied to a single subscription or shared across your enrollments, so you can control how many subscriptions can use the benefit, as well as how the reservation is applied to the specific subscriptions you choose.

A reservation scoped to a single subscription applies to the SQL Database resource(s) within the selected subscription. A reservation with a shared scope can be shared across subscriptions in the enrollment, and there’s some flexibility involved, such as with Managed Instances, where you can scale up/down.

Read on for more. AWS has been offering discounts for reserved capacity for a while, but now we’re seeing Microsoft play the game too.
