Press "Enter" to skip to content

Author: Kevin Feasel

Disambiguating Azure SQL Database Classes

Arun Sirpal explains the different types of Azure SQL Database available to us:

I want to do a quick summary post of the many different types of Azure SQL Database available. I am not talking about elastic pools, VMs, etc., but rather the singleton type.

Azure SQL Database (what I call normal mode) – a choice between the DTU model (Basic, Standard, and Premium) and vCore (General Purpose and Business Critical). Within this space there are two different architecture types used by Microsoft under the covers.

As the product expands, we get more and more options, and Arun clarifies where each fits.


Alerting on Refresh Failure in Power BI

Matt Allington shows how you can set up an alert to contact you when a scheduled Power BI data refresh fails:

In this article I am going to share with you a concept for managing refresh failures in production PowerBI.com reports. In a perfect world, you should configure your queries so that they “prevent” possible refresh failures from occurring, but also have them notify you when something goes wrong, without the report refresh failing in the first place. There are many things that can go wrong with report refreshes, and you probably can’t prevent all of them. In the example in this article, I will show you how to prevent a refresh failure caused by duplicates appearing in a lookup table after the report has been built, the model has been loaded to PowerBI.com, and the scheduled refresh has been set up using a gateway. If a duplicate key occurs in any of the lookup tables in the data model during refresh, the refresh fails and the updated data does not go live.

Matt’s specific scenario is around duplicate data, but it can extend to other issues as well.
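
Matt’s fix lives in Power Query, but the underlying idea travels well. Below is a minimal Python/pandas sketch of the same defensive deduplication; the table and column names are hypothetical, and this illustrates the concept rather than Matt’s actual implementation:

    import pandas as pd

    # Hypothetical lookup table loaded from a source that may develop duplicates.
    lookup = pd.DataFrame({
        "ProductKey": [1, 2, 2, 3],
        "ProductName": ["Apples", "Bananas", "Bananas (dup)", "Cherries"],
    })

    # Any key appearing more than once would break the one-to-many relationship
    # and fail the model refresh.
    dupes = lookup[lookup.duplicated("ProductKey", keep=False)]
    if not dupes.empty:
        # Instead of letting the refresh fail, keep the first row per key and
        # surface the problem (log it, write it to an errors table, and so on).
        print(f"Warning: {dupes['ProductKey'].nunique()} duplicated key(s) found")
        lookup = lookup.drop_duplicates("ProductKey", keep="first")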


Troubleshooting Power BI Refresh Failures

Annie Xu gives us a few reasons why Power BI refreshes might fail:

License level: Power BI Premium licenses come at different levels, and Power BI Premium capacity is shared within a tenant and can be shared by multiple workspaces. The maximum number of models (datasets) that can be refreshed in parallel is based on the capacity node. For example, P1 = 6 models, P2 = 12 models, and P3 = 24 models.

Click through for the set of possibilities.
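
To make those limits concrete, here is a toy Python sketch; the P1/P2/P3 numbers come straight from the quote, while the function itself is made up for illustration:

    # Parallel-refresh limits per Premium capacity node, as quoted above.
    PARALLEL_REFRESH_LIMIT = {"P1": 6, "P2": 12, "P3": 24}

    def refreshes_will_queue(node: str, scheduled_models: int) -> bool:
        """True if more models are scheduled at once than the node can
        refresh in parallel; the excess would wait (or time out)."""
        return scheduled_models > PARALLEL_REFRESH_LIMIT[node]

    print(refreshes_will_queue("P1", 8))   # True: 8 > 6 on a P1 node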


IIS Log Analysis in Power BI

Joy George Kunjikkur shows how you can build a Power BI dashboard to analyze IIS log files:

As developers, we have all probably encountered the situation of analyzing IIS web server logs. During development, the file is small and easy to analyze in Notepad or Excel, but when it grows to gigabytes on production servers, we use other tools. One such popular tool for querying IIS logs is LogParser, a free command-line tool from Microsoft. There are graphical applications built around it that can even generate charts; one such free tool, also from Microsoft, is Log Parser Studio.

Once we move up in our careers and have to deal with managers, product stakeholders, or even CXOs to show what the IIS logs say, we need more visuals or a dashboard reflecting those logs. Though we can create visuals using Log Parser Studio, it is tedious to create reports and charts one by one.

Click through for a solution.
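
If you prefer scripting to Log Parser Studio, the first step, getting W3C-format IIS logs into a tabular shape, can also be sketched in Python with pandas. This is a rough sketch that assumes a standard #Fields: directive in the log and uses a hypothetical file name:

    import pandas as pd

    def read_iis_log(path):
        """Read a W3C-format IIS log into a DataFrame. Header lines start with
        '#'; the '#Fields:' directive names the space-delimited columns."""
        fields = None
        with open(path) as f:
            for line in f:
                if line.startswith("#Fields:"):
                    fields = line.removeprefix("#Fields:").split()
                    break
        if fields is None:
            raise ValueError("no #Fields: directive found")
        return pd.read_csv(path, sep=" ", comment="#", names=fields)

    log = read_iis_log("u_ex190101.log")  # hypothetical log file
    # For example, requests per status code, ready to feed a chart or dashboard.
    print(log["sc-status"].value_counts())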


Upgrading Azure Kubernetes Service

Chris Taylor has a couple of point updates to jump through in Azure Kubernetes Service:

As it is late at night, my brain wasn’t working as it should, but I thought I’d put a quick blog out there to say that if you are on v1.11.5 and want to upgrade to >= v1.13.10, then you have to do this in a two-stage process by upgrading to v1.12.8 first:

Fortunately, upgrading is pretty easy using the Azure command line or even the Azure portal.
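
As a rough illustration of the two-hop upgrade Chris describes (the resource group and cluster names are hypothetical, and this simply shells out to the Azure CLI), the sequence looks something like this:

    import subprocess

    RG, CLUSTER = "my-rg", "my-aks"  # hypothetical names

    def az_aks_upgrade(version: str) -> None:
        """Upgrade the cluster via the Azure CLI, waiting for completion."""
        subprocess.run(
            ["az", "aks", "upgrade", "--resource-group", RG, "--name", CLUSTER,
             "--kubernetes-version", version, "--yes"],
            check=True,
        )

    # AKS only supports moving one minor version at a time, hence the
    # intermediate hop through v1.12.8 before landing on v1.13.10.
    for version in ("1.12.8", "1.13.10"):
        az_aks_upgrade(version)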


WVPlots

Nina Zumel announces a new version of WVPlots on CRAN:

WVPlots was originally a catch-all package of ggplot2 visualizations that we at Win-Vector tended to use repeatedly, and wanted to turn into “one-liners.” A consequence of this is that the older visualizations had our preferred color schemes hard-coded in. More recent additions to the package sometimes had palette or color controls, but not in a consistent way. Making color controls more consistent has been a “todo” for a while—one that I’d been putting off. A recent request from user Brice Richard (thanks Brice!) has pushed me to finally make the changes.

Click through to see what’s changed and for an example vignette.


RDDs, DataFrames, and Datasets in Spark

Brad Llewellyn walks us through the three key data structures in Apache Spark:

We see that creating an RDD can be done with one easy function. In this snippet, sc represents the default SparkContext. This is extremely important, but is better left for a later post. SparkContext offers the .textFile() function, which creates an RDD from a text file, parsing each line into its own element in the RDD. These lines happen to represent CSV records. However, there are many other common examples that use lines of free text.

It should also be noted that we can use the .collect() and .take() functions to view the contents of an RDD. The difference between .collect() and .take() is that .take() allows us to specify the number of elements we want to retrieve, whereas .collect() returns the entire RDD.

My tendencies are probably skewed pretty heavily, but I live in DataFrames and almost never use raw RDDs anymore.
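
For reference, here is a minimal PySpark sketch of the calls Brad describes, along with the DataFrame equivalent; the file path is hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
    sc = spark.sparkContext  # the default SparkContext Brad refers to as sc

    # Each line of the text file becomes one element of the RDD.
    rdd = sc.textFile("data.csv")

    print(rdd.take(5))     # retrieve only the first five elements
    print(rdd.collect())   # retrieve the entire RDD; beware large datasets

    # The DataFrame equivalent, which imposes a tabular structure up front.
    df = spark.read.csv("data.csv", header=True, inferSchema=True)
    df.show(5)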


Computing Time to Payment on Invoices

Daniel Hutmacher has a painful but realistic problem to solve:

Here’s an example customer. You’ll notice right off the bat that we’re sending this customer an invoice on the 20th of every month. To add some complexity, the customer will arbitrarily pay parts of the invoiced amount over time, and to add insult to injury, the banking interface won’t tell us which invoice the customer is paying for, so we’ll just decide that each payment goes towards the oldest outstanding invoice.

Our task is to calculate how many days have elapsed, for each invoice, from invoice date to payment in full.

Daniel has an excellent solution to the problem, so check it out.
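
Daniel works the problem in T-SQL; purely to restate the allocation logic, here is a small Python sketch with made-up dates and amounts, applying each payment to the oldest invoice that still has a balance:

    from datetime import date

    # Hypothetical data: (invoice_date, amount) and (payment_date, amount).
    invoices = [(date(2019, 1, 20), 100.0), (date(2019, 2, 20), 100.0)]
    payments = [(date(2019, 2, 1), 60.0), (date(2019, 3, 5), 150.0)]

    balances = [amount for _, amount in invoices]
    days_to_paid = [None] * len(invoices)

    # Walk payments in order, draining the oldest outstanding invoice first;
    # an invoice's days-to-payment is fixed the moment its balance hits zero.
    for pay_date, remaining in payments:
        for i, (inv_date, _) in enumerate(invoices):
            if balances[i] <= 0 or remaining <= 0:
                continue
            applied = min(balances[i], remaining)
            balances[i] -= applied
            remaining -= applied
            if balances[i] == 0:
                days_to_paid[i] = (pay_date - inv_date).days

    print(days_to_paid)  # [44, 13]: days from invoice date to payment in full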


Azure Data Factory and Schema Drift

Mark Kromer walks us through two techniques we can use in Azure Data Factory to deal with schema drift:

Azure Data Factory’s Mapping Data Flows have built-in capabilities to handle complex ETL scenarios that include the ability to handle flexible schemas and changing source data. We call this capability “schema drift”.

When you build transformations that need to handle changing source schemas, your logic becomes tricky. In ADF, you can either build data flows that always look for patterns in the source and utilize generic transformation functions, or you can add a Derived Column that defines your flow’s canonical model.

Click through for the discussion and comparison. Schema drift has been the bane of Integration Services’ existence, so it’s good to see them tackling the idea in Azure Data Factory.
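
Mapping Data Flows are configured visually rather than in code, but the canonical-model idea translates to any tool. As a hedged illustration in Python/pandas, with hypothetical column names: project whatever arrives onto a fixed schema, so extra columns are dropped and missing ones surface as nulls instead of breaking the load.

    import pandas as pd

    # The canonical model the flow should always emit (hypothetical columns).
    CANONICAL = ["customer_id", "order_date", "amount"]

    def conform(source: pd.DataFrame) -> pd.DataFrame:
        """Project a drifting source onto the canonical schema: extra columns
        are dropped, missing columns appear as NA rather than failing."""
        return source.reindex(columns=CANONICAL)

    # A source that gained one column and lost another still conforms.
    drifted = pd.DataFrame({"customer_id": [1], "amount": [9.99], "extra": ["x"]})
    print(conform(drifted))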
