Month: August 2018

Here I am sharing the slides for a webinar I gave for SAP about Explaining Keras Image Classification Models with LIME.

Slides can be found here: https://www.slideshare.net/ShirinGlander/sap-webinar-explaining-keras-image-classification-models-with-lime

Read on for links to additional resources as well.

Factors In R

Published 2018-08-31 by Kevin Feasel

Dave Mason continues his look at R, this time covering the concept of factors:

Factor data can be nominal or ordinal. In our examples so far, it is nominal. “C”, “G”, and “F” (and “Center”, “Guard”, and “Forward” for that matter) are names that have no comparative order to each other. It’s not meaningful to say a Center is greater than a Forward or a Forward is less than a Guard (keep in mind these are position names–don’t let height cloud your thinking). If we try making a comparison, we get a warning message:
> position_factor[1] > position_factor[2]
[1] NA
Warning message:
In Ops.factor(position_factor[1], position_factor[2]) :
  ‘>’ not meaningful for factors
Ordinal data, on the other hand, can be compared to each other in some ranked fashion–it has order. Take bed sizes, for instance. A “Twin” bed is smaller than a “Full”, which is smaller than a “Queen”, which is smaller than a “King”. To create a factor with ordered (ranked) levels, use the ordered parameter, which is a logical flag to indicate if the levels should be regarded as ordered (in the order given).
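For readers more at home in Python, the nominal versus ordinal distinction can be sketched without any libraries. This is only an analogy to the behavior described above, not how R implements factors: the `Levels` class and its `greater` method are hypothetical names invented for this illustration, and returning `None` stands in for R's `NA`.

```python
class Levels:
    """Toy stand-in for a factor's level set (hypothetical helper, not an R or pandas API)."""

    def __init__(self, levels, ordered=False):
        self.levels = list(levels)
        self.ordered = ordered
        # Rank of each level follows the order in which the levels were given.
        self._rank = {name: i for i, name in enumerate(self.levels)}

    def greater(self, a, b):
        """Return a > b for ordered levels; None (standing in for NA) for nominal ones."""
        if a not in self._rank or b not in self._rank:
            raise ValueError("unknown level")
        if not self.ordered:
            return None  # comparison is not meaningful for nominal data
        return self._rank[a] > self._rank[b]


# Nominal: basketball positions have no rank.
positions = Levels(["C", "F", "G"])
# Ordinal: bed sizes are ranked in the order given.
beds = Levels(["Twin", "Full", "Queen", "King"], ordered=True)

nominal_result = positions.greater("C", "F")
ordinal_result = beds.greater("King", "Full")
```

Here `nominal_result` is `None` because the levels carry no order, while `ordinal_result` is `True` because "King" comes after "Full" in the declared level order.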

Check it out.

Posting Power BI Data Alerts To Slack

Published 2018-08-31 by Kevin Feasel

Esat Erkec shows how to post a Power BI data alert into a Slack channel with Microsoft Flow:

Demonstration

In this demonstration, we will complete the following steps.

1. Create AdventureworksLT sample database in Azure SQL (Platform as a Service)
2. Create a simple report with Power BI and publish this report to Power BI Portal
3. Create Power BI data alert
4. Integrate Power BI data alert notification and Slack with Microsoft Flow

It’s surprisingly easy—most of the article is just creating the Power BI dashboard.

What To Do After Installing SQL Server On Linux

Published 2018-08-31 by Kevin Feasel

Manoj Pandey has a few tips for what to do after installing SQL Server on Linux:

Here are some post-installation best practices for SQL Server on Linux that can help you maximize database performance:

1. To maintain efficient Linux and SQL Scheduling behavior, it’s recommended to use the ALTER SERVER CONFIGURATION command to set PROCESS AFFINITY for all the NUMANODEs and/or CPUs. [Setting Process Affinity]

2. To reduce the risk of tempdb concurrency slowdowns in high performance environments, configure multiple tempdb files by adding them with the ADD FILE command. [tempdb Contention]

3. Use mssql-conf to configure the memory limit and ensure there’s enough free physical memory for the Linux operating system.

Some of these are common for Windows and Linux (like multiple tempdb files) but there are several Linux-specific items here.

What Prevents Plan Reuse?

Published 2018-08-31 by Kevin Feasel

Eric Blinn walks us through what might cause a query plan not to be reused:

There are several reasons that a query plan would need to be compiled again, but they can be boiled down to a few popular reasons.

The first one is simple. The plan cache is stored exclusively in memory. If there is memory pressure on the instance, SQL Server will eject plans from cache that aren’t being used to make room for newer, more popular plans or even to expand the buffer pool. If a command associated with a plan that has been ejected from the plan cache is issued, it will need to be compiled again before it can execute.

Since SQL Server 2008 a system stored procedure, sp_recompile, has been available to clear a single stored procedure plan from the cache. When executed with a valid stored procedure name as the only parameter any plans for that procedure will be marked for recompilation so that a future execution of that procedure will need to be compiled. Running sp_recompile does not actually compile the procedure. It simply invalidates any existing plans so that some future execution, which in theory may never come, will need to compile before executing.
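The point that invalidation and compilation are separate events can be illustrated with a toy model. This sketch is not SQL Server's plan cache or its eviction logic; `ToyPlanCache`, its methods, and the procedure name `dbo.GetOrders` are all hypothetical and exist only to show the "mark now, compile on next execution" behavior described above.

```python
class ToyPlanCache:
    """Toy model of lazy recompilation; not SQL Server's actual plan cache."""

    def __init__(self):
        self._plans = {}
        self.compile_count = 0

    def execute(self, proc_name):
        """Reuse a cached plan if one exists; otherwise compile first."""
        plan = self._plans.get(proc_name)
        if plan is None:
            self.compile_count += 1
            plan = f"plan for {proc_name} (compile #{self.compile_count})"
            self._plans[proc_name] = plan
        return plan

    def mark_for_recompile(self, proc_name):
        """Invalidate the cached plan. Note that nothing is compiled here."""
        self._plans.pop(proc_name, None)


cache = ToyPlanCache()
cache.execute("dbo.GetOrders")  # first run compiles a plan
cache.execute("dbo.GetOrders")  # second run reuses it
compiles_before = cache.compile_count

cache.mark_for_recompile("dbo.GetOrders")
compiles_after_mark = cache.compile_count  # unchanged: invalidation is not compilation

cache.execute("dbo.GetOrders")  # the next execution pays the compile cost
compiles_after_rerun = cache.compile_count
```

The compile counter only moves when `execute` finds no cached plan, so marking a procedure costs nothing until (and unless) it runs again.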

Read on for additional causes.

When Query Store Alterations Are Blocked

Published 2018-08-31 by Kevin Feasel

Erin Stellato gives us some helpful tips on Query Store:

If you are trying to execute an ALTER DATABASE command to change a Query Store option (e.g. turn it off, change a setting) and it is blocked, take note of the blocking session_id and what that session_id is executing. If you are trying to execute this ALTER command right after a failover or restart, you are probably blocked by the Query Store data loading.

As a reminder, when a database with Query Store enabled starts up, it loads data from the Query Store internal tables into memory (this is an optimization to make specific capabilities of Query Store complete quickly). In some cases this is a small amount of data, in other cases, it’s larger (potentially a few GB), and as such, it can take seconds or minutes to load. I have seen this take over 30 minutes to load for a very large Query Store (over 50GB in size).

Erin has a story which ties this together, so check that out.

Unraveling Rolling Totals With Power Query

Published 2018-08-31 by Kevin Feasel

Imke Feldmann shows us how to get from rolling totals back to the original values using Power Query:

To retrieve this value, one would have to start with the first value in the year. This is also the value of the first quarter, but for the 2nd quarter, one would have to deduct the value of the first quarter from the cumulative value of the 2nd quarter. So basically, retrieve the previous cumulative row and deduct it from the current cumulative row. Do this for every row, unless it’s the start of the year or belongs to a different account code in this example:

(Although for the data given in the sample, it would be sufficient to just take the year as a discriminator, to be on the safe side I would suggest including the different accounts as well.)

That’s a pretty interesting approach.
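The subtraction logic in the quote can be sketched in a few lines. Note that this is plain Python rather than the Power Query solution in the linked article, and the field names (`account`, `year`, `quarter`, `cumulative`) and sample figures are made up for illustration.

```python
def uncumulate(rows):
    """Turn running totals back into per-period values.

    Each row is a dict with hypothetical keys 'account', 'year',
    'quarter' and 'cumulative'. The running total restarts whenever
    the (account, year) pair changes.
    """
    ordered = sorted(rows, key=lambda r: (r["account"], r["year"], r["quarter"]))
    result = []
    prev_key = None
    prev_cumulative = 0
    for row in ordered:
        key = (row["account"], row["year"])
        if key != prev_key:
            prev_cumulative = 0  # start of a year or a new account: nothing to deduct
        result.append({**row, "value": row["cumulative"] - prev_cumulative})
        prev_key = key
        prev_cumulative = row["cumulative"]
    return result


# Made-up sample data: year-to-date totals per account.
sample = [
    {"account": "A", "year": 2017, "quarter": 1, "cumulative": 100},
    {"account": "A", "year": 2017, "quarter": 2, "cumulative": 250},
    {"account": "A", "year": 2017, "quarter": 3, "cumulative": 300},
    {"account": "A", "year": 2018, "quarter": 1, "cumulative": 40},
    {"account": "B", "year": 2017, "quarter": 1, "cumulative": 10},
    {"account": "B", "year": 2017, "quarter": 2, "cumulative": 35},
]
values = [r["value"] for r in uncumulate(sample)]
```

Using both account and year as the reset key follows the suggestion in the quote to include the account as well as the year.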

Explaining Data Flows (And Dataflows)

Published 2018-08-31 by Kevin Feasel

Melissa Coates disambiguates “data flows” from “dataflows” because those are two totally different things:

It’s another terminology post! Earlier this week I was having a delightful lunch with Angela Henry, Kevin Feasel, Javier Guillen, and Jason Thomas. We were chatting about various new things. Partway through our conversation Jason stopped me because he thought I was talking about Power BI Dataflows when I was really talking about Azure Data Factory Data Flows. It was kind of a funny moment actually but it did illustrate that we have some overlapping terminology coming into our world.

So, with that inspiration, let’s have a chat about some of the new data flow capabilities in the Microsoft world, shall we?

Melissa clarifies the term “data flow” (or “dataflow” as the case may be) across several products in Microsoft’s BI stack. Worth the read.

Time-Series Analysis With Box-Jenkins

Published 2018-08-30 by Kevin Feasel

The folks at Knoyd walk us through time series analysis using the Box-Jenkins method:

However, this approach is not generally recommended so we have to find something more appropriate. One option could be forecasting with the Box-Jenkins methodology. In this case, we will use the SARIMA (Seasonal Auto Regressive Integrated Moving Average) model. In this model, we have to find optimal values for seven parameters:

Auto Regressive Component (p)

Integration Component (d)

Moving Average Component (q)

Seasonal Auto Regressive Component (P)

Seasonal Integration Component (D)

Seasonal Moving Average Component (Q)

Length of Season (s)

To set these parameters properly you need to have knowledge of auto-correlation functions and partial auto-correlation functions.
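As a rough illustration of what an auto-correlation function measures, here is a minimal sketch of the sample autocorrelation at a given lag, written in plain Python. It is not taken from the linked article, the series is made up, and it does not cover the partial auto-correlation function, which needs more machinery.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Made-up series that repeats every 4 observations.
series = [10, 20, 30, 20] * 6
acf = [autocorrelation(series, k) for k in range(9)]
```

For this series the autocorrelation is strongly positive at lags 4 and 8 and negative at lag 2, which is the kind of pattern that would point toward a season length of 4 when choosing s.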

Read on for a nice overview of this method, as well as the importance of making sure your time series data set is stationary.

Databricks Runtime 4.3 Released

Published 2018-08-30 by Kevin Feasel

Todd Greenstein announces Databricks Runtime 4.3:

In addition to the performance improvements, we’ve also added new functionality to Databricks Delta:

Truncate Table: with Delta you can delete all rows in a table using truncate. It’s important to note we do not support deleting specific partitions. Refer to the documentation for more information: Truncate Table
Alter Table Replace columns: Replace columns in a Databricks Delta table, including changing the comment of a column, and we support reordering of multiple columns. Refer to the documentation for more information: Alter Table
FSCK Repair Table: This command allows you to remove the file entries from the transaction log of a Databricks Delta table that can no longer be found in the underlying file system. This can happen when these files have been manually deleted. Refer to the documentation for more information: Repair Table
Scaling “Merge” Operations: This release comes with experimental support for larger source tables with “Merge” operations. Please contact support if you would like to try out this feature.

Looks like a nice set of reasons to upgrade.
