Press "Enter" to skip to content

Month: January 2020

Explaining Black Box Models with LIME

Holger von Jouanne-Diedrich takes us through the intuition of LIME:

There is a new hot area of research to make black-box models interpretable, called Explainable Artificial Intelligence (XAI), if you want to gain some intuition on one such approach (called LIME), read on!

Before we dive right into it it is important to point out when and why you would need interpretability of an AI. While it might be a desirable goal in itself it is not necessary in many fields, at least not for users of an AI, e.g. with text translation, character and speech recognition it is not that important why they do what they do but simply that they work.

In other areas, like medical applications (determining whether tissue is malignant), financial applications (granting a loan to a customer) or applications in the criminal-justice system (gauging the risk of recidivism) it is of the utmost importance (and sometimes even required by law) to know why the machine arrived at its conclusions.

One approach to make AI models explainable is called LIME for Local Interpretable Model-Agnostic Explanations. There is already a lot in this name!

LIME is not trivial to use and it can be very slow, but it is a great way to visualize models.

Comments closed

Improving Join Performance on ADF Data Flows

Mark Kromer has a few tips on improving ADF data flow join performance:

When you include literal values in your join conditions, Spark may see that as a requirement to perform a full cartesian product first, then filter out the joined values. But if you ensure that you (1) have column values from both sides of your join condition, you can avoid this Spark-induced cartesian product and improve the performance of your joins and data flows. (2) Avoid use of literal conditions to represent the results of one side of your join condition.

In other words, avoid this for your join condition:source1@movieId == '1'Instead, implement that with a dummy derived column. 

There are several good tips in this post.

Comments closed

2 New Undocumented Items in SQL Server 2019

Solomon Rutzky takes us through a couple of undocumented additions to SQL Server 2019:

SQL Server 2019 introduced, among other things, two new filesystem-related items:

1. a system stored procedure, sys.xp_delete_files(), and
2. an instance-level configuration option, 'allow filesystem enumeration'

Both are undocumented, so let’s see if we can figure out what they do.

Click through to see how both work.

Comments closed

Auditing Login Events Using Service Broker

Max Vernon takes us through using Service Broker to audit login events:

Logging to the SQL Server Error Log or the Windows Security Event Log means you’ll need some kind of tool to slice-and-dice the data, postmortem. It’s difficult to respond to events as they happen with this kind of auditing, and hard to create simple T-SQL queries to inspect the data. You could create a login trigger at the server level, but that will only allow you to capture successful logins. Coding the trigger incorrectly can result in everyone being locked out of the server. You’ll need to use the Dedicated Administrator Connection, otherwise known as the DAC, to login to the server and disable the errant trigger. Not fun.

Luckily, there is a much better option; using SQL Server’s built-in Event Notification service to receive login events through Service Broker. This event stream is asynchronous to the login process, meaning it won’t interrupt or slow down the login process, and it allows you to capture both successful and failed logins to a table, either locally or remotely. For larger SQL Server infrastructures, it’s not uncommon to setup a single SQL Server instance to gather this information for central analysis.

This blog post shows how to setup a database locally for auditing login events via SQL Server Event Notifications and Service Broker.

Click through for a script-heavy post which helps you all the way through the process.

Comments closed

Add-ClusterNode Error: Keyset Does Not Exist

Jonathan Kehayias troubleshoots a Windows Server clustering problem:

While working on a video recording for Paul this week I ran into an interesting problem with one of my Windows Server 2016 clusters. While attempting to add a new node to the cluster I ran into an exception calling Add-ClusterNode:

The server ‘SQL2K16-AG03.SQLskillsDemos.com’ could not be added to the cluster.
An error occurred while adding node ‘SQL2K16-AG03.SQLskillsDemos.com’ to cluster ‘SQL2K16-WSFC’.

Keyset does not exist

The windows account I was using was the domain administrator account and I had just recently made modifications that involved the certificate store on this specific VM, so I decided to take a backup of the VMDK and then revert to a snapshot to try again, and this time it worked.  So needless to say I was intrigued as to what I could have done that would be causing this error to happen.

Read on to see what the root cause was and how you can fix it.

Comments closed

Viewing Power BI Audit and Activity Logs

Jeff Pries gives us the rundown on auditing in Power BI:

When using the cloud-based Power BI Service, powerbi.com, every action that is taken while logged into the portal — whether it is viewing or publishing a report, creating a new workspace, or even signing up for a pro trial license, that activity is logged within the Microsoft servers as part of the Office 365 audit logs.

Accessing these logs can be accomplished via a couple of different methods (either through the Office 365 Audit Log functionality using the Office 365 Admin Center or PowerShell cmdlets; or through the new Power BI Activity Log (Power BI Get Activity Events) functionality accessible via a PowerShell cmdlet (Get-PowerBIActivityEvent) and an API). There are a few examples out there already on how to use these commands to access the data (and I have a post on accessing the data using the Power BI API and C# coming out in a few week), but there doesn’t seem to be a lot out there about the data itself, which is what I plan to focus on here.

Read on for more details as well as the structure around a forthcoming application to parse these logs and store them locally in SQL Server.

Comments closed

Dealing with Big Ranges in a Graph

Alex Velez shows how we can work with a particular case of problem:

Today’s post is about a common challenge: when one data series is so large relative to the others that a single scale makes it nearly impossible to see any details. Consider the following line graph. It displays state and local revenue by transportation mode, which I created using data from the Bureau of Transportation Statistics 2018 Report.

Alex has one solution. Another idea could be to change the Y axis to log scale, especially because you’re dealing with money. That would tighten up the series and allow for more information to be displayed on the single graph.

Comments closed

Generating Synthetic Data with R

Sidharth Macherla uses the conjurer package in R to generate synthetic data:

If you are building data science applications and need some data to demonstrate the prototype to a potential client, you will most likely need synthetic data. In this article, we discuss the steps to generating synthetic data using the R package ‘conjurer’. 

One of the toughest problems of generating data is making it look realistic enough. It’s one level of difficulty to build “steady-state” data, but if you want data to follow a combination of trend and random walk…that’s when things get dicey. H/T R-Bloggers

Comments closed

Find Installed ODBC Drivers with Powershell

Jack Vamvas answers a question:

Question: I had a question from a developer who was troubleshooting an application to SQL Server connection via a DSN using ODBC. They wanted the SQL Server ODBC Driver names and platform (32 bit|64 bit) used to connect . They will use this information to check application compatibility.

Although i can just RDP onto the server and grab the information through the ODBC gui – how can I use Powershell to get these ODBC details?

Click through for the answer.

Comments closed

Building a Dual-Axis Line Chart in Power BI

Matt Allington shows how you can build a dual-axis line chart in Power BI:

Unfortunately, Power BI does not support a dual axis line chart as a standard visual at this time. The good news however is there is a custom visual called “Multiple Axes chart by xViz” that can do this in Power BI.  This visual has been around for a while, but there have been some formatting issues (in my view) that prevented it being a solution to this problem – that is now fixed).  I will demonstrate how to set up a dual axis charge using the Adventure Works database and this visual.

Honestly, I’m pretty happy that Power BI does not support a dual-axis line chart. It is the cause of so many instances of spurious correlation that I’d err on the side of not including multiple axes.

Comments closed