Kevin Feasel – Page 1006

Distributed Transactions on Linux

Published 2019-08-05 by Kevin Feasel

Tejas Shah and crew announce distributed transactions with SQL Server on Linux:

With SQL Server 2017, a new era was heralded with SQL server being available to deploy on Linux (and Linux based container) systems. While all functionality of the SQL Server engine were brought over as is to SQL Server on Linux, some of the functionality which depended on Windows system processes such as distributed transactions (which relies on MSDTC service) were not brought over immediately.

Well, now your wait is over.

Comments closed

Hearing Differences in Power BI

Published 2019-08-05 by Kevin Feasel

David Eldersveld shows us one technique for adding a sound component to your Power BI dashboard:

Data sonification uses variations in audio to hear differences in data values. From an accessibility standpoint, data sonification offers another potential avenue to enrich your reports beyond methods pertaining to data visualization.
There are a few pieces that need to be assembled to enable data sonification that works in both Power BI Desktop and Service. While, audio has been used in the Power BI Service for awhile, Power BI Desktop has been silent until now. This post attempts to show how to produce audio tones in Power BI for greater accessibility. It also demonstrates how to blend data with a standard range of audio pitches.

It’s an interesting idea.

Comments closed

Finding Memory-Rich Queries

Published 2019-08-05 by Kevin Feasel

Matthew McGiffen wants to find the queries which demand the largest memory grants:

I had a server that looked like it had been suffering from memory contention. I wanted to see what queries were being run that had high memory requirements. The problem was that it wasn’t happening right now – I needed to be able to see what had happened over the last 24 hours.
Enter Query Store. In the run-time stats captured by Query Store are included details relating to memory.

Click through for a script which retrieves this data over a time frame.

Comments closed

ClassNotFoundException and .NET Spark

Published 2019-08-05 by Kevin Feasel

Ed Elliott takes us through two causes for a ClassNotFoundException when running a Spark job with .NET Spark:

There was a breaking change with version 0.4.0 that changed the name of the class that is used to load the dotnet driver in Apache Spark.
To fix the issue you need to use the new package name which adds an extra dotnet near the end, change:
spark-submit --class org.apache.spark.deploy.DotnetRunner

Click through to see what you should change this line of code to read. If that change doesn’t fix your problem, Ed has a broader solution.

Comments closed

Adding Aggregates to Table.Profile

Published 2019-08-05 by Kevin Feasel

Chris Webb shows us how to add additional aggregates to Table.Profile in M:

A few years ago I blogged about the Table.Profile M function and how you could use it to create a table of descriptive statistics for your data:
https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/
Since that post was written a new, optional second parameter has been added to the function called additionalAggregates which allows you to add your own custom columns containing aggregate values to the output of Table.Profile, so I thought I’d write a follow-up on how to use it.

Click through for that follow-up.

Comments closed

Extended Filtering in DAX

Published 2019-08-05 by Kevin Feasel

Matt Allington continues a discussion on the FILTER() function in DAX:

The new formula follows the rule “don’t filter a table if you can filter a column”. But in this case the column and the table have the same cardinality, so there is little benefit there. Also, the new formula requires a second CALCULATE() and SUM() inside the FILTER() function. This is required because the column Customers[YearlyIncome] is no longer in the same table that FILTER() is iterating. The FILTER() function is iterating a virtual, single column table that contains all customer keys in the customer table. The column Customers[YearlyIncome] doesn’t exist in this virtual table, it exists in the Customers table, so you must wrap the column in an aggregation function, SUM() in this case. Further, as the FILTER() function iterates in a row context through the virtual table, the virtual relationship does not filter the connected tables UNLESS you specifically tell the formula to do so. Technically, to make the filter propagate from the new virtual table created by ALL(Customers[CustomerKey]), we need to convert the row context into an equivalent filter context via context transition. Context transition is triggered by the inner CALCULATE() inside the FILTER() function in this case.

Read on for several tips for efficient filtering.

Comments closed

PolyBase and External Column Names

Published 2019-08-02 by Kevin Feasel

I have another post looking at external columns on PolyBase V2 data sources:

I’m going to use external two tables in this experiment. In the left corner, we have some ORC files stored in Azure Blob Storage which we’ll represent as FireIncidents2017. In the right corner, we have data stored in a remote SQL Server instance which we’ll call LineItem. The data doesn’t really matter that much, but to give you an idea of where we’re going, I’ll show each table.

There’s quite a bit you can do here.

Comments closed

Validating Errors in A/B Testing

Published 2019-08-02 by Kevin Feasel

Roland Stevenson shows us how to validate Type I and Type II errors when performing A/B tests in R:

In this post, we seek to develop an intuitive sense of what type I (false-positive) and type II (false-negative) errors represent when comparing metrics in A/B tests, in order to gain an appreciation for “peeking”, one of the major problems plaguing the analysis of A/B test today.
To better understand what “peeking” is, it helps to first understand how to properly run a test. We will focus on the case of testing whether there is a difference between the conversion rates cr_a and cr_b for groups A and B. We define conversion rate as the total number of conversions in a group divided by the total number of subjects. The basic idea is that we create two experiences, A and B, and give half of the randomly-selected subjects experience A and half B. Then, after some number of users have gone through our test, we measure how many conversions happened in each group. The important question is: how many users do we need to have in groups A and B in order to measure a difference in conversion rates of a particular size?

Read the whole thing. H/T R-Bloggers

Comments closed

Biml Support in Visual Studio Code

Published 2019-08-02 by Kevin Feasel

Cathrine Wilhelmsen takes us through Biml support in Visual Studio Code:

Please note that you only get syntax highlighting with this extension. You do not get the full Biml or .NET intellisense, the BimlScript preview pane, or the ability to generate SSIS packages from Biml. For those things, you will still need BimlExpress for Visual Studio.
However! If you simply want to view your Biml files in a lightweight editor, the Biml Support extension works beautifully

It’s not full support, but it’s something.

Comments closed

Database Page Allocations Function

Published 2019-08-02 by Kevin Feasel

Max Vernon takes us through the sys.dm_db_database_page_allocations Dynamic Management Function:

sys.dm_db_database_page_allocations is an undocumented SQL Server T-SQL Dynamic Management Function. This DMF provides details about allocated pages, allocation units, and allocation extents.

Read on for additional details. This is an undocumented function, so it might change between versions but it will give you an idea of how it works under the covers.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel