Curated SQL – Page 1101 – A Fine Slice Of SQL Server

Azure Data Factory Switch Activity

Published 2019-10-22 by Kevin Feasel

Rayis Imayev explains what the Switch activity does in Azure Data Factory:

Developing conditional logic of your Azure Data Factory control flow has been simplified with introducing of the Switch activity – https://docs.microsoft.com/en-us/azure/data-factory/control-flow-switch-activity. Official documentation resource states, this new data factory activity “provides the same functionality that a switch statement provides in programming languages“. I would also add a more simplified definition of the Switch activity in Azure Data Factory: it is a container (or wrapper) for multiple IF conditions.

Click through for an example.

Accelerated Database Recovery

Published 2019-10-22 by Kevin Feasel

Andy Mallon explains the concept of Accelerated Database Recovery:

Accelerated Database Recovery(ADR) is a new feature intended to speed up the recovery process, which could be very slow, particularly when there are long-running, large transactions. ADR is not just for recovery after a crash, but also helps in other scenarios where the transaction log needs to be recovered–including Availability Group secondary redo and Failover Cluster Instance failovers.

This is one of the most interesting new features in SQL Server 2019.

Using Power BI Cards to Display Notes

Published 2019-10-22 by Kevin Feasel

Prathy Kamasani takes us through one use of the card visual in Power BI:

This is a long-overdue blog post. A couple of months ago, I worked with a client in Amsterdam; one of the use cases was to show key metrics, flags that need attention. The user also wanted to click on warning symbol to confirm, where the issues were, however, the user didn’t want a drill through, it has to be a left-click.
As of yet, except for button/action we can not do left clicks in Power BI. As the user didn’t want the report to open in another browser tab etc., so was thinking about other options and at the end decided to go for tooltips and symbols to show flags like below:

Click through for an example and an explanation of how it works.

Building Custom R Packages

Published 2019-10-21 by Kevin Feasel

Brad Lindblad takes us through building a custom package in R:

Don’t repeat yourself (DRY) is a well-known maxim in software development, and most R programmers follow this rule and build functions to avoid duplicating code. But how often do you:
– Reference the same dataset in different analyses
– Create the same ODBC connection to a database
– Tinker with the same colors and themes in ggplot
– Produce markdown docs from the same template
and so on? Notice a pattern? The word “same” is sprinkled in each bullet point. I smell an opportunity to apply DRY!

This is a good point: packages don’t have to go out to the broader world. They’re useful even if they just help you (or your team) out. H/T R-bloggers

Evaluating a Classification Model with a Spam Filter

Published 2019-10-21 by Kevin Feasel

John Mount shares an extract from Mount and Nina Zumel’s Practical Data Science with R, 2nd Edition:

This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction.
It is funny, but it takes some effort to teach in this way. New data scientists want to dive into the details of model construction first, and statisticians are used to getting model diagnostics as a side-effect of model fitting. However, to compare different modeling approaches one really needs good model evaluation that is independent of the model construction techniques.

Click through for that extract. I liked the first edition of the book, so I’m looking forward to the 2nd.
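
To illustrate the idea of evaluating a classifier separately from building it, here is a minimal Python sketch (not taken from the book, which works in R). The function name `evaluate_classifier` and the sample labels are hypothetical; the function only needs actual and predicted labels, so it applies to any spam filter regardless of how the model was fit.

```python
def evaluate_classifier(actual, predicted):
    """Compute basic metrics from actual and predicted boolean labels.

    True means "spam". The model that produced the predictions is
    irrelevant here, which is the point of model-independent evaluation.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # spam flagged as spam
    fp = sum(1 for a, p in pairs if not a and p)      # ham flagged as spam
    fn = sum(1 for a, p in pairs if a and not p)      # spam that slipped through
    tn = sum(1 for a, p in pairs if not a and not p)  # ham left alone

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for five messages.
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
metrics = evaluate_classifier(actual, predicted)
```

Because the function never touches the model itself, the same call can compare predictions from two entirely different modeling approaches.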

Top 5 and All Others in Power BI

Published 2019-10-21 by Kevin Feasel

Marco Russo and Alberto Ferrari show how you can display the top N rows and add an “Others” aggregate at the end:

Power BI offers the ability to apply a Top N constraint in a visual level filter, so that only a certain number of items are visible based on the evaluation of a measure. A common requirement is to show an additional row that accumulates the “other” items, which are those that are not visible in the report like in the following figure.
In order to solve this scenario you cannot use the Top N filter of Power BI. Instead, you apply the filter in a special measure (TopN Sales) and you use a calculated table to accommodate for the additional row named Others. Moreover, you need an additional column to let the Others row appear at the bottom of the table.

Read on to see how you can solve the problem.
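
As a rough picture of the result being described, the following Python sketch keeps the top N items and rolls everything else into a single “Others” row at the bottom. It is not the DAX technique from the article (which uses a measure and a calculated table); the function name `top_n_with_others` and the sample figures are hypothetical.

```python
def top_n_with_others(sales_by_item, n):
    """Return the n best-selling items plus one aggregated "Others" row.

    sales_by_item maps an item name to its sales amount.
    The "Others" row is always placed last, mirroring the report layout.
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    # Rank items from highest to lowest sales.
    ranked = sorted(sales_by_item.items(), key=lambda kv: kv[1], reverse=True)
    rows = ranked[:n]
    remainder = ranked[n:]

    # Only add the extra row when something was actually left out.
    if remainder:
        rows.append(("Others", sum(amount for _, amount in remainder)))
    return rows


# Hypothetical sales figures.
sales = {"A": 50, "B": 30, "C": 10, "D": 5, "E": 5}
result = top_n_with_others(sales, 2)
# result == [("A", 50), ("B", 30), ("Others", 20)]
```

The total across all rows still matches the grand total, which is usually the reason for showing an “Others” row in the first place.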

Understanding PERCENTILE_CONT

Published 2019-10-21 by Kevin Feasel

Kathi Kellenberger takes us through the PERCENTILE_CONT window function:

I was recently playing with the analytical group of windowing functions, and I wanted to understand how they worked “under the covers.” I ran into a little logic puzzle with PERCENTILE_CONT by trying to write a query that returned the same results using pre-2012 functionality.
Given a list of ranked values, you can use the PERCENTILE_CONT function to find the value at a specific percentile. For example, if you have the grades of 100 students, you can use PERCENTILE_CONT to locate the score in the middle of the list, the median, or at some other percent such as the grade at 90%. This doesn’t mean that the score was 90%; it means that the position of the score was at the 90th percentile. If there is not a value at the exact location, PERCENTILE_CONT interpolates the answer.

I’m a bit disappointed with how poorly PERCENTILE_CONT performs against large data sets, especially if you need multiple percentiles. It’s bad enough that going into ML Services and getting percentiles with R is usually faster for me. But for datasets of less than 100K or so rows, it’s the easiest non-CLR method to get the median (with the easiest CLR method being SQL#).
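
The interpolation described in the excerpt can be sketched in a few lines of Python. This is an illustration rather than Kathi’s pre-2012 T-SQL solution, and it assumes the usual rule that the target row position is 1 + p × (n − 1) over the sorted values; the function name `percentile_cont` and the sample grades are hypothetical.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile with linear interpolation.

    Assumes the target position is 1 + p * (n - 1) in one-based terms,
    which is p * (n - 1) when counting from zero as Python does.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")

    ordered = sorted(values)
    position = p * (len(ordered) - 1)  # zero-based fractional position
    lower = math.floor(position)
    upper = math.ceil(position)

    # The position lands exactly on a row, so no interpolation is needed.
    if lower == upper:
        return float(ordered[lower])

    # Otherwise blend the two neighbouring rows by the fractional part.
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical grades: the median falls between 80 and 90.
grades = [70, 80, 90, 100]
median = percentile_cont(grades, 0.5)
# median == 85.0
```

With an even number of rows the median falls between two values, so the function returns a number that does not appear in the list, which is the interpolation behavior the excerpt describes.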

Running Big Data Clusters on VS Subscriptions

Published 2019-10-21 by Kevin Feasel

Kevin Chant has a few tips for people wanting to try out Big Data Clusters with their Visual Studio subscriptions to Azure:

In order to present the right results for various outcomes I attempted to deploy Big Data Clusters multiple times.
When I say multiple times, I mean the number of deployments easily went into double figures. Because I was testing deploying various virtual machine sizes in multiple regions.
Hence, I spent many hours testing and verifying the results in order to present them properly.

Read on to see Kevin’s notes and recommendations.

Refreshing Power BI Dataflows with PowerShell

Published 2019-10-21 by Kevin Feasel

Craig Porteous shows how to use the Power BI Dataflows REST API with PowerShell:

I like to use my favourite scripting language to do this – PowerShell. Although we have the Power BI Management PowerShell module (MicrosoftPowerBIMgmt) to interact with Power BI, the cmdlets aren’t yet there to refresh or retrieve the history of a dataflow (or even a dataset) but the module can still help us get what we need without jumping through too many hoops (and as long as we aren’t automating the authentication, that’s another post.).

Click through to see how it’s done.

When Power Query Hits Data Sources Repeatedly

Published 2019-10-21 by Kevin Feasel

Chris Webb answers an age-old question:

If you’re developing in Power BI Desktop and you think that refresh is taking a long time, you should definitely check whether the Power Query engine is hitting your data source more than once. There are lots of ways to do this. Some data sources have tools that show when they are queried, such as the Run History screen in Microsoft Flow that I show in the video or SQL Server Profiler. Other ways include using Fiddler for web services or Process Monitor for files.

Read the whole thing.
