Press "Enter" to skip to content

Month: October 2020

Updating a Power BI Refresh Schedule

Martin Schoombee continues a series on automating Power BI deployments:

There are a few things you need to pay close attention to when setting the refresh schedule via the API:

– Unless you’re setting the refresh schedule for a Premium workspace, you can only refresh a dataset up to 8 times a day. We’re only going to set it to update once a day here, but keep this in mind if you’re planning to adjust the API call to refresh multiple times a day.

– The name of the time zone you provide has to match exactly with the names (middle column) in this reference: Microsoft Time Zone Index

– The refresh time has to be in the format hh:mm, and similar to the options in the Power BI portal you can only refresh on the hour or half-hour.

Read on to see how it works and the API call to make.
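Martin has the actual call in his post; purely as a rough sketch of the request shape (not his code), setting a weekday 6 AM refresh might look something like this in Python against the refreshSchedule endpoint. The workspace ID, dataset ID, token, and schedule values are all placeholders of mine:

    import requests

    # Placeholders: substitute your own workspace (group) ID, dataset ID,
    # and an Azure AD access token with rights to the workspace.
    group_id = "<workspace-id>"
    dataset_id = "<dataset-id>"
    access_token = "<aad-access-token>"

    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
           f"/datasets/{dataset_id}/refreshSchedule")

    body = {
        "value": {
            "enabled": True,
            "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
            "times": ["06:00"],                         # hh:mm, on the hour or half-hour
            "localTimeZoneId": "Central Standard Time"  # must match the Time Zone Index name exactly
        }
    }

    response = requests.patch(url, json=body,
                              headers={"Authorization": f"Bearer {access_token}"})
    response.raise_for_status()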


Setting the Default Command Timeout with Microsoft.Data.SqlClient

Erik Ejlskov Jensen shows us a way to set a default command timeout in .NET’s Microsoft.Data.SqlClient:

With the latest 2.1.0 preview 2 release of the open source .NET client driver for Microsoft SQL Server and Azure SQL Database, Microsoft.Data.SqlClient, it is now possible to set the default command timeout via the connection string.

Now you can work around timeout issues simply by changing the connection string, whereas this previously required changes to code, and maybe changes to code you did not have the ability to change.

This is pretty nice, as my recollection was that you could set connection timeout via connection string, but not command timeout. And not everything’s going to wrap up nicely within 30 seconds.
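If I'm remembering the new keyword correctly, it's Command Timeout, so a connection string using it would look something like this (the server and database names are placeholders, and the value is in seconds):

    Server=myserver;Database=mydb;Integrated Security=true;Command Timeout=120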


Combining CSV Files in Power BI

Stephanie Bruno shows us a method for combining CSV or Excel files:

One of the unsung heroes to me in Power BI Desktop (or Power Query in Excel) is how wonderfully simple it is to combine csv or Excel files from a folder. Maybe it’s not totally unsung, but I think it’s an everyday problem that many people have to deal with. It could easily be an entry point for many new users to see just how much easier Power Query can make their lives.

As much as I appreciate how easy the tool allows us to automatically combine files without having to write any code, I never like all those extra queries that get created in my file and so I prefer to simplify it by just writing one magical little line of code. But I always got frustrated because my one line of code left me without proper column headers, so my one line of code turned into a few extra cumbersome steps. In this post, I’ll show you how to slightly modify that one line of code to get the column headers and keep your queries clean and simple.

Click through for a demo.
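Stephanie's one-liner is Power Query M, so click through for the real thing. Purely as a point of comparison, the same "combine a folder of CSVs and keep the headers" idea outside of Power BI might look like this in Python with pandas (the folder path is a placeholder):

    import glob

    import pandas as pd

    # Read every CSV in the folder and stack them into one table.
    # read_csv() treats each file's first row as headers, so headers
    # aren't repeated in the combined data.
    files = glob.glob(r"C:\Data\MonthlyFiles\*.csv")  # placeholder path
    combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)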


Outlier Identification Using Spark 3.0

Tori Tompkins takes us through principles of anomaly detection in Apache Spark 3.0:

To calculate Median Absolute Deviation (MAD) you need to calculate the difference between the value and the median. In simpler terms, you will need to calculate the median of the entire dataset, the difference between each value and this median, then take another median of all the differences.

In Spark you can use the SQL expression ‘percentile()’ to calculate any medians or quartiles in a dataframe. ‘percentile()’ expects a column and an array of percentiles to calculate (for the median we can provide ‘array(0.5)’ because we want the 50% value, i.e. the median) and will return an array of results.

As with standard deviation, to use MAD to identify outliers a value needs to be a certain number of MADs away from the median. This number is also referred to as the threshold and defaults to 3.

Read on for three measures and their implementations in PySpark.
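Tori has the full implementations, but as a quick PySpark sketch of just the MAD approach described above (the column name, sample values, and threshold here are mine):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0,), (2.0,), (2.5,), (3.0,), (50.0,)], ["value"])

    # Median of the column, via the percentile() SQL expression mentioned in the post.
    median = df.select(
        F.expr("percentile(value, array(0.5))")[0].alias("median")
    ).first()["median"]

    # Median of the absolute deviations from that median (the MAD).
    mad = (df.withColumn("abs_dev", F.abs(F.col("value") - F.lit(median)))
             .select(F.expr("percentile(abs_dev, array(0.5))")[0].alias("mad"))
             .first()["mad"])

    # Flag rows more than `threshold` MADs away from the median (3 is the usual default).
    threshold = 3
    outliers = df.filter(F.abs(F.col("value") - F.lit(median)) > threshold * mad)
    outliers.show()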


Features and Improvements in Spark 3.0

Manoj Pandey summarizes some of the improvements in Apache Spark 3.0:

With the Spark 3.0 release (in June 2020) there are some major improvements over the previous releases. Some of the main and most exciting features for Spark SQL & Scala developers are AQE (Adaptive Query Execution), Dynamic Partition Pruning, and other performance optimizations and enhancements.

Below I’ve listed out these new features and enhancements all together in one page for better understanding and future reference.

Click through for the summary.
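If you want to try the two headline features, both are driven by configuration settings. A quick PySpark sketch, with the caveat that, as I understand it, AQE is off by default in Spark 3.0 while dynamic partition pruning is already on:

    from pyspark.sql import SparkSession

    # Turn on Adaptive Query Execution and keep dynamic partition pruning enabled.
    spark = (SparkSession.builder
             .appName("spark3-features")
             .config("spark.sql.adaptive.enabled", "true")
             .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
             .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
             .getOrCreate())

    print(spark.conf.get("spark.sql.adaptive.enabled"))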


SSMS and Ignoring Certain Waits

Erik Darling has a plea to the SQL Server Management Studio team:

Lock waits are particularly annoying. Imagine (I know, this might be difficult) that you have a friend who is puzzled by why a query is sometimes slow.

They send you an actual plan for when it’s fast, and an actual plan for when it’s slow. You compare them in every which way, and everything except duration is identical.

It’d be a whole lot easier to answer them if LCK waits were collected, but hey. Let’s just make them jump through another hoop to figure out what’s going on.

CXCONSUMER has a similar problem — and here’s the thing — if people are going through the trouble of collecting this information, give ’em what they ask for. Don’t just give them what you think is a good idea.

Click through to see the issue and what you can do to work around this limitation.
