Press "Enter" to skip to content

Curated SQL Posts

Automatic Seeding of Availability Groups

Jamie Wick takes us through some considerations when trying out automatic seeding of availability groups:

In SQL Server 2016, Microsoft introduced Automatic Seeding for Availability Groups (AG). The Automatic Seeding process streams the database files directly to the secondary server(s) using the database mirroring endpoints, removing the need to restore the databases before joining them to an AG.

Read on for Jamie’s thoughts and notes.
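
As a rough sketch of what enabling it looks like (the availability group, replica, and database names below are made up, and this is not code from Jamie's post):

-- On the primary replica: switch a secondary to automatic seeding
ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'SQL02'
    WITH (SEEDING_MODE = AUTOMATIC);
GO

-- On the secondary replica: allow the AG to create the seeded databases
ALTER AVAILABILITY GROUP [MyAG] GRANT CREATE ANY DATABASE;
GO

-- Back on the primary: add the database (no backup or restore needed)
ALTER AVAILABILITY GROUP [MyAG] ADD DATABASE [SalesDB];
GO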

Comments closed

Testing a Linked Server via T-SQL

Kenneth Fisher has evil afoot:

Unfortunately, if there is a problem with the server / server name, sp_addlinkedserver doesn’t seem to throw an error, and sp_addlinkedsrvlogin throws an error that breaks me out of the try block. So after my Google-fu failed me, I did what I frequently do and asked on #sqlhelp. And I was given an answer that led me to the solution.

Click through for the solution. But not for the answer as to why he’s growing batches of monster linked servers.
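
Not necessarily the approach Kenneth lands on, but one way to probe a linked server from T-SQL is sys.sp_testlinkedserver, wrapped in dynamic SQL inside a TRY/CATCH. A minimal sketch, with a made-up linked server name:

DECLARE @srv sysname = N'NotARealServer';

BEGIN TRY
    -- Run the test in a child batch via sp_executesql so the failure
    -- surfaces as a catchable error in the outer TRY/CATCH
    EXEC sp_executesql N'EXEC sys.sp_testlinkedserver @name;',
                       N'@name sysname',
                       @name = @srv;
    PRINT 'Linked server responded.';
END TRY
BEGIN CATCH
    PRINT 'Linked server test failed: ' + ERROR_MESSAGE();
END CATCH;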

Comments closed

Updating a Power BI Refresh Schedule

Martin Schoombee continues a series on automating Power BI deployments:

There are a few things you need to pay close attention to when setting the refresh schedule via the API:

– Unless you’re setting the refresh schedule for a Premium workspace, you can only refresh a dataset up to 8 times a day. We’re only going to set it to update once a day here, but keep this in mind if you’re planning to adjust the API call to refresh multiple times a day.

– The name of the time zone you provide has to match exactly with the names (middle column) in this reference: Microsoft Time Zone Index

– The refresh time has to be in the format hh:mm, and similar to the options in the Power BI portal you can only refresh on the hour or half-hour.

Read on to see how it works and the API call to make.
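
For reference, and not necessarily how Martin does it in the series, the refresh schedule update is a PATCH against the dataset's refreshSchedule endpoint. A rough sketch using the Power BI Management module, with placeholder workspace and dataset IDs:

# Sign in with the MicrosoftPowerBIMgmt module
Connect-PowerBIServiceAccount

$body = @{
    value = @{
        enabled         = $true
        days            = @("Monday", "Wednesday", "Friday")
        times           = @("06:30")                    # hh:mm, on the hour or half-hour only
        localTimeZoneId = "Eastern Standard Time"       # must match the Microsoft Time Zone Index name
    }
} | ConvertTo-Json -Depth 3

# <workspaceId> and <datasetId> are placeholders
Invoke-PowerBIRestMethod -Method Patch `
    -Url "groups/<workspaceId>/datasets/<datasetId>/refreshSchedule" `
    -Body $body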

Comments closed

Setting the Default Command Timeout with Microsoft.Data.SqlClient

Erik Ejlskov Jensen shows us a way to set a default command timeout in .NET’s Microsoft.Data.SqlClient:

With the latest 2.1.0 preview 2 release of the open source .NET client driver for Microsoft SQL Server and Azure SQL Database, Microsoft.Data.SqlClient, it is now possible to set the default command timeout via the connection string.

Now you can work around timeout issues simply by changing the connection string, something that previously required code changes, and possibly changes to code you did not have the ability to change.

This is pretty nice, as my recollection was that you could set connection timeout via connection string, but not command timeout. And not everything’s going to wrap up nicely within 30 seconds.
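
A minimal sketch of what that looks like (the connection string and stored procedure are made up; Command Timeout is in seconds, and the pre-existing Connect Timeout keyword still controls how long opening the connection may take):

using Microsoft.Data.SqlClient;

// "Command Timeout" sets the default CommandTimeout (in seconds) for commands
// created against this connection; previously only "Connect Timeout" was
// available in the connection string.
var connectionString =
    "Data Source=.;Initial Catalog=Sales;Integrated Security=true;" +
    "Connect Timeout=15;Command Timeout=120";

using var connection = new SqlConnection(connectionString);
connection.Open();

using var command = new SqlCommand("EXEC dbo.SlowReport;", connection);
command.ExecuteNonQuery();   // now allowed 120 seconds instead of the default 30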

1 Comment

Combining CSV Files in Power BI

Stephanie Bruno shows us a method for combining CSV or Excel files:

One of the unsung heroes to me in Power BI desktop (or Power Query in Excel) is how wonderfully simple it is to combine csv or Excel files from a folder. Maybe it’s not totally unsung, but I think it’s an everyday problem that many people have to deal with. It could easily be an entry point for many new users to see just how much easier Power Query can make their lives.

As much as I appreciate how easy the tool allows us to automatically combine files without having to write any code, I never like all those extra queries that get created in my file and so I prefer to simplify it by just writing one magical little line of code. But I always got frustrated because my one line of code left me without proper column headers, so my one line of code turned into a few extra cumbersome steps. In this post, I’ll show you how to slightly modify that one line of code to get the column headers and keep your queries clean and simple.

Click through for a demo.
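
Not Stephanie's exact code, but the general shape of the one-query pattern (with a made-up folder path) looks something like this, with Table.PromoteHeaders handling the column header problem:

let
    Source = Folder.Files("C:\Data\MonthlyFiles"),
    // Parse each csv, promote its first row to headers, then stack them all
    Combined = Table.Combine(
        List.Transform(
            Source[Content],
            each Table.PromoteHeaders(Csv.Document(_))
        )
    )
in
    Combined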

Comments closed

Outlier Identification Using Spark 3.0

Tori Tompkins takes us through principles of anomaly detection in Apache Spark 3.0:

To calculate the Median Absolute Deviation (MAD) you need to calculate the difference between each value and the median. In simpler terms, you will need to calculate the median of the entire dataset, then the difference between each value and this median, then take another median of all the differences.

In Spark you can use the SQL expression ‘percentile()’ to calculate any medians or quartiles in a dataframe. ‘percentile()’ expects a column and an array of percentiles to calculate (for the median we can provide ‘array(0.5)’ because we want the 50% value, i.e. the median) and will return an array of results.

As with standard deviation, to use MAD to identify outliers a value needs to be a certain number of MADs away from the median. This number is also referred to as the threshold and defaults to 3.

Read on for three measures and their implementations in PySpark.
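
As a rough sketch of the MAD idea in PySpark (toy data and a hard-coded threshold of 3; not Tori's exact code):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0,), (2.0,), (2.5,), (3.0,), (100.0,)], ["value"])

# Median of the column via the percentile() SQL expression (array of one percentile)
median = df.selectExpr("percentile(value, array(0.5))[0] AS median").first()["median"]

# MAD: the median of each value's absolute difference from that median
mad = (df.withColumn("abs_diff", F.abs(F.col("value") - F.lit(median)))
         .selectExpr("percentile(abs_diff, array(0.5))[0] AS mad")
         .first()["mad"])

# Flag anything more than 3 MADs from the median (3 being the usual default threshold)
threshold = 3
outliers = df.filter(F.abs(F.col("value") - F.lit(median)) > threshold * mad)
outliers.show()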

Comments closed

Features and Improvements in Spark 3.0

Manoj Pandey summarizes some of the improvements in Apache Spark 3.0:

With the Spark 3.0 release (in June 2020) there are some major improvements over previous releases. Some of the main and exciting features for Spark SQL & Scala developers are AQE (Adaptive Query Execution), Dynamic Partition Pruning, and other performance optimizations and enhancements.

Below I’ve listed out these new features and enhancements all together in one page for better understanding and future reference.

Click through for the summary.

Comments closed