Press "Enter" to skip to content

Author: Kevin Feasel

Cancelling Power Query Refreshes

Imke Feldmann shares how you can stop a data refresh in Power Query without losing the work you’ve done in the designer:

If you’re working with large data or complex queries that take a long time refresh, cancelling one of those refreshes can even take longer time, especially, if the query has run for quite some time already.

Luckily, there is an easy trick to cancel refresh without loosing the work you’ve done already

Read on to see how.

Comments closed

Linked Servers and Remote Insertion

Max Vernon recommends a pull rather than push model when you need to insert cross-server:

Linked Servers offer a great way to connect two SQL Servers together, allowing remote querying and DML operations. Frequently, this is used to copy data from production to reporting. However, the temptation is to run the copy operation on the production, or source side. If you do that, even with a single INSERT INTO statement, SQL Server will process each individual row as a discrete INSERT INTO statement via a cursor operation. This makes for very slow inserts across a linked server. Running the operation from the destination server means SQL Server can simply query the remote source table for all the rows, inserting them as a set into the destination table. The difference in speed can be eye-watering.

Click through for a slightly creepy picture and a less creepy example.

Comments closed

Spark UDFs in Scala

Achilleus shows us how to create a user-defined function for Spark in Scala, as well as the performance drawbacks:

It is pretty straight forward and easy to create it in spark. Let’s say we have this customer data from Central Perk. If you look at the country data, it has a lot of discrepancies but we kinda know its the right country, it’s just that the way it is entered is not typical. Let’s say we need to normalize it to the USA that is similar with the help of a known dictionary.

The performance hit is often too much for me to accept, though that could just be that I write bad functions.

Comments closed

ETL with Spark and Hive

Emrah Mete gives us an example of using Apache Spark for ETL into Apache Hive:

Now let’s go to the construction of the sample application. In the example, we will first send the data from our Linux file system to the data storage unit of the Hadoop ecosystem (HDFS) (for example, Extraction). Then we will read the data we have written here with Spark and then we will apply a simple Transformation and write to Hive (Load). Hive is a substructure that allows us to query the data in the hadoop ecosystem, which is stored in this environment. With this infrastructure, we can easily query the data in our big data environment using SQL language.

Most of the things relational database professionals do are pretty much the same things that you do with Spark and Hive. There are differences in implementation and level of programming familiarity, but they’re pretty similar.

Comments closed

Choosing Between Management Studio and Azure Data Studio

Brent Ozar gives us the lay of the tooling land:

Every time there’s a new release of SQL Server or SQL Server Management Studio, you can grab the latest version of SSMS and keep right on keepin’ on. Your job still functions the same way using the same tool, and the tool keeps getting better.

And it’s free. You don’t have to ask the boss for upgrade money. You can just download it, install it, and take advantage of things like the cool new execution plan est-vs-actual numbers (which also cause presenters all over to curse, knowing that they have to redo a bunch of screenshots.)

I spend a lot of time jumping back & forth between SQL Server and Postgres, and lemme just tell you, the tooling options on the other side of the fence are a hot mess.

Yeah, Management Studio is the best of the bunch. I’m using Azure Data Studio more at home but still need a couple of plugins to use it often at work. And those two beat pretty much every other tool I’ve ever worked with.

Comments closed

Scripting with Variables in Bash

Kellyn Pot’vin-Gorman shows how easy it can be to write Bash scripts with variables:

Let’s start with a use case of deploying a Azure database. When a customer is making the decision to build it out, there are specific information needed to deploy and this will continue to change as the Azure catalog is updated with new offerings. For our example, we’ll stick to a very small snippet of code, as the values we dynamically create will be reused throughout the script. This example will skip past the actual server creation, etc. and just focus on the user database creation. The Server, zone and subscription are all set in the default steps earlier on so as not to have to repeat it throughout each resource deployment step.

There’s a lot to Bash and its programming guide is a lot of sheets of paper (ask me how I know), but this is one of those places where you can get a nice benefit easily.

Comments closed

Using dbatools for Inventory Analysis

Andreas Schubert gives us a way to learn more about our SQL Server inventories with dbatools:

With the multitude of environments that I am operating, it’s impossible to remember every server, every database or the multiple different ways they are interacting with each other. Therefore, one of the first things I do when taking over a consulting engagement is mapping out all those different bits of information.

Since the environments usually change pretty fast, my goal is to automate this process as much as possible.

In this series of posts, I will try to show you how I am implementing this. Of course, your requirements or implementations may differ, but hopefully this blog post can give you some ideas about your tasks too.

Click through for a script. There are also some good comments.

Comments closed

READPAST In Action

Erik Darling shows how READPAST is no panacea:

Locking hints can be really handy in these situations, especially the READPAST hint. The documentation for it says that it allows you to skip over row level locks (that means you can’t skip over page or object level locks).

What it leaves out is that your READPAST query may also need to try to take row level shared locks.

Read on for an example as well as an alternative which ends up being better in this case.

Comments closed