Automating Data Importation With dbatools

Bob Pusateri shows how to keep a dataset up to date using Powershell and dbatools:

Now for the magic – let’s load some data! The dbatools command that does all the heavy lifting here is called Import-DbaCsvToSql. It loads CSV files into a SQL Server table quickly and easily. As an added bonus, the entire import is within a transaction, so if an error occurs everything gets rolled back. I like to specify my tables and datatypes ahead of time, but if you want to load into a table that doesn’t exist yet, this script will create a table and do its best to guess the appropriate datatype. To use, simply point it at a CSV file and a SQL Server instance, database, and (optionally) a table. It will take care of the rest.

It’s a nice post covering one more method of automating data loads without a major amount of orchestration.

Command Shell In SQL Agent Only Runs One Command

Kenneth Fisher makes an important discovery about SQL Agent jobs running in command shell mode:

Command shell steps in SQL Agent jobs only process the first command. (Yea, I know it’s the title of the post so this wasn’t much of a surprise. You’ll live.)

In order to prove this, I created a job with a series of steps, each one a different test. I’m using two very simple commands. whoami that tells you who is running the shell (very useful if you aren’t certain) and dir which I will assume everyone knows.

Kenneth ran through several tests to see what would trigger the “first command” rule and it turns out that pretty much everything does.

Row-By-Row Is Slow-By-Slow

Lukas Eder points out that row-by-row updates are a great way of slowing down your system:

The best way to find out is to benchmark. I’m doing two benchmarks for this:

  1. One that is run in PL/SQL, showing the performance difference between different approaches that are available to PL/SQL (namely looping, the FORALL syntax, and a single bulk UPDATE)

  2. One that is run in Java, doing JDBC calls, showing the performance difference between different approaches available to Java (namely looping, caching PreparedStatement but still looping, batching, and a single bulk UPDATE)

The results tend to be even more dramatic on SQL Server, where the row-by-row overhead is even greater.

Async Processing With Azure Analysis Services

Teo Lachev notes that you can process Azure Analysis Services cubes without maintaining an HTTP connection:

AAS supports processing tasks asynchronously with REST APIs. The difference is that the service component (REST API) maintains the connectivity to the server – thus reducing the chances of HTTP disconnections from the external application. Microsoft has provided a RestAPISample console app to help you get started. As with any REST API invocation, you’d need to register the app in the Azure Portal so that you can authenticate successfully. Other than that, it’s simple to invoke the REST API and Microsoft has provided step-by-step instructions.

Another, although synchronous, option is to run a PowerShell script in the Azure Cloud Shell environment. You can upload the script as a file. The script can ask you to provide credentials interactively (Get-Credentials method) or you can hardcode the credentials. Here is an example of a PowerShell script that processes a specific table.

Click through to check out how to do this.

Azure Data Factory v2 And Decompression

Ben Jarvis notes a file naming bug with Azure Data Factory v2 when decompressing files:

ADF V2 natively supports decompression of files as documented at With this functionality ADF should change the extension of the file when it is decompressed so 1234_567.csv.gz would become 1234_567.csv however, I’ve noticed that this doesn’t happen in all cases.

In our particular case the file names and extensions of the source files are all uppercase and when ADF uploads them it doesn’t alter the file extension e.g. if I upload 1234_567.CSV.GZ I get 1234_567.CSV.GZ in blob storage rather than 1234_567.CSV.

Click through for more details and be sure to vote on his Azure Feedback bug if this affects you.

Five Books For Learning Kafka

Data Flair has a guide to five books to help you learn Apache Kafka:

The book “Kafka: The Definitive Guide” is written by engineers from Confluent andLinkedIn who are responsible for developing Kafka.

They explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. It contains detailed examples as well. Through this including the replication protocol, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, the controller, and the storage layer. However, even if you are new to Apache Kafka as the application architect, developer, or production engineer, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.

I haven’t read any of them yet, but a couple look interesting enough to add to my to-read list.

Query Store Cleanup Can Be Blocked

Kendra Little shows that you can block Query Store cleanup:

This is an isolated test system, so I went to clean out Query Store as a reset. I didn’t need any of the old information in there, so I ran:

  • GO

I was surprised when I didn’t see this complete very quickly, as it normally does.

Click through to see how Kendra diagnoses the issue.

The Non-Blocking Segment Operator

Hugo Kornelius notes a documentation bug with the Segment operator:

The Segment operator, like all operators, is described at the Books Online page mentioned above. Here is the description, quoted verbatim:

Segment is a physical and a logical operator. It divides the input set into segments based on the value of one or more columns. These columns are shown as arguments in the Segment operator. The operator then outputs one segment at a time.

Looking at the properties of the Segment operator, we do indeed see the argument mentioned in this description, in the Group By property (highlighted in the screenshot). So this operator reads the data returned by the Index Scan (sorted by TerritoryID, which is required for it to work; this is why the Index Scan operator is ordered to perform an ordered scan), and divides it into segments based on this column. In other words, this operator is a direct implementation of the PARTITION BY spefication. Every segment returned by the operator is what we would call a partition for the ROW_NUMBER() function in the T-SQL query. And this enables the Sequence Project operator to reset its counters and start at 1 for every new segment / partition.

Read on to understand the issue and see Hugo’s proof.

Type Information Change In Export-CSV Cmdlet

Max Trinidad notes that a default parameter in the Export-Csv cmdlet has flipped between Powershell on Windows and Powershell Core 6:

For a long time, in Windows PowerShell, we had to add the parameter “-NoTypeInformation“, so the “#TYPE …” line on the first row of the *CSV would not be included.

So, in Windows PowerShell executing the command without the “-NoTypeInformation” parameter, will produce the following result:

Now, using the same command in PowerShell Core without the “-NoTypeInformation” parameter, will produce a different result:

This is a better default, but I think it’s going to burn some people who have scripts pre-built expecting to clear out that first line.

Reading SQL Server Error Logs

Jana Sattainathan has a procedure which makes it easier to read the error logs in SQL Server:

xp_ReadErrorLog has some limitations

  • Reads only the specified error log whose ArchiveNumber is specified
  • Shows only the rows with matching string (not adjacent context info rows)

The first bullet is obvious in that we cannot read ALL the logs to look for something more holistic and meaningful with all the information we have. This MSSQLTips post does describe a method to loop through.

Click through for Jana’s stored procedure code.


April 2018
« Mar