Press "Enter" to skip to content

Category: Data Loading

Flat File Importation via Azure Data Studio

Josephine Bush needs to import a file:

Initially, I thought I would have to use sqlcmd because I’m on a Mac and don’t have SSMS. It turns out Azure Data Studio has a nifty way to import data from flat files – yay!

I’ve used this extension a few times in the past on Linux and Windows and it’s pretty good, especially if you have a fairly straightforward flat file. If it’s a messy file, you’ll still get inscrutable errors. And, as far as data sources go, GIGO.

Leave a Comment

Route Planning in Postgres

Mark Litwintschik plans a journey:

I recently came across a transit route feed aggregator called Transitland. They list feeds from 2,500 operators in 55+ countries around the world. Among these feeds is one for FlixBus, a 12-year-old coach service provider. Below is a route map of their European destinations.

In this post, I’ll import their feed into PostgreSQL, build visualisations of their routes and plan a bus trip from Vienna to Oslo.

Read on for the process.

Comments closed

Row-Level Security and Data Migration

Forrest McDaniel shares an interesting case of using row-level security:

This was the situation I found myself in earlier this year – our company had absorbed another, and it was time to slurp up their tables. There were a lot of decisions to make and tradeoffs to weigh, and we ended up choosing to trickle-insert their data, but make it invisible to normal use until the moment of cutover.

The way we implemented this was with Row Level Security. Using an appropriate predicate, we could make sure ETL processes only saw migrated data, apps saw unmigrated data, and admins saw everything. To give a spoiler: it worked, but there were issues.

I would not have thought of this scenario. And given the difficulties Forrest & crew ran into, it might be for the best…

Comments closed


Chad Callihan embraces the power of AND:

Data can be inserted into one temp table from another a couple of ways. There is the INSERT INTO option and the SELECT INTO option.

Are you devoted to one option over the other? Maybe you’re used to one and never experimented with the other. Let’s test each and compare performance to find out which is more efficient.

Both of these are useful, though Chad does mention a performance improvement with SELECT INTO. I tend to prefer INSERT INTO for “structured” scenarios because it lets me define the shape of the output table. When I don’t care what the shape is—for example, when I just need some data one time to perform an analysis—then I prefer SELECT INTO for its simplicity.

Comments closed

Quick Insertion into SQL Server from a Spreadsheet

Kevin Wilkie gives a quick way to load data from Excel (or any other spreadsheet):

One of the items I do before creating the table in the database is to review all of the data that is in the spreadsheet to make sure that:

1. I understand the data that is going into the database table.
2. Nothing that is just obviously wrong is trying to be pushed into the database. For example, the data I was talking about earlier that was one column over from what it should have been. If you see data that is all 0’s and 1’s up until a certain row, then you have descriptions or names – you probably have some bad data.

The other important part of pushing the data into the database from a spreadsheet is working with the CONCATENATE function of Excel. Let’s go into that now.

Click through for the process, as well as additional explanation.

Comments closed

Incremental Refresh of Any Power BI Data Source

Gilbert Quevauvilliers wants incrementally to refresh all the sources:

The pattern that I am talking about is the following below, which will be used as my example below.

– Connect to a data source which can query fold.
> In my example I have installed and configured an Azure SQL Serverless DB
> In this database I have a date table.
– Configure date table to use Incremental refreshing as per the blog post
> Incremental refresh in Power BI
– Create a function which will then use the Date value as part of the parameter
> In my example I have got CSV files which have Exchange rate information from Azure Blob Storage.
> The file names of the CSV files is the date.
– Invoke the Function within the date table to extract the required information.

I know what you might be thinking, that as soon as I add in the column with the function it breaks the Query Folding. That is what I thought too.

The great news is that Incremental refreshing DOES STILL WORK!

Read on for the demonstration.

Comments closed

Bulk Insertion With An Identity Column

Kenneth Fisher gives us a quick post on bulk insertion against tables with identity columns:

TL;DR; BULK INSERT doesn’t have an easy way to specify a column list so the columns in the insert file must match the columns in the table unless you use a format file or a staging table.
As simple as they appear on the surface identity columns are fairly complicated things. BULK INSERT, on the other hand, is pretty complicated even on the surface (at least as far as I’m concerned). Because of this, the combination can get even worse. When you do an insert into a table that has an identity column you usually just ignore that column and it’s automatically filled in. However, that’s not true with BULK INSERT. Unless you use a format file, the number of columns (and their order) in the file has to match the table.

Read the whole thing.

Comments closed

Using The Spark Connector To Speed Up Data Loads

Denzil Riberio explains how you can use the Spark connector for Azure SQL DB and SQL Server to speed up inserting data from Spark into SQL Server 15x over the native JDBC client:

Since the load was taking longer than expected, we examined the sys.dm_exec_requests DMV while load was running, and saw that there was a fair amount of latch contention on various pages, which wouldn’t not be expected if data was being loaded via a bulk API.

Examining the statements being executed, we saw that the JDBC driver uses sp_prepare followed by sp_execute for each inserted row; therefore, the operation is not a bulk insert. One can further example the Spark JDBC connector source code, it builds a batch consisting of singleton insert statements, and then executes the batch via the prep/exec model.

It’s the power of bulk insertion.

Comments closed

Automatic Retry With Optimistic Concurrency

Vladimir Khorikov explains an anti-pattern when dealing with a model using optimistic concurrency (for example, memory-optimized tables):

Alright, back to the original question. So, how to combine optimistic locking and automatic retry? In other words, when the application gets an error from the database saying that the versions of a Product don’t match, how to retry the same operation again?

The short answer is: nohow. You don’t want to do that because it defeats the very purpose of having an optimistic lock in the first place.

Remember that the locking mechanism is a way to ensure that all changes are taken into consideration when changing a record in the database. In other words, someone should review the new version of the record and make an informed decision as to whether they still want to submit the update. And that should be the same client who originated the initial request, you can’t make that decision for them.

Plenty of systems do this sort of data merging automatically, but I get Vladimir’s point:  if someone else pulled the rug out from under you, it might change your decision on what that data should look like.

Comments closed