Press "Enter" to skip to content

Category: Data Loading

Lakehouse Table Partitioning in Microsoft Fabric

Gilbert Quevauvilliers performs a split:

When loading data, it is always important to do so with performance and scalability in mind.

For lakehouse tables to return query results quickly and to scale, it is essential to load them with partitions.

What I am going to show you in my blog post today is how to load data into a Lakehouse table where the table will be automatically partitioned by Year/Month/Day.

Click through for the example.
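For a sense of the end result, here is a minimal sketch in Spark SQL of a Lakehouse Delta table partitioned by Year/Month/Day; the table and column names are illustrative, not Gilbert's actual code, and his post shows the real approach:

```sql
-- Illustrative sketch only: a Lakehouse Delta table partitioned by Year/Month/Day,
-- loaded from a hypothetical staging table.
CREATE TABLE sales_partitioned
(
    SaleId   BIGINT,
    Amount   DECIMAL(18, 2),
    SaleDate DATE,
    Year     INT,
    Month    INT,
    Day      INT
)
USING DELTA
PARTITIONED BY (Year, Month, Day);

INSERT INTO sales_partitioned
SELECT
    SaleId,
    Amount,
    SaleDate,
    YEAR(SaleDate)  AS Year,
    MONTH(SaleDate) AS Month,
    DAY(SaleDate)   AS Day
FROM staging_sales;  -- hypothetical staging table
```

Queries that filter on Year, Month, or Day can then prune partitions instead of scanning the whole table, which is where the speed and scalability come from.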

Microsoft Fabric Lakehouse Ingesting CSV vs SQL

Reitse Eskens performs a comparison:

This blog will be quite a short one compared to the others, as it's more of an overview showing the capacity Fabric uses when ingesting CSV files in their native format into a Lakehouse and when ingesting SQL data into a table structure inside the Lakehouse. Simple, straightforward stuff without any form of modification. You could call it bronze, raw, ingestion, temp, or whatever your preferred naming convention is.

Why is this important? Well, we still have source systems that can only output to files. Just as we still have customers running on SQL Server 2000, legacy or even antique systems are still running. And it’s important to know how much capacity you use when just ingesting data without any modification.

Read on for the two scenarios, which give you an idea of which one is faster. I'd be interested in a third option: reading from Parquet files. My initial expectation is that it would be even faster and more efficient, depending on the structure of the data.

Data Loading with BCP

Peter Schott describes a recent bit of messiness:

However, at the time this popped up, my most recent “ticket” was a separate request. I’d been chatting with a client who had mentioned that they were closing an account for one of the SaaS apps they use. The vendor would provide DDL and extract files for import into their own system, but only after the account was closed. We chatted back and forth about some ideas for them to load the data into their own Azure SQL DB instance. At one point, he asked if I’d want to just do it for a small consulting fee. We chatted a bit more and he realized that he really didn’t want to do it.

Read on for the rest of the story. BCP is powerful but always felt finicky to me. Either that or I wasn’t very good at using it. Either could be the case.
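Peter's actual commands are in the post, but for orientation, a character-mode bcp load into Azure SQL DB typically looks something like the following; the server, database, table, and file names here are placeholders:

```
# Illustrative only: load a pipe-delimited extract, skipping a header row,
# and capture rejected rows in an error file.
bcp dbo.Customers in ./customers.dat -S yourserver.database.windows.net -d TargetDb -U load_user -P '<password>' -c -t '|' -F 2 -e bcp_errors.log
```

The -c, -t, and -F switches (character mode, field terminator, first data row) are usually where the finickiness shows up when the extract format doesn't quite match expectations.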

Flat File Importation via Azure Data Studio

Josephine Bush needs to import a file:

Initially, I thought I would have to use sqlcmd because I’m on a Mac and don’t have SSMS. It turns out Azure Data Studio has a nifty way to import data from flat files – yay!

I’ve used this extension a few times in the past on Linux and Windows and it’s pretty good, especially if you have a fairly straightforward flat file. If it’s a messy file, you’ll still get inscrutable errors. And, as far as data sources go, GIGO.

Route Planning in Postgres

Mark Litwintschik plans a journey:

I recently came across a transit route feed aggregator called Transitland. They list feeds from 2,500 operators in 55+ countries around the world. Among these feeds is one for FlixBus, a 12-year-old coach service provider. Below is a route map of their European destinations.

In this post, I’ll import their feed into PostgreSQL, build visualisations of their routes and plan a bus trip from Vienna to Oslo.

Read on for the process.
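For a flavour of the import step: a GTFS feed is a set of CSV-like text files, so loading one of them boils down to a table definition plus COPY. The sketch below is heavily simplified; the real stops.txt carries more columns than this, and Mark's schema will differ.

```sql
-- Simplified sketch: load the stops file from a GTFS feed into Postgres.
-- The column list is illustrative and must match the actual file's header.
CREATE TABLE stops
(
    stop_id   TEXT PRIMARY KEY,
    stop_name TEXT,
    stop_lat  DOUBLE PRECISION,
    stop_lon  DOUBLE PRECISION
);

-- Run from psql; \copy reads the file client-side.
\copy stops (stop_id, stop_name, stop_lat, stop_lon) FROM 'stops.txt' WITH (FORMAT csv, HEADER true)
```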

Row-Level Security and Data Migration

Forrest McDaniel shares an interesting case of using row-level security:

This was the situation I found myself in earlier this year – our company had absorbed another, and it was time to slurp up their tables. There were a lot of decisions to make and tradeoffs to weigh, and we ended up choosing to trickle-insert their data, but make it invisible to normal use until the moment of cutover.

The way we implemented this was with Row Level Security. Using an appropriate predicate, we could make sure ETL processes only saw migrated data, apps saw unmigrated data, and admins saw everything. To give a spoiler: it worked, but there were issues.

I would not have thought of this scenario. And given the difficulties Forrest & crew ran into, it might be for the best…
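The general shape of that kind of predicate, sketched in T-SQL (the role, table, and column names here are invented; Forrest's actual implementation is in the post):

```sql
-- Illustrative sketch only: filter rows on a migration flag so the ETL role sees
-- migrated rows, the app role sees unmigrated rows, and admins see everything.
CREATE FUNCTION dbo.fn_MigrationPredicate (@IsMigrated BIT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    WHERE (IS_MEMBER('EtlRole') = 1 AND @IsMigrated = 1)   -- ETL: migrated data only
       OR (IS_MEMBER('AppRole') = 1 AND @IsMigrated = 0)   -- apps: unmigrated data only
       OR  IS_MEMBER('db_owner') = 1;                      -- admins: everything
GO

CREATE SECURITY POLICY dbo.MigrationFilter
    ADD FILTER PREDICATE dbo.fn_MigrationPredicate(IsMigrated) ON dbo.Customers
    WITH (STATE = ON);
```

Note that db_owner members are not automatically exempt from row-level security, which is why the predicate has to let them through explicitly.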

Contrasting INSERT INTO and SELECT INTO

Chad Callihan embraces the power of AND:

Data can be inserted into one temp table from another in a couple of ways. There is the INSERT INTO option and the SELECT INTO option.

Are you devoted to one option over the other? Maybe you’re used to one and never experimented with the other. Let’s test each and compare performance to find out which is more efficient.

Both of these are useful, though Chad does mention a performance improvement with SELECT INTO. I tend to prefer INSERT INTO for “structured” scenarios because it lets me define the shape of the output table. When I don’t care what the shape is—for example, when I just need some data one time to perform an analysis—then I prefer SELECT INTO for its simplicity.
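For anyone who hasn't tried both, the two patterns look like this (table and column names are purely illustrative, assuming an existing #SourceOrders temp table):

```sql
-- INSERT INTO: the target's shape is defined up front, then rows are added to it.
CREATE TABLE #TargetA
(
    OrderId   INT,
    OrderDate DATE,
    Total     DECIMAL(18, 2)
);

INSERT INTO #TargetA (OrderId, OrderDate, Total)
SELECT OrderId, OrderDate, Total
FROM #SourceOrders;

-- SELECT INTO: the target table is created on the fly from the SELECT's output,
-- which is simpler and can qualify for minimal logging.
SELECT OrderId, OrderDate, Total
INTO #TargetB
FROM #SourceOrders;
```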

Quick Insertion into SQL Server from a Spreadsheet

Kevin Wilkie gives a quick way to load data from Excel (or any other spreadsheet):

One of the things I do before creating the table in the database is to review all of the data in the spreadsheet to make sure that:

1. I understand the data that is going into the database table.
2. Nothing that is obviously wrong is being pushed into the database. For example, the data I mentioned earlier that was one column over from where it should have been. If you see data that is all 0's and 1's up until a certain row and then switches to descriptions or names, you probably have some bad data.

The other important part of pushing the data into the database from a spreadsheet is working with the CONCATENATE function of Excel. Let’s go into that now.

Click through for the process, as well as additional explanation.
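The CONCATENATE step is about generating one INSERT statement per spreadsheet row. As a hedged illustration (the table name, columns, and cell references here are invented, and Kevin's formula will differ), a formula like this in a helper column does the job:

```
=CONCATENATE("INSERT INTO dbo.Products (ProductId, ProductName) VALUES (", A2, ", '", SUBSTITUTE(B2, "'", "''"), "');")
```

Fill it down, copy the generated column into a query window, and you have a batch of INSERT statements; the SUBSTITUTE call doubles any single quotes so that text values don't break the statements.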
