Press "Enter" to skip to content

Category: Data Loading

Lakehouse Table Partitioning in Microsoft Fabric

Gilbert Quevauvilliers performs a split:

When loading data, it is always important to do so with performance and scalability in mind.

For lakehouse tables to return query results quickly and to scale, it is essential to load them with partitions.

What I am going to show you in my blog post today is how to load data into a Lakehouse table where the table will be automatically partitioned by Year/Month/Day.

Click through for the example.
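For a sense of the end result, here is a minimal sketch in Spark SQL of a Lakehouse Delta table partitioned by Year/Month/Day; the table and column names are illustrative, not Gilbert's actual code, and his post shows the real approach:

```sql
-- Illustrative sketch only: a Lakehouse Delta table partitioned by Year/Month/Day,
-- loaded from a hypothetical staging table.
CREATE TABLE sales_partitioned
(
    SaleId   BIGINT,
    Amount   DECIMAL(18, 2),
    SaleDate DATE,
    Year     INT,
    Month    INT,
    Day      INT
)
USING DELTA
PARTITIONED BY (Year, Month, Day);

INSERT INTO sales_partitioned
SELECT
    SaleId,
    Amount,
    SaleDate,
    YEAR(SaleDate)  AS Year,
    MONTH(SaleDate) AS Month,
    DAY(SaleDate)   AS Day
FROM staging_sales;  -- hypothetical staging table
```

Queries that filter on Year, Month, or Day can then prune partitions instead of scanning the whole table, which is where the speed and scalability come from.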

Microsoft Fabric Lakehouse Ingesting CSV vs SQL

Reitse Eskens performs a comparison:

This blog will be quite a short one compared to the others, as it's more of an overview showing the capacity Fabric uses when ingesting CSV files in their native format into a Lakehouse and when ingesting SQL data into a table structure inside the Lakehouse. Simple, straightforward stuff without any form of modification. You could call it bronze, raw, ingestion, temp, or whatever your preferred naming convention is.

Why is this important? Well, we still have source systems that can only output to files. Just as we still have customers running on SQL Server 2000, legacy or even antique systems are still running. And it’s important to know how much capacity you use when just ingesting data without any modification.

Read on for the two scenarios, which give you an idea of which one is faster. I'd be interested in a third option: reading from Parquet files. My initial expectation is that it would be even faster and more efficient, depending on the structure of the data.

Data Loading with BCP

Peter Schott describes a recent bit of messiness:

However, at the time this popped up, my most recent “ticket” was a separate request. I’d been chatting with a client who had mentioned that they were closing an account for one of the SaaS apps they use. The vendor would provide DDL and extract files for import into their own system, but only after the account was closed. We chatted back and forth about some ideas for them to load the data into their own Azure SQL DB instance. At one point, he asked if I’d want to just do it for a small consulting fee. We chatted a bit more and he realized that he really didn’t want to do it.

Read on for the rest of the story. BCP is powerful but always felt finicky to me. Either that or I wasn’t very good at using it. Either could be the case.
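Peter's actual commands are in the post, but for orientation, a character-mode bcp load into Azure SQL DB typically looks something like the following; the server, database, table, and file names here are placeholders:

```
# Illustrative only: load a pipe-delimited extract, skipping a header row,
# and capture rejected rows in an error file.
bcp dbo.Customers in ./customers.dat -S yourserver.database.windows.net -d TargetDb -U load_user -P '<password>' -c -t '|' -F 2 -e bcp_errors.log
```

The -c, -t, and -F switches (character mode, field terminator, first data row) are usually where the finickiness shows up when the extract format doesn't quite match expectations.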

Flat File Importation via Azure Data Studio

Josephine Bush needs to import a file:

Initially, I thought I would have to use sqlcmd because I’m on a Mac and don’t have SSMS. It turns out Azure Data Studio has a nifty way to import data from flat files – yay!

I’ve used this extension a few times in the past on Linux and Windows and it’s pretty good, especially if you have a fairly straightforward flat file. If it’s a messy file, you’ll still get inscrutable errors. And, as far as data sources go, GIGO.

Route Planning in Postgres

Mark Litwintschik plans a journey:

I recently came across a transit route feed aggregator called Transitland. They list feeds from 2,500 operators in 55+ countries around the world. Among these feeds is one for FlixBus, a 12-year-old coach service provider. Below is a route map of their European destinations.

In this post, I’ll import their feed into PostgreSQL, build visualisations of their routes and plan a bus trip from Vienna to Oslo.

Read on for the process.
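For a flavour of the import step: a GTFS feed is a set of CSV-like text files, so loading one of them boils down to a table definition plus COPY. The sketch below is heavily simplified; the real stops.txt carries more columns than this, and Mark's schema will differ.

```sql
-- Simplified sketch: load the stops file from a GTFS feed into Postgres.
-- The column list is illustrative and must match the actual file's header.
CREATE TABLE stops
(
    stop_id   TEXT PRIMARY KEY,
    stop_name TEXT,
    stop_lat  DOUBLE PRECISION,
    stop_lon  DOUBLE PRECISION
);

-- Run from psql; \copy reads the file client-side.
\copy stops (stop_id, stop_name, stop_lat, stop_lon) FROM 'stops.txt' WITH (FORMAT csv, HEADER true)
```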

Row-Level Security and Data Migration

Forrest McDaniel shares an interesting case of using row-level security:

This was the situation I found myself in earlier this year – our company had absorbed another, and it was time to slurp up their tables. There were a lot of decisions to make and tradeoffs to weigh, and we ended up choosing to trickle-insert their data, but make it invisible to normal use until the moment of cutover.

The way we implemented this was with Row Level Security. Using an appropriate predicate, we could make sure ETL processes only saw migrated data, apps saw unmigrated data, and admins saw everything. To give a spoiler: it worked, but there were issues.

I would not have thought of this scenario. And given the difficulties Forrest & crew ran into, it might be for the best…
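The general shape of that kind of predicate, sketched in T-SQL (the role, table, and column names here are invented; Forrest's actual implementation is in the post):

```sql
-- Illustrative sketch only: filter rows on a migration flag so the ETL role sees
-- migrated rows, the app role sees unmigrated rows, and admins see everything.
CREATE FUNCTION dbo.fn_MigrationPredicate (@IsMigrated BIT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    WHERE (IS_MEMBER('EtlRole') = 1 AND @IsMigrated = 1)   -- ETL: migrated data only
       OR (IS_MEMBER('AppRole') = 1 AND @IsMigrated = 0)   -- apps: unmigrated data only
       OR  IS_MEMBER('db_owner') = 1;                      -- admins: everything
GO

CREATE SECURITY POLICY dbo.MigrationFilter
    ADD FILTER PREDICATE dbo.fn_MigrationPredicate(IsMigrated) ON dbo.Customers
    WITH (STATE = ON);
```

Note that db_owner members are not automatically exempt from row-level security, which is why the predicate has to let them through explicitly.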

Contrasting INSERT INTO and SELECT INTO

Chad Callihan embraces the power of AND:

Data can be inserted into one temp table from another in a couple of ways. There is the INSERT INTO option and the SELECT INTO option.

Are you devoted to one option over the other? Maybe you’re used to one and never experimented with the other. Let’s test each and compare performance to find out which is more efficient.

Both of these are useful, though Chad does mention a performance improvement with SELECT INTO. I tend to prefer INSERT INTO for “structured” scenarios because it lets me define the shape of the output table. When I don’t care what the shape is—for example, when I just need some data one time to perform an analysis—then I prefer SELECT INTO for its simplicity.
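For anyone who hasn't tried both, the two patterns look like this (table and column names are purely illustrative, assuming an existing #SourceOrders temp table):

```sql
-- INSERT INTO: the target's shape is defined up front, then rows are added to it.
CREATE TABLE #TargetA
(
    OrderId   INT,
    OrderDate DATE,
    Total     DECIMAL(18, 2)
);

INSERT INTO #TargetA (OrderId, OrderDate, Total)
SELECT OrderId, OrderDate, Total
FROM #SourceOrders;

-- SELECT INTO: the target table is created on the fly from the SELECT's output,
-- which is simpler and can qualify for minimal logging.
SELECT OrderId, OrderDate, Total
INTO #TargetB
FROM #SourceOrders;
```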

Quick Insertion into SQL Server from a Spreadsheet

Kevin Wilkie gives a quick way to load data from Excel (or any other spreadsheet):

One of the things I do before creating the table in the database is to review all of the data in the spreadsheet to make sure that:

1. I understand the data that is going into the database table.
2. Nothing that is obviously wrong is being pushed into the database. For example, the data I mentioned earlier that was one column over from where it should have been. If you see data that is all 0's and 1's up until a certain row and then switches to descriptions or names, you probably have some bad data.

The other important part of pushing the data into the database from a spreadsheet is working with the CONCATENATE function of Excel. Let’s go into that now.

Click through for the process, as well as additional explanation.
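The CONCATENATE step is about generating one INSERT statement per spreadsheet row. As a hedged illustration (the table name, columns, and cell references here are invented, and Kevin's formula will differ), a formula like this in a helper column does the job:

```
=CONCATENATE("INSERT INTO dbo.Products (ProductId, ProductName) VALUES (", A2, ", '", SUBSTITUTE(B2, "'", "''"), "');")
```

Fill it down, copy the generated column into a query window, and you have a batch of INSERT statements; the SUBSTITUTE call doubles any single quotes so that text values don't break the statements.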
