Bulk Load Tools

Kevin Feasel

2016-10-21

ETL

Erland Sommarskog has a brand new essay:

The bulk-load tools have been in the product for a long time and they are showing their age. When they work for you, they are powerful. But you need to understand that these tools are binary to their heart, and they have no built-in rule that says that each line a file is a record – they don’t even think in lines. You also need to understand that there are file formats they are not able to handle.

I have tried to arrange the material in this article so that if you have a simple problem, you only need to read the first two chapters after the introduction. I first introduce you them to their mindset, which is likely to be different from yours. Next I cover the basic options to use for every-day work. If you have a more complex file, you will need to use a format file and the next three chapters are for you. I first describe how format files work as such, and the next two chapters show how to use format files for common cases for import and export respectively. This is followed by a chapter about Unicode files, including files encoded in UTF‑8. Then comes a chapter about “advanced” options, including how to load explicit values into an IDENTITY column. A short chapter covers permissions. The last chapter discusses XML format files, and I am not sorry at all if you give this chapter a blind eye – I find XML format files to be of dubious value.

I haven’t had a chance to read this yet, but because I have never had good luck with bcp and BULK INSERT, it’s on my to-read list.

Related Posts

When Data Factory Flows Don’t

Emma Stewart points out an issue that might vex newcomers to Azure Data Factory: The data within the Data Lake store was organised into a Year and Month hierarchy for the folders, and each days transactions were stored in a file which was named after the day within the relevant month folder. The task then […]

Read More

An Apache Sqoop Tutorial

Kevin Feasel

2017-11-22

ETL, Hadoop

Subham Sinha has an introductory-level tutorial on Apache Sqoop: For Hadoop developer, the actual game starts after the data is being loaded in HDFS. They play around this data in order to gain various insights hidden in the data stored in HDFS. So, for this analysis the data residing in the relational database management systems […]

Read More

Categories

October 2016
MTWTFSS
« Sep Nov »
 12
3456789
10111213141516
17181920212223
24252627282930
31