Day: August 19, 2022

Bulk Insert into Azure SQL DB using Python

Jose Manuel Jurado Diaz shares some customer notes:

Today, I’ve been working on a service request that our customer wants to improve the performance of a bulk insert process. Following, I would like to share my experience working on that.

Our customer mentioned that inserting data (100.000 rows) is taking 14 seconds in a database in Business Critical. I was able to reproduce this time using a single thread using a table with 20 columns.

A lot of this advice also applies to on-premises SQL Server: use bulk inserts and pick good batch sizes. It is similar to the advice we’d follow with SQL Server Integration Services or any other ETL/ELT process, tailored to Python.
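
As a rough illustration of the batch-size idea, here is a minimal sketch, not the approach from the linked post. It assumes a pyodbc-style connection (the fast_executemany cursor attribute is pyodbc-specific), and the bulk_insert and batched helpers, as well as the default batch size of 10,000, are hypothetical placeholders to tune against your own workload.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def batched(rows: Iterable[Sequence], batch_size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists holding at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def bulk_insert(connection, insert_sql: str, rows: Iterable[Sequence],
                batch_size: int = 10_000) -> int:
    """Send rows in batches, committing once per batch; return the row count.

    Assumption: `connection` behaves like a pyodbc connection. The default
    batch size is only a starting point, not a recommendation.
    """
    cursor = connection.cursor()
    # pyodbc-specific switch: send parameter arrays instead of one
    # round trip per row.
    cursor.fast_executemany = True
    total = 0
    for batch in batched(rows, batch_size):
        cursor.executemany(insert_sql, batch)
        connection.commit()
        total += len(batch)
    return total
```

With a real connection, the call would look something like bulk_insert(conn, "INSERT INTO dbo.Target (Id, Name) VALUES (?, ?)", rows), where dbo.Target is a made-up table name. Committing per batch keeps each transaction small; whether that beats one large transaction depends on the workload, so it is worth measuring a few batch sizes.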

Paved a Repo and Put up a Parking Lot

Robert Harris warns against the desire to start all over:

We’re programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand…It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time.

Joel Spolsky in Things You Should Never Do, Part 1

There is a fleeting moment in every software project when it is absolutely perfect. It is the time between clicking “New” and “Save” in your code editor. In that brief interval, limitless potential and beauty. In every moment that follows, compromise and doubt (but working software, too!).

There are a few threads to unravel here.

First, Chesterton’s fence: if you don’t know why a thing is there, you are probably not the right person to decide to remove it. If you understand why the code is there and exactly what it is doing, then you become qualified to decide what, if anything, needs to be changed.

Second, ego: I’m a great developer. The best developer I know. Heck, maybe the best developer in the world. Therefore, if I don’t immediately understand code, it must be because that code is bad. Most of us don’t think explicitly in these terms, but we still end up at the conclusion, “If I don’t immediately understand the code, it is bad.” Or even worse, “If the code does not work exactly the way I would have it work, it is bad.”

Third, unstated/misunderstood business requirements. Code often starts to get nasty because the business requirements changed on the original designers or there was a process of business evolution. If business requirements are still evolving, what makes you think you’re going to write code that won’t be just as ugly? If business requirements are not still evolving and you really understand the code, you have a chance. But that leads me to the next bit.

Fourth, the value of reformation. Refactoring is a common path for code reformation. Having lots of tests increases the safety net we have for reformation, as those tests are likely to catch some of the dumb mistakes we make and hopefully suss out some of the worst things.

Fifth, JavaScript is a hole of pain.

Testing PowerShell Scripts

David Wilson provides an introduction to Pester:

Most of you probably know that I’m a big fan of automated testing and especially testing during the development process. It significantly improves the design of the code by encouraging loose coupling and high cohesion. It also provides great documentation and increases the confidence of anyone who needs to change the code in the future (this includes future you)!

Testing does tend to get the short end of the stick when it comes to development time. Some of that is design problems, like David mentions, but I think a lot of it is the “This is a solved problem” mentality we (and I am definitely part of “we” here) end up in: I proved that the solution works because the code compiled and the two scenarios I tried out worked; therefore, why do I need to “waste” the extra time by writing all of these tests when I can move on to something more interesting?

Views: Indexed or Otherwise

Erik Darling explains an important difference:

When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.

SQL Server doesn’t cache results, it only caches raw data. If you want the results of a view to be saved, you need to index it.

And naturally, those indexed views are different from materialized views in Oracle/PostgreSQL, but that’s a topic for another day.
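
To make the "views are only abstraction" point concrete, here is a small sketch that uses SQLite (through Python's built-in sqlite3 module) as a stand-in. This is an assumption for the sake of a runnable example: SQL Server's optimizer and its indexed views work differently, and the orders table, customer_orders view, and ix_orders_customer index are hypothetical names. The only thing it shows is the general idea that a plain view is expanded into the underlying query, so the indexes on the base table are what matter.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.5), ("alice", 7.25)],
)
# A plain view: just a named query, with no stored results of its own.
conn.execute(
    "CREATE VIEW customer_orders AS SELECT id, customer, amount FROM orders"
)

view_query = "SELECT amount FROM customer_orders WHERE customer = 'alice'"

# Plan for the view query before and after indexing the base table.
plan_without_index = query_plan(conn, view_query)
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer)")
plan_with_index = query_plan(conn, view_query)

print(plan_without_index)
print(plan_with_index)
```

The exact plan text varies by SQLite version, so compare the two printed lines rather than looking for specific wording: the second plan should mention the new index even though the view definition never changed.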