Press "Enter" to skip to content

Month: August 2022

Paved a Repo and Put up a Parking Lot

Robert Harris warns against the desire of starting it all over:

We’re programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand…It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time.


There is a fleeting moment in every software project when it is absolutely perfect. It is the time between clicking “New” and “Save” in your code editor. In that brief interval, limitless potential and beauty. In every moment that follows, compromise and doubt (but working software, too!).

There are a few threads to unravel here.

First, Chesterton’s fence: if you don’t know why a thing is there, you are probably not the right person to decide to remove it. If you understand why the code is there and exactly what it is doing, then you become qualified to decide what, if anything, needs to be changed.

Second, ego: I’m a great developer. The best developer I know. Heck, maybe the best developer in the world. Therefore, if I don’t immediately understand code, it must be because that code is bad. Most of us don’t think explicitly in these terms but we still end up in the conclusion of, “if I don’t immediately understand the code, it is bad.” Or even worse, “If the code does not work exactly the way I would have it work, it is bad.”

Third, unstated/misunderstood business requirements. Code often starts to get nasty because the business requirements changed on the original designers or there was a process of business evolution. If business requirements are still evolving, what makes you think you’re going to write code that won’t be just as ugly? If business requirements are not still evolving and you really understand the code, you have a chance. But that leads me to the next bit.

Fourth, the value of reformation. Refactoring is a common path for code reformation. Having lots of tests increases the safety net we have for reformation, as those tests are likely to catch some of the dumb mistakes we make and hopefully suss out some of the worst things.

Fifth, Javascript is a hole of pain.

Comments closed

Testing Powershell Scripts

David Wilson provides an introduction to Pester:

Most of you probably know that I’m a big fan of automated testing and especially testing during the development process. It significantly improves the design of the code by encouraging loose coupling and high cohesion. It also provides great documentation and increases the confidence of anyone who needs to change the code in the future (this includes future you)!

Testing does tend to get the short end of the stick when it comes to development time. Some of that is design problems, like David mentions, but I think a lot of it is the “This is a solved problem” mentality we (and I am definitely part of “we” here) end up in: I proved that the solution work because the code compiled and the two scenarios I tried out worked; therefore, why do I need to “waste” the extra time by writing all of these tests when I can move on to something more interesting?

Comments closed

Views: Indexed or Otherwise

Erik Darling explains an important difference:

When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.

SQL Server doesn’t cache results, it only caches raw data. If you want the results of a view to be saved, you need to index it.

And naturally, those indexed views are different from materialized views in Oracle/PostgreSQL but that’s a topic for another day.

Comments closed

Useful Design Patterns for Apache Spark Projects

Alexander Eleseev applies some design patterns:

When I participated in a big data project, I needed to program Spark applications to move and transform data from/to relational and distributed databases, like Apache Hive. I found such applications to have a number of pitfalls, so all “hard to read code,” “method is too large to fit into a single screen,” etc. problems need to be avoided for us to focus on deeper issues. Also, Spark jobs are similar: data is loaded from a single or multiple databases, gets transformed, then saved to a single or multiple databases. So it seems reasonable to try to use GoF patterns to program Spark applications. 

Specifically, this covers Spark code written in Java (or Python). I’d argue that Scala-based code would profit by following a different set of functional patterns rather than Gang of Four object-oriented design patterns.

Comments closed

A Primer on Azure Arc-Enabled Data Services

Warwick Rudd has a four-parter on Azure Arc-Enabled Data Services. Part 1 sets the stage:

Utilising Azure Arc-enabled data services provides you the ability to take advantage of the Azure data services (SQL Server, Azure SQL Managed Instance, PostgreSQL) in a hybrid environment. This offering provides you with reduced administrative efforts in managing and maintaining your data services while giving you the same look and feel as if you were running in the Azure Cloud.

Part 2 looks at the Data Controller:

The Azure Arc Data Controller is a Kubernetes operator that performs all of the orchestration to ensure you achieve your desired state. This is the main component in the Azure Arc infrastructure that links the data services with the Arc-enabled hardware located either in your On-premises, Azure, or any other public cloud data center and your azure subscription.

The Arc data controller allows you to deploy, manage, secure, and monitor your deployed data services estate using Azure Data Studio or the Azure Portal (only for directly connected mode deployments) but giving you the same experience as if you were managing your data services from inside of the Azure Portal.

Part 3 deploys a Data Controller:

As previously mentioned there are 2 types of deployment available for your Arc Data Controller. In this post, we are going to have a look at deploying in the Arc Data Controller using the directly connected mode.

For a directly connected Arc Data Controller, we have direct connectivity to our Azure subscription. With this in mind, there are several options as we previously discussed on how to deploy the data controller. For this post, we are using the portal deployment method.

Finally, Part 4 covers management options:

With ADS open and running you can create connections to Arc Data Controllers the same as you can with Instances of SQL Server. In ADS we have under the connections area a section specific for Arc Data Controllers.

Check out all four posts.

Comments closed

Optimizing Azure Pricing for Storage and VMs

Shane Baldacchino continues a series on cost optimization in the cloud:

Cost. I have been fortunate to work for and help migrate one of Australia’s leading websites ( in to the cloud and have worked for both large public cloud vendors. I have seen the really good, and the not so good when it comes to architecture.

Cloud and cost. It can be quite a polarising topic. Do it right, and you can run super lean, drive down the cost to serve and ride the cloud innovation train. But inversely do it wrong, treat public cloud like a datacentre then your costs could be significantly larger than on-premises.

Click through for some good advice, including an appreciation of spot instances.

Comments closed

Power BI Field Parameters and None Options

Barney Lawrence votes None of the Above:

Field Parameters
 are one of my favourite recent additions to Power BI. The ability to turn a single chart into potentially dozens changes the way we think about putting variations of visuals on the page. It was a real wow moment for a client recently when I showed how field parameters for 5 fields and 5 measures could produce a single report page that replaced 25 of their existing reports.

While they theoretically don’t allow you to do much that you couldn’t previously with a disconnected slicer and a lot of DAX they build it faster and without the need to get heavily in to coding DAX. Anything that lowers the difficulty bar for users trying to make the most out of Power BI is a good thing in my book.

There are a couple of issues Barney has with them as they stand now but there are workarounds.

Comments closed