Press "Enter" to skip to content

Day: August 29, 2019

Checking Spark Config on Windows

Ed Elliott has a Powershell script to tell you if your Spark configuration on Windows is incorrect:

There are some pretty common mistakes people make (myself included!), most common I have seen recently have been having a semi-colon in JAVA_HOME/SPARK_HOME/HADOOP_HOME or having HADOOP_HOME not point to a directory with a bin folder which contains winutils.

To help, I have written a small powershell script that a) validates that the setup is correct and then b) runs one of the spark examples to prove that everything is setup correctly.

Click through for the script.

Comments closed

Eliminating Tail Calls in Python

John Mount shows how you can eliminate tail calls in Python:

I was working through Kyle Miller‘s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques.

The idea is: one may want to eliminate use of the Python language call-stack in the case of a “tail calls” (a function call where the result is not used by the calling function, but instead immediately returned). Tail call elimination can both speed up programs, and cut down on the overhead of maintaining intermediate stack frames and environments that will never be used again.

Click through for John’s riff on the topic.

Comments closed

Local Database Builds with Jenkins

Steve Jones continues a series on continuous integration, containers, and all that is good in life:

The only way to build a database project in SQL Server is with an actual SQL Server. In this case, I don’t have any code that would error on LocalDB, so I’ll just use that. I coudl specify my local SQL Server development database if I had the need.

This is a test build, so I also don’t need any SQL Compare options or other switches.

Getting code into source control and building continuous integration around it has become a lot easier over the past several years. Easy enough that you can work a simple system out in a day or two of experimentation.

Comments closed

Finding Unused Indexes in SQL Server

Monica Rathbun shows us how we can find and remove unused indexes in SQL Server:

Indexes can be incredibly beneficial to your database performance; however, they do come with a cost—indexes both consume storage space and affect insert performance. Therefore, it is important as part of your index maintenance procedures that you periodically check to see if your indexes are being used. Many times, indexes are created in the belief they are needed but in fact they are never used. You can reduce that IO overhead on inserts when you remove unnecessary indexes.

I’ll use the same script. Typically, I won’t drop unless total reads is 0 or at least two or three orders of magnitude smaller than writes. Sometimes you have indexes which don’t get used frequently but support very expensive or time-sensitive reports, and you don’t want those getting caught up in your dragnet.

Comments closed

Why ALTER Implies DROP

Andy Mallon explains why granting ALTER operations to a user means that you’ve granted DROP permissions:

Dropping stuff is destructive. Anyone who’s ever dropped a plate or a car knows that. But dropping a table, procedure, or database makes it go away completely. The only way to un-drop something is to recover it from backup, or to re-deploy from source control (you do have all your code & schema in source control, right?). Getting back to the original question’s premise (it’s OK to change the object but not to completely make it disappear), I ask…does it even matter? What are you preventing? Is dropping an object worse than altering it?

Read on for Andy’s reasoning.

Comments closed

Dropping a Column as a Metadata Operation

Max Vernon takes us through column dropping:

Dropping a column that is not referenced by any other object lets the storage engine simply mark the column definition as no longer present. Deleting the meta-data invalidates the procedure cache. Any query that subsequently references the affected table will result in the plan for that query be recompiled. The recompile operation can only return columns that currently exist in the table. As a result, the storage engine skips the bytes stored in each page for the dropped column, as if the column no longer exists.

This has some nice benefits in practice around minimizing deployment-releated downtime.

Comments closed

Optimizing for Sequential Keys

Dennes Torres gives us a reminder of what the world was like before a new feature in SQL Server 2019:

Once upon a time a SQL Server version that hadn’t row locks. The minimal level of lock was page lock, every time you want to lock a record, an entire page was locked.

At that time we were between the devil and the deep sea: if we choose a clustered index with an ascending key we would create what was called a Hot Spot, all the records would be inserted on the same page, creating a bottleneck. On the other hand, if we create a clustered index with a non-ascending key, we would suffer from index fragmentation and page splits, having huge admin trouble to find out the correct fill factor for each index in order to support the period between the re-index job without too many page splits.

Dennes covers the specific case which this feature intends to cover and how we got there.

Comments closed

Filtering Stored Procedure Results

Thomas Rushton shows how to filter any stored procedure’s result set and uses sp_who as an example:

sp_who – useful – up to a point. Particularly when the server is busy, and you’re looking for something specific (eg to see if certain processes are out of a database before running an update)

If the server is busy – don’t you wish there was a way to run something like

sp_who WHERE dbname = 'foo'

Yeah. Unfortunately, it doesn’t work like that.

That doesn’t work, but Thomas shows you what does.

Comments closed