Kevin Feasel – Page 1035

I have the impression that CSelCalcColumnInInterval “fails” if the predicate doesn’t fall within any of the histogram intervals. The estimation logic then chooses to try the CSelCalcAscendingKeyFilter calculator (a reference to the “ascending key problem”) if the predicate is specifically higher than the last histogram interval.

Josh includes a couple of demos as well, so check them out.


Checking Spark Config on Windows

Published 2019-08-29 by Kevin Feasel

Ed Elliott has a PowerShell script to tell you if your Spark configuration on Windows is incorrect:

There are some pretty common mistakes people make (myself included!). The most common I have seen recently are having a semi-colon in JAVA_HOME/SPARK_HOME/HADOOP_HOME, or having HADOOP_HOME not point to a directory with a bin folder which contains winutils.
To help, I have written a small PowerShell script that a) validates that the setup is correct and then b) runs one of the Spark examples to prove that everything is set up correctly.

Click through for the script.
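As a rough illustration of the two mistakes named in the quote, here is a minimal Python sketch. It is not Ed's PowerShell script: the function name `check_spark_env` is made up for this example, and it assumes the winutils binary is named `winutils.exe`. It only validates the environment variables and does not run a Spark example.

```python
import os
from pathlib import Path


def check_spark_env(env):
    """Return a list of problems found in an environment mapping.

    Hypothetical helper for illustration only; it covers the two
    mistakes mentioned above plus unset or non-directory values.
    """
    problems = []
    for name in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif ";" in value:
            problems.append(f"{name} contains a semi-colon: {value!r}")
        elif not Path(value).is_dir():
            problems.append(f"{name} does not point to a directory: {value!r}")

    # HADOOP_HOME must contain bin\winutils.exe (file name is an assumption).
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and ";" not in hadoop_home:
        winutils = Path(hadoop_home) / "bin" / "winutils.exe"
        if not winutils.is_file():
            problems.append(f"winutils.exe not found at {winutils}")
    return problems


if __name__ == "__main__":
    found = check_spark_env(os.environ)
    for problem in found:
        print(problem)
    if not found:
        print("No problems found")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to try against made-up values.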


Eliminating Tail Calls in Python

Published 2019-08-29 by Kevin Feasel

John Mount shows how you can eliminate tail calls in Python:

I was working through Kyle Miller’s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques.
The idea is: one may want to eliminate use of the Python language call-stack in the case of a “tail call” (a function call where the result is not used by the calling function, but instead immediately returned). Tail call elimination can both speed up programs, and cut down on the overhead of maintaining intermediate stack frames and environments that will never be used again.

Click through for John’s riff on the topic.
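For a feel of what eliminating the call stack can look like, here is a minimal trampoline sketch in Python. This is a generic technique and not necessarily the variation John or Kyle Miller uses; the names `TailCall`, `trampoline`, and `sum_to` are invented for this example.

```python
class TailCall:
    """Describes a call to make later instead of making it right now."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(fn, *args):
    """Run fn, then keep making any deferred tail calls in a flat loop."""
    result = fn(*args)
    while isinstance(result, TailCall):
        # Each deferred call runs from this loop, so the Python call
        # stack never grows deeper than one frame of the target function.
        result = result.fn(*result.args)
    return result


def sum_to(n, acc=0):
    """Sum 1..n, returning a TailCall where a recursive call would go."""
    if n == 0:
        return acc
    return TailCall(sum_to, n - 1, acc + n)


# A plain recursive version would exceed Python's default recursion limit here.
print(trampoline(sum_to, 100_000))  # prints 5000050000
```

The trade-off is that every tail call now allocates a small object, so this sketch removes the recursion-depth limit but is not guaranteed to be faster than an ordinary loop.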


Local Database Builds with Jenkins

Published 2019-08-29 by Kevin Feasel

Steve Jones continues a series on continuous integration, containers, and all that is good in life:

The only way to build a database project in SQL Server is with an actual SQL Server. In this case, I don’t have any code that would error on LocalDB, so I’ll just use that. I could specify my local SQL Server development database if I had the need.
This is a test build, so I also don’t need any SQL Compare options or other switches.

Getting code into source control and building continuous integration around it has become a lot easier over the past several years. Easy enough that you can work a simple system out in a day or two of experimentation.


Finding Unused Indexes in SQL Server

Published 2019-08-29 by Kevin Feasel

Monica Rathbun shows us how we can find and remove unused indexes in SQL Server:

Indexes can be incredibly beneficial to your database performance; however, they do come with a cost—indexes both consume storage space and affect insert performance. Therefore, it is important as part of your index maintenance procedures that you periodically check to see if your indexes are being used. Many times, indexes are created in the belief they are needed but in fact they are never used. You can reduce that IO overhead on inserts when you remove unnecessary indexes.

I’ll use the same script. Typically, I won’t drop unless total reads is 0 or at least two or three orders of magnitude smaller than writes. Sometimes you have indexes which don’t get used frequently but support very expensive or time-sensitive reports, and you don’t want those getting caught up in your dragnet.
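The rule of thumb above can be written down as a small Python sketch. The function name, the `protected` parameter, and the default of two orders of magnitude are choices made for this example, and the read and write totals are assumed to come from whatever index usage script you already run.

```python
def is_drop_candidate(total_reads, total_writes, orders_of_magnitude=2, protected=False):
    """Apply the rule of thumb: drop only if reads are zero, or are at least
    `orders_of_magnitude` powers of ten smaller than writes.

    `protected` is a hypothetical flag for indexes that support rare but
    expensive or time-sensitive reports; those are never candidates.
    """
    if protected:
        return False
    if total_reads == 0:
        return True
    return total_writes >= total_reads * 10 ** orders_of_magnitude


print(is_drop_candidate(total_reads=0, total_writes=50))         # True
print(is_drop_candidate(total_reads=12, total_writes=90_000))    # True
print(is_drop_candidate(total_reads=500, total_writes=9_000))    # False
print(is_drop_candidate(total_reads=0, total_writes=50, protected=True))  # False
```

Keep in mind that usage counters only cover the period since they were last reset, so an index used by a quarterly report can look unused for months.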


Why ALTER Implies DROP

Published 2019-08-29 by Kevin Feasel

Andy Mallon explains why granting ALTER operations to a user means that you’ve granted DROP permissions:

Dropping stuff is destructive. Anyone who’s ever dropped a plate or a car knows that. But dropping a table, procedure, or database makes it go away completely. The only way to un-drop something is to recover it from backup, or to re-deploy from source control (you do have all your code & schema in source control, right?). Getting back to the original question’s premise (it’s OK to change the object but not to completely make it disappear), I ask…does it even matter? What are you preventing? Is dropping an object worse than altering it?

Read on for Andy’s reasoning.


Dropping a Column as a Metadata Operation

Published 2019-08-29 by Kevin Feasel

Max Vernon takes us through column dropping:

Dropping a column that is not referenced by any other object lets the storage engine simply mark the column definition as no longer present. Deleting the meta-data invalidates the procedure cache. Any query that subsequently references the affected table will result in the plan for that query being recompiled. The recompile operation can only return columns that currently exist in the table. As a result, the storage engine skips the bytes stored in each page for the dropped column, as if the column no longer exists.

This has some nice benefits in practice around minimizing deployment-related downtime.


Optimizing for Sequential Keys

Published 2019-08-29 by Kevin Feasel

Dennes Torres gives us a reminder of what the world was like before a new feature in SQL Server 2019:

Once upon a time, there was a SQL Server version that didn’t have row locks. The minimal level of lock was the page lock: every time you wanted to lock a record, an entire page was locked.

At that time we were between the devil and the deep sea: if we chose a clustered index with an ascending key, we would create what was called a Hot Spot, where all the records would be inserted on the same page, creating a bottleneck. On the other hand, if we created a clustered index with a non-ascending key, we would suffer from index fragmentation and page splits, and have huge admin trouble finding the correct fill factor for each index in order to get through the period between re-index jobs without too many page splits.

Dennes covers the specific case which this feature intends to cover and how we got there.


Filtering Stored Procedure Results

Published 2019-08-29 by Kevin Feasel

Thomas Rushton shows how to filter any stored procedure’s result set and uses sp_who as an example:

sp_who – useful – up to a point. Particularly when the server is busy, and you’re looking for something specific (e.g., to see if certain processes are out of a database before running an update).
If the server is busy – don’t you wish there was a way to run something like
sp_who WHERE dbname = 'foo'
Yeah. Unfortunately, it doesn’t work like that.

That doesn’t work, but Thomas shows you what does.


Sampling and Estimating Rare Events

Published 2019-08-28 by Kevin Feasel

Yi Liu takes us through a process to estimate rare events:

Naturally, we get an unbiased estimate of the overall prevalence of violation if we sample the videos uniformly from the population and have them reviewed by human raters to estimate the proportion of violating videos. We also get an unbiased estimate of the violation rate in each policy vertical. But given the low probability of violation and wanting to use our rater capacity wisely, this is not an adequate solution — we typically have too few positive labels in uniform samples to achieve an accurate estimate of the prevalence, especially for those sensitive policy verticals. To obtain a relative error of no more than 20%, we need roughly 100 positive labels, and more often than not, we have zero violation videos in the uniform samples for rarer policies.

This is similar in nature to testing for rare diseases, where a random sample of N people in the population is likely to turn up 0 cases of it.
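To put rough numbers on this, here is a small Python sketch. It assumes the "roughly 100 positive labels" figure comes from the usual normal approximation for a rare event, where the relative half-width of a 95% confidence interval is about 1.96 / sqrt(number of positives). That is my reading, not something the quoted text states, and the function names are invented for this example.

```python
import math


def relative_error(positives, z=1.96):
    """Approximate relative half-width of a confidence interval for a rare
    event rate, given the number of positive labels observed.
    Assumes the normal approximation with a small prevalence."""
    return z / math.sqrt(positives)


def positives_needed(target_relative_error, z=1.96):
    """Positive labels needed to reach the target relative error."""
    return math.ceil((z / target_relative_error) ** 2)


def prob_zero_positives(prevalence, sample_size):
    """Chance that a uniform sample contains no positives at all."""
    return (1 - prevalence) ** sample_size


print(round(relative_error(100), 3))  # prints 0.196
print(positives_needed(0.20))         # prints 97

# With a prevalence of 1 in 10,000, a uniform sample of 1,000 items
# usually contains no positives.
print(prob_zero_positives(0.0001, 1_000))
```

At that prevalence, collecting around 100 positives by uniform sampling would take on the order of a million reviewed items, which is the capacity problem the quote describes.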



Author: Kevin Feasel
