Press "Enter" to skip to content

Author: Kevin Feasel

Finding The Closest Numeric Match

Itzik Ben-Gan has a T-SQL puzzle for us:

As you can see, both T1 and T2 have a numeric column (INT type in this example) called val. The challenge is to match to each row from T1 the row from T2 where the absolute difference between T2.val and T1.val is the lowest. In case of ties (multiple matching rows in T2), match the top row based on val ascending, keycol ascending order. That is, the row with the lowest value in the val column, and if you still have ties, the row with the lowest keycol value. The tiebreaker is used to guarantee determinism.

Click through for the details as well as several solutions.

Comments closed

Lessons From Being Self-Employed

Eugene Meidinger shares some hard-learned lessons from being self-employed:

I’m naturally an introvert. If you and I have a conversation, it’s like a little taxi meter starts running. I may deeply, deeply enjoy the conversation and find it incredibly exciting, but it still taxes my energy levels. Small talk even more so. Imagine that every time someone chatted about the weather, you had to pay the same price as a Lyft ride to go 4 blocks. That’s how I feel about small talk.
That being said, we are still social creatures, and even introverts need human interaction. Especially so when you need to think through new situations, new problems. One of the things I realized attending PASS Summit is that I need social interaction to thrive. So now I spend a lot more time on Twitter and am part of a peer group of authors. I work down at the library whenever I have the chance.

When I did the work-from-home full-time thing, I sought out user groups to build up some technical skills and, more importantly, to get out of the house and talk to a group of people a couple times a week.  That paid off really well in the long run.

Speaking of paying off in the long run, check out Eugene’s BI newsletter.

Comments closed

Naming Is Hard

Tim Mitchell calls out some bad naming practices:

Spaces in object names. The bane of my existence. Most relational databases will allow you to use spaces in object names, requiring anyone accessing that object to put brackets around the name. My rule of thumb is that if it’s something that I will interact with programmatically, it doesn’t get a space in the name. Spaces in object names tend to break things, so please stop doing this.

My favorite naming convention oddity is the idea that if something has “Number” in the name, it’s never a number.

Comments closed

Take The Data Professional Salary Survey

Brent Ozar has a new edition of the Data Professional Salary Survey:

It’s time for our annual salary survey to find out what data professionals make. You fill out the data, we open source the whole thing, and you can analyze the data to spot trends and do a better job of negotiating your own salary:

Take the Data Professional Salary Survey now.

The anonymous survey closes Sunday, January 6, 2019. The results will be completely open source, and shared with the community for your analysis.

I like this survey so much that I delivered a talk at PASS Summit making heavy use of it.

Comments closed

Donating To The R Foundation

Mark Niemann-Ross explains how you can donate to the R Foundation:

I benefit from the work of the R Foundation. They oversee the language, but also encourage a healthy ecosystem. CRAN happens because of them. Updates to R happen because of them. useR! happens because of them. Every day, you and I are the recipients of some part of their time.
The least we can do is show them some appreciation. If you point your web browser at https://www.r-project.org/foundation/donations.html you’ll find a convenient (and surprisingly inexpensive) place to express your appreciation. As an individual, you can send these kind folks twenty-five euros to tell them you’re in favor of what they do.

But be sure to read the whole thing, especially if you are an American who wants the donation to be tax-deductible.  I believe that earmarking in this case is adding special instructions on SIAA’s PayPal page.

Comments closed

Timing Means Of Groups With R

John Mount shares some performance measures pitting data.table against various dplyr methods for calculating grouped means:

In this reproduction attempt we see:
– The dplyr time being around 0.05 seconds. This is about 5 times slower than claimed.
– The dplyr sum()/n() time is about 0.2 seconds, about 5 times faster than claimed.
– The data.table time being around 0.004 seconds. This is about three times as fast as the dplyr claims, and over ten times as fast as the actual observed dplyr behavior.

Read the whole thing.  If you want to replicate it yourself, check out the RMarkdown file.

Comments closed

ggmap Tutorial

Laura Ellis has an updated ggmap tutorial:

For those of you who have been following along with issue #51 in the ggmap repo, you’ll notice that there have been a number of changes in the Google Maps Static API service. Unfortunately these have caused some breakage in previous ggmap functionality.
If you used this package prior to July 2018, you may were likely able to do so without signing up for the Google Static Map API service yourself. As indicated on the the ggmap github repo – “Google has recently changed its API requirements, and ggmap users are now required to provide an API key and enable billing.  The billing enablement especially is a bit of a downer, but you can use the free tier without incurring charges. Also, the service being exposed through an easy to use r package that extends ggplot2 is pretty great so I’ll allow it.

This recent API change hurts.  But click through for the tutorial, which doesn’t hurt.

Comments closed

Attaching Databases Via Dockerfile

Andrew Pruski shares a better technique for attaching database files held outside of a Docker container:

Now this works a treat. It waits ten seconds for the SQL instance to come up within the container and then runs the sqlcmd script below.
The problem with this is, it’s a bit of a hack. The HEALTHCHECK command isn’t designed to run once, it’ll carry on running every 10 seconds once the container comes up…not great.
So, what’s the better way of doing it?

Andrew gives us a clear explanation of what’s going on and gives a shout out to Bob Ward’s SQL Server on Linux book.

Comments closed

Disabling SQL Agent Jobs For Maintenance Periods

Jon Shaulis shows us a way to disable SQL Agent jobs with T-SQL:

A user had a unique issue where their system would have dynamically changing job names and schedules, but they need to disable and re-enable them during maintenance. Obviously, this is a huge headache.
I made a recommendation that they should ultimately create a list of currently enabled jobs that had a schedule using a system query.

SELECT *
FROM MSDB.dbo.sysschedules ss
     INNER JOIN msdb.dbo.sysjobschedules jss
ON jss.schedule_id = ss.schedule_id
WHERE ss.enabled = 1;
The code above returns all schedules that are paired to a job that are enabled. The enabled = 1 flag and the inner join to the sysjobschedules table are what dictate those filters.

Read on for more details about what’s going on and some caveats.

Comments closed

Nested Loop Joins

Bert Wagner walks us through nested loop joins:

Nested loops joins work like this: SQL Server takes the first value from our first table (our “outer” table – by default SQL Server decides for us which table of the two this will be), and compares it to every value in our second “inner” table to see if they match. 
Once every inner value has been checked, SQL Server moves to the next value in the outer table and the process repeats until every value from our outer table has been compared to every value in our inner table.

This description is a worst case example of the performance of a nested loop join.

Read the whole thing.  Understanding physical join operators is a key to figuring out if your data retrieval is as fast as it should be. 

Comments closed