Press "Enter" to skip to content

Day: February 19, 2020

Fun with Regressions and the Zero Line

I have a post covering some important things to keep in mind when reviewing a regression:

The Line is NOT the Data

One of the worst things we can do as data analysts is to interpret a regression line as the most important thing on a visual. The important thing here is the per-state set of data points, but our eyes are drawn to the line. The line mentally replaces the data, but in doing so, we lose the noise. And boy, is there a lot of noise.

This was my first point, but I think it’s the most important one to keep in mind: just because we draw a line and there’s a best fit doesn’t mean that fit is actually any good. And if the fit isn’t any good, the line is…optimistic with regard to how informative it is.

Using Sqoop to Import Data into HDFS

Jon Morisi has a primer on Sqoop:

In this article, I’ll walk through using Sqoop to import data to Hadoop (HDFS).

"Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases."

With respect to SQL Server, Sqoop has two good use cases: pulling data from SQL Server into HDFS, and pulling data from HDFS into a staging table in SQL Server.
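To make those two use cases concrete, here is a rough sketch of what each direction looks like at the command line. The host, database, table, and HDFS path names are all placeholders of mine, not values from Jon's article:

    # 1) Pull a SQL Server table into HDFS.
    sqoop import \
      --connect "jdbc:sqlserver://sqlhost:1433;databaseName=SalesDB" \
      --username sqoop_user -P \
      --table Orders \
      --target-dir /data/staging/orders \
      --num-mappers 4

    # 2) Push files from HDFS into a SQL Server staging table.
    sqoop export \
      --connect "jdbc:sqlserver://sqlhost:1433;databaseName=SalesDB" \
      --username sqoop_user -P \
      --table Orders_Staging \
      --export-dir /data/staging/orders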

Fun with Filtering Between Start and End Dates

Brent Ozar shows why the StartDate + EndDate pattern is not great for filtering:

If all you need to do is look up the memberships for a specific UserId, and you know the UserId, then it’s a piece of cake. You put a nonclustered index on UserId, and call it a day.

But what if you frequently need to pull all of the memberships that were active on a specific date? That’s where performance tuning gets hard: when you don’t know the UserId, and even worse, you can’t predict the date/time you’re looking up, or if it’s always Right Now.

This is where I advocate pivoting to a series of event records, so instead of a start date and end date, you have an event type (started, expired, cancelled, etc.) and a date. There are other alternatives as well, but it’s a good thought exercise.
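As a minimal sketch of what that event-record shape might look like (the table and column names are mine, purely illustrative), the "active on a date" question becomes "what was the user's most recent event on or before that date?":

    -- Illustrative event table; names are hypothetical.
    CREATE TABLE dbo.MembershipEvents
    (
        UserId    INT          NOT NULL,
        EventType VARCHAR(20)  NOT NULL,  -- 'Started', 'Expired', 'Cancelled'
        EventDate DATETIME2(0) NOT NULL,
        INDEX IX_MembershipEvents CLUSTERED (EventDate, UserId)
    );

    -- Memberships active as of a date: the latest event on or before
    -- that date for a user must be a 'Started' event.
    DECLARE @AsOf DATETIME2(0) = '2020-02-19';

    SELECT e.UserId
    FROM dbo.MembershipEvents AS e
    WHERE e.EventType = 'Started'
      AND e.EventDate = (SELECT MAX(e2.EventDate)
                         FROM dbo.MembershipEvents AS e2
                         WHERE e2.UserId = e.UserId
                           AND e2.EventDate <= @AsOf);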

Power Apps and Read-Only SQL Data Sources

David Eldersveld diagnoses a weird issue:

Recently when working with a table in Azure SQL Database, I came across an issue that manifested itself in Power Apps in a few ways.

The message stated that the data source is read-only, when I knew that my SQL credentials had the correct permissions to insert and update table records. I was successfully doing this with other tables, so SQL was not the issue.

Attempting to use functions like Remove and Patch provided the message, as did Form.Edit.

The solution kind of makes sense after the fact, but there's no way I'd have guessed it.

Troubleshooting Chrome + Reporting Services Issues

Wayne Sheffield walks us through troubleshooting a few issues with using Reporting Services in Chrome:

I was recently working with a client with a SQL Server Reporting Services (SSRS) issue. Their company has standardized on using Google Chrome for the browser. However, they were running into issues when using Google Chrome with SSRS reports.

The first issue was that they were receiving a log in prompt to the SSRS server when browsing to it. The second issue was the infamous Kerberos Double-Hop issue. If you’re not familiar with the Kerberos Double-Hop architecture, check out this link: https://docs.microsoft.com/en-us/archive/blogs/askds/understanding-kerberos-double-hop.

I still have bad memories of trying to get Mozilla and (much earlier) Chrome working with Reporting Services. Ugh.

Describing the First Result Set

Phil Factor dives into sys.dm_exec_describe_first_result_set_for_object():

I’ve been working on a project unkindly nicknamed ‘The Gloop’ because the code is a bit amorphous. Basically, it is an approach to documenting SQL Server databases, using the facilities provided, such as the metadata views and DMFs. Although it is relatively simple to record the result returned by a table-valued function, I’d rather neglected the stored procedures because there was no metadata that could describe the first result set produced by a procedure or trigger.

I’d been silly because there is, of course, an Execution system DMF that does it: sys.dm_exec_describe_first_result_set_for_object(). It takes as its parameter the object_id of a procedure or trigger and describes the metadata of the first result set for the module with that ID. It has the same result set definition as sys.dm_exec_describe_first_result_set.

I’m going to have to try this out, but I have problems with sp_describe_first_result_set and how it blows up if you use a temp table in the stored procedure whose result set you’re grabbing.
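For a quick feel for the DMF, a minimal call looks like this (the procedure name is a made-up stand-in for any procedure or trigger in your database):

    -- Describe the first result set of a module by its object_id.
    SELECT r.column_ordinal,
           r.name,
           r.system_type_name,
           r.is_nullable
    FROM sys.dm_exec_describe_first_result_set_for_object
         (
             OBJECT_ID(N'dbo.GetCustomerOrders'),  -- hypothetical proc
             0                                     -- @include_browse_information
         ) AS r
    ORDER BY r.column_ordinal;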

Configuring MAXDOP Correctly

John Welch has a script to check if MAXDOP is configured correctly:

There’s a lot of information on the internet about how to set MAXDOP correctly. Microsoft even provides a knowledge base article with their recommendations. However, if you look at it, there’s a fair amount of information to digest. I’m lazy forgetful efficient, so I wanted to put this into a script I could easily reuse and not have to remember all the details.

Please note that these are just guidelines, and you should consider carefully whether they fit your workloads and scenarios. As is the case anytime you are evaluating system settings, you should test carefully before and after making changes.

Read on for the explanation as well as a link to the script itself.
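As a much-simplified taste of the kind of check the script performs (this is my own sketch, not John's script, and it covers only the single-NUMA-node case of Microsoft's guidance):

    -- Compare configured MAXDOP to the simplest case of the guidance:
    -- one NUMA node, recommendation = min(8, logical processor count).
    DECLARE @Maxdop INT =
        (SELECT CONVERT(INT, value_in_use)
         FROM sys.configurations
         WHERE name = N'max degree of parallelism');

    DECLARE @LogicalCpus INT = (SELECT cpu_count FROM sys.dm_os_sys_info);

    DECLARE @Recommended INT =
        CASE WHEN @LogicalCpus <= 8 THEN @LogicalCpus ELSE 8 END;

    SELECT @Maxdop      AS ConfiguredMaxdop,
           @LogicalCpus AS LogicalProcessors,
           @Recommended AS SimpleRecommendation;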

String Aggregation with T-SQL

Andy Levy talks about one of my favorite functions in SQL Server 2017:

This is another in a group of several posts on modernizing T-SQL code with new features and functionality available in SQL Server.

SQL Server 2016 gave us the STRING_SPLIT() function, but what about the reverse – compiling a set of values into one delimited string? We only had to wait 15 months for the release of SQL Server 2017, and the STRING_AGG() function.

I had the STUFF() / FOR XML PATH trick memorized for quite some time, but it was always a solution which felt like it worked by accident. Even if the new solution weren’t faster than the old, I’d still use it.
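For comparison, here is the old trick next to the new function, using a throwaway temp table (the names are illustrative, not from Andy's post):

    -- Throwaway sample data.
    CREATE TABLE #Tags (PostId INT, Tag VARCHAR(50));
    INSERT INTO #Tags VALUES (1, 'sql'), (1, 'tsql'), (2, 'hadoop');

    -- The old way: STUFF() plus FOR XML PATH('').
    SELECT p.PostId,
           STUFF((SELECT ',' + t.Tag
                  FROM #Tags AS t
                  WHERE t.PostId = p.PostId
                  FOR XML PATH('')), 1, 1, '') AS Tags
    FROM (SELECT DISTINCT PostId FROM #Tags) AS p;

    -- The new way (SQL Server 2017+), with optional ordering.
    SELECT PostId,
           STRING_AGG(Tag, ',') WITHIN GROUP (ORDER BY Tag) AS Tags
    FROM #Tags
    GROUP BY PostId;

    DROP TABLE #Tags;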
