Curated SQL Posts

Installing Spark on Windows 10

Gopal Tiwari shows how you can install Apache Spark on Windows 10:

By default, Spark SQL projects do not run on Windows and require some basic setup first; that setup is all we are going to discuss in this article, as I didn’t find it well documented anywhere on the internet or in books.

This article can also be used to set up a Spark development environment on Mac or Linux. Just make sure you download the correct OS version from Spark’s website.

The Scala project used in this article is available on GitHub: https://github.com/gopal-tiwari/LocalSparkSql.

I’ve seen (and written) installation guides for Spark. This is a good one, as it goes beyond installation and into kicking off a project and ensuring that it works.

Finding the Query Used in DirectQuery Mode

Kasper de Jonge shows us how we can find which query ran in DirectQuery mode to populate a Power BI data set:

When you are optimizing your DirectQuery model and have already done all the optimizations on the model, you might want your DBA to run the queries generated by Power BI. He might then be able to do some index tuning or even suggest some model changes. But how do you capture those queries? There are a few simple ways that I will describe here.

Read on for 3 1/2 such methods.

When Transactional Replication Makes Sense

Jonathan Kehayias has some good use cases for transactional replication:

Why in the world would you want to use Transactional Replication? Isn’t that the thing that is impossible to manage, difficult to configure, and always prone to problems? As a consultant, I see a lot of scenarios where every problem becomes a nail because all you have is a hammer. Sometimes another technology can solve existing problems, but little is known about it, and in my experience Transactional Replication tends to fall into this category. In this post we are going to take a look at some of the more common Transactional Replication use cases, as well as some scenarios where it can solve a business problem in a different way. As with anything, Transactional Replication is just another tool to have in your toolbox.

To be fair, merge replication is much more impossible to manage, difficult to configure, and prone to problems…

Azure Data Studio February Release

Alan Yu announces the February 2020 release of Azure Data Studio:

Azure Data Studio is a multi-database, cross-platform desktop environment for data professionals using the family of on-premises and cloud data platforms on Windows, macOS, and Linux. To learn more, visit our GitHub.

The key highlights to cover this month include:

– Improved Azure sign-in support.
– Find in Notebook support.
– Visual Studio Code merge to 1.42.
– Bug fixes.

Read on for more details on each topic.

Loading the SQL Server Error Log into a Table

Jeff Mlakar shows how you can load the SQL Server error log into a table:

Why Not Just Use the File System?

When possible, I prefer to go into the file system and open the SQL Server error logs in a text editor such as Notepad++ or UltraEdit. However, there are times when you may have access to a SQL Server instance but not be able to RDP into it or otherwise browse the file system.

That’s where this comes in – straight T-SQL.

Click through for demos.

Monitoring for Distribution Changes

Nina Zumel explains how we can track if something has changed by monitoring its distribution:

A client recently came to us with a question: what’s a good way to monitor data or model output for changes? That is, how can you tell if new data is distributed differently from previous data, or if the distribution of scores returned by a model has changed? This client, like many others who have faced the same problem, simply checked whether the mean and standard deviation of the data had changed by more than some amount, with the threshold chosen in a more or less ad hoc manner. But they were curious whether there was some other, perhaps more principled, way to check for a change in distribution.

The answer is, of course, that there is. Click through to see a few of the techniques.
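
One widely used option for a more principled check is the two-sample Kolmogorov-Smirnov (KS) test, which compares the whole empirical distributions of the old and new data instead of only their mean and standard deviation. The following is a minimal Python sketch, not necessarily one of the techniques in Nina’s post; the function names are hypothetical, and the critical value uses the standard large-sample approximation c(α)·√((n+m)/(nm)), with c(α) ≈ 1.358 for α = 0.05.

```python
from bisect import bisect_right

# Hypothetical helper names, written for illustration only.


def ks_statistic(reference, new):
    """Two-sample KS statistic: the largest absolute gap between the
    empirical CDFs of the two samples."""
    if not reference or not new:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(new)
    n, m = len(ref), len(cur)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so evaluating at every observed value finds the maximum gap.
    for x in ref + cur:
        gap = abs(bisect_right(ref, x) / n - bisect_right(cur, x) / m)
        d = max(d, gap)
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * ((n + m) / (n * m)) ** 0.5


def distribution_changed(reference, new, c_alpha=1.358):
    """Return (changed, statistic); changed is True when the KS statistic
    exceeds the critical value, i.e. the samples likely differ in distribution."""
    d = ks_statistic(reference, new)
    return d > ks_critical_value(len(reference), len(new), c_alpha), d
```

For example, `[-1, 1] * 50` and `[-2] * 12 + [0] * 72 + [2] * 12` both have mean 0 and population standard deviation 1, so a mean/standard-deviation threshold would not flag the change, yet `distribution_changed` reports a KS statistic of 0.375, well above the critical value of roughly 0.194 for these sample sizes. In practice you would more likely use a library implementation such as SciPy’s `scipy.stats.ks_2samp`.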

Publishable Adverse Event Tables in R

Inge Christoffer Olsen shows how to clean up tables in R for publication:

The summary of Adverse Events is a nice table that simply sums up the adverse events in the trial. Note the “[N] n (%)” format, which gives the number of events, the number of patients with events, and the percentage of patients with an event.

This particular example is about adverse events, but the key concepts in the code apply to many kinds of tables you want to make look a bit nicer. H/T R-Bloggers
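
The arithmetic behind a single “[N] n (%)” cell is simple enough to sketch. The post builds its tables in R; the following is only a minimal Python sketch in which the function name is hypothetical, the percentage denominator is assumed to be the number of patients in the treatment arm, the percent sign is assumed to live in the column header, and one decimal place is an assumed default.

```python
# Hypothetical helper for illustration; the linked post does this in R.
def format_ae_cell(events, patients_with_event, patients_in_arm, decimals=1):
    """Format one "[N] n (%)" cell.

    N = number of adverse events, n = number of patients with at least one
    event, % = n as a percentage of the patients in the arm (assumed
    denominator). The % sign is assumed to appear in the column header.
    """
    if patients_in_arm <= 0:
        raise ValueError("patients_in_arm must be positive")
    if not 0 <= patients_with_event <= patients_in_arm:
        raise ValueError("patients_with_event must be between 0 and patients_in_arm")
    pct = 100 * patients_with_event / patients_in_arm
    return f"[{events}] {patients_with_event} ({pct:.{decimals}f})"
```

For instance, 12 events among 8 of the 50 patients in an arm would be shown as `[12] 8 (16.0)`.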

Handling Azure SQL Database Scale Changes

Arun Sirpal shows us how to handle scaling events in Azure SQL Database:

For some reason, friends and colleagues keep telling me that no downtime occurs when scaling (up and down, in this example). Well, not only does the Microsoft documentation say otherwise, but I will also demonstrate it. So let’s test it out. Before the practical test, here is the official stance: “There is a switch over period where connectivity is lost to the database for a short amount of time, which can be mitigated using retry logic”.

Retry logic is an important part of any application. We tend to forget about it with on-prem applications talking to on-prem databases, but that’s a mistake.
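
As a rough illustration of the retry logic mentioned above, here is a minimal Python sketch of retrying with exponential backoff and jitter. `TransientError` and `with_retry` are hypothetical names; in a real application you would catch whatever transient-connection exception your database driver actually raises during the switchover.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for the transient connection error a database
    driver raises while a scale operation switches over."""


def with_retry(operation, max_attempts=5, base_delay=1.0, max_delay=30.0,
               sleep=time.sleep):
    """Call operation(); on TransientError, back off exponentially (with
    jitter) and try again, re-raising once max_attempts calls have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            # Base delay doubles each attempt (1 s, 2 s, 4 s, ...) up to max_delay;
            # up to 50% random jitter keeps many clients from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` as a parameter lets tests substitute a fake clock. Because a call may have succeeded on the server before the connection dropped, only wrap work that is safe to run more than once.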

SELECT * Doesn’t Keep Views up to Date

Reitse Eskens busts a myth:

Last week I read a comment or a blog post somewhere (I can’t remember where anymore, so please let me know so I can credit it!) claiming that a SELECT * in a view gets a full definition under the covers and doesn’t get updated when the underlying table is updated.

So I decided to put that to the test and see what I could reproduce.

Spoilers: it doesn’t. Click through for the proof, as well as what does update a view’s definition.

Guids in Persisted Calculated Columns

Slava Murygin shows us some odd things that occur when you try to build a persisted calculated column from a UNIQUEIDENTIFIER data type:

This post is for you in case you decide to use a uniqueidentifier column in your table and then think about including it in a persisted calculated column.
You might also find it useful if you like weird or funny SQL Server behavior.

Slava has an Azure feedback item, and it looks like someone tested the behavior in SQL Server 2019 and found that it works as you’d expect, so this must have been fixed sometime between then and now.