Curated SQL Posts

The Myth of the DBA-Free Cloud

Matthew McGiffen lays out an explanation:

I was chatting with a cloud consultant who was advising on a large-scale migration to AWS. He told me that one of the advantages of going for a PaaS offering (Platform as a Service) was that DBAs were no longer required, as backups and restores were handled for you. PaaS services for SQL Server include AWS RDS, Azure SQL Database, and Azure SQL Managed Instance.

I found it quite a funny conversation, partly as I don’t think he realised being a DBA was part of my job role, but also because I don’t know a single DBA who spends a significant amount of their time doing backups and restores.

I still remember (through others—I wasn’t in this space yet) the advertising campaign that SQL Server 2005 would completely eliminate the need for a DBA because everything would just work on its own, even sweet database tuning using the Database Engine Tuning Advisor. The same thing applies today: even those DBA-free databases eventually need somebody to optimize them along various dimensions, ensure they are running smoothly, and correct issues if they are not. Perhaps we could call this role the Administrator of a Database, or AoD, so as not to scare the DBA-free database vendors. “No, we don’t have DBAs—we just need you to have a few AoDs on staff.”


Query Snowflake Data from Spark

The Big Data in Real World team crosses data platforms:

If your organization is working with lots of data, you might be leveraging Spark for distributed computation. You could also have some or all of your data in a Snowflake data warehouse.

In a situation like this, you might have to expose data in Snowflake to the processes that run on Spark. This is made possible using the Spark Connector for Snowflake.

In this post, we will see what the Spark Connector for Snowflake is and how to use it to connect to Snowflake and access Snowflake data from your Spark cluster.

Read on for a high-level architecture of how it works and the configuration you’ll need to get it running.
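To give a flavor of that configuration, here is a minimal PySpark sketch of reading a Snowflake table through the connector. The account URL, credentials, warehouse, and the ORDERS table are placeholders, and the package coordinates are illustrative; match them to your Spark and Scala versions.

```python
from pyspark.sql import SparkSession

# The connector and its JDBC driver ship as Maven packages; versions here are illustrative.
spark = (SparkSession.builder
         .appName("snowflake-read")
         .config("spark.jars.packages",
                 "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.4,"
                 "net.snowflake:snowflake-jdbc:3.13.30")
         .getOrCreate())

# Connection options; all values below are placeholders for your account.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "my_user",
    "sfPassword": "my_password",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

df = (spark.read
      .format("net.snowflake.spark.snowflake")  # recent versions also accept the short name "snowflake"
      .options(**sf_options)
      .option("dbtable", "ORDERS")  # or .option("query", "select ...") to push a query down
      .load())

df.show()
```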


Common Date and Time Operations in R

Steven Sanderson works with dates:

Dates and times are essential components in many programming tasks, and R provides various functions and packages to handle them effectively. In this post, we’ll explore some common operations using both the base R functions and the lubridate package, comparing their simplicity and ease of understanding.

I personally prefer the lubridate style of date operation, but it’s nice to have options.
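As a taste of the two styles, here is a small sketch contrasting base R and lubridate on parsing, component extraction, and arithmetic (the dates are arbitrary examples):

```r
library(lubridate)

# Parsing: base R wants an explicit format string; lubridate infers it from the function name
as.Date("07/15/2023", format = "%m/%d/%Y")  # base R
mdy("07/15/2023")                           # lubridate

# Extracting components
d <- ymd("2023-07-15")
format(d, "%Y")  # base R: returns the character string "2023"
year(d)          # lubridate: returns the number 2023

# Date arithmetic
d + months(1)    # lubridate periods: 2023-08-15
Sys.Date() - d   # base R difftime works the same either way
```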


Importing Code into Polyglot Notebooks

Matt Eland brings some code to the party:

We’ve seen that Polyglot Notebooks allow you to mix markdown and code (including C# code) in an interactive notebook, and these notebooks allow you to share data between cells and between languages. However, in programming you frequently want to reference code that others have written without redefining everything yourself.

In this article we’ll explore how Polyglot Notebooks allows you to import .NET code from stand-alone files, DLLs, and NuGet packages, so your notebooks can take advantage of external code files and the same libraries you work with in Visual Studio.

The syntax, by the way, is very similar to the F# Interactive (and the short-lived C# Interactive) tool, particularly #i and #r.
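For reference, a C# cell using those directives might look like the following sketch; the feed URL is NuGet’s public index, the package version is just a pin for illustration, and the DLL path is a hypothetical placeholder.

```csharp
// Point the resolver at a NuGet feed (optional; the public feed is the default)
#i "nuget: https://api.nuget.org/v3/index.json"

// Pull in a NuGet package
#r "nuget: Newtonsoft.Json, 13.0.3"

// Reference a compiled assembly by path (hypothetical path)
#r "bin/Debug/net8.0/MyUtilities.dll"

using Newtonsoft.Json;

// Anything the referenced libraries expose is now available in this and later cells
Console.WriteLine(JsonConvert.SerializeObject(new { Greeting = "Hello from a notebook" }));
```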


Recursive Common Table Expressions in Snowflake

Kevin Wilkie is too fancy for simple joins:

Today, I want to talk about that fun edge case when you’re having to join a table to itself in Snowflake. Does it happen often? Not unless your architect just hates you.

Let’s use the normal pieces of data that everyone uses for this kind of thing – employee/manager relationships. We have the employee table we’ve been working from, and we’ll use it for this example.

The syntax is a bit different from T-SQL, but the concept is still the same.
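For instance, here is a minimal sketch of walking an employee/manager hierarchy, assuming an EMPLOYEES table with EMPLOYEE_ID, EMPLOYEE_NAME, and MANAGER_ID columns (NULL for the top of the tree). The RECURSIVE keyword is one of those differences: Snowflake accepts it, while T-SQL does not.

```sql
WITH RECURSIVE org_chart AS (
    -- Anchor member: start with employees who have no manager
    SELECT employee_id, employee_name, manager_id, 1 AS org_level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member: attach each employee to the chain above them
    SELECT e.employee_id, e.employee_name, e.manager_id, oc.org_level + 1
    FROM employees e
    INNER JOIN org_chart oc
        ON e.manager_id = oc.employee_id
)
SELECT *
FROM org_chart
ORDER BY org_level, employee_name;
```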


Encrypting SQL Server Backups

Matthew McGiffen lays out the requirements:

When we talk about protecting our at-rest data, the item we are likely to be most concerned about is the security of our backups. Backups are generally – and should be – stored off the server itself, and often we will ship copies offsite to a third party, where we don’t have control over who can access the data, even if we trust that access will be well managed.

From SQL Server 2014, the product has included the ability to encrypt data while creating a backup. This feature is available in both the Standard and Enterprise editions of SQL Server, so it is something you can use even when TDE may not be available to you.

Click through for a primer on the topic.
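As a quick sketch of the moving parts (database, certificate, paths, and passwords are all placeholders): backup encryption relies on a certificate in master, and you should back that certificate up immediately, because the backups cannot be restored without it.

```sql
USE master;

-- One-time setup: a database master key and a certificate to encrypt with
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
CREATE CERTIFICATE BackupCert WITH SUBJECT = 'Backup encryption certificate';

-- Back up the certificate and its private key and store them safely;
-- without them, the encrypted backups are unrestorable.
BACKUP CERTIFICATE BackupCert
    TO FILE = 'C:\Keys\BackupCert.cer'
    WITH PRIVATE KEY (
        FILE = 'C:\Keys\BackupCert.pvk',
        ENCRYPTION BY PASSWORD = '<another strong password>');

-- Take an encrypted (and compressed) backup
BACKUP DATABASE MyDatabase
    TO DISK = 'C:\Backups\MyDatabase.bak'
    WITH COMPRESSION,
         ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);
```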


Against Keys in Fact Tables

Marc Lelijveld searches for keys under the lamppost:

Another blog post based on recent client experiences. Last week, I visited a client where we had extensive discussions on data model optimization. As you might know, data modeling in Power BI is one of my favorite topics, so I had an excellent day. It’s also not the first time I’ve blogged about data modeling and optimization; if you haven’t read it yet, I recommend my previous blog on this topic.

This blog will focus on the need for keys in your tables, primarily the fact tables in your data model. I keep running into data models at customers that are flooded with keys in all tables. For each of these keys, you should ask: do I really need this, and could I store it in a different data type for further optimization? In this blog, I will further elaborate on keys in your data model, typical use cases, and how those cases can be solved in different ways.

Read the whole thing. The really short version is classic Kimball-style advice: keys for dimensions, not for facts. And in Power BI, removing a unique column from a fact table can speed things up by shrinking the compressed fact table size.
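To make that concrete, here is a hedged sketch in generic SQL (the table and column names are hypothetical): the dimension carries a surrogate key because relationships and filters hang off it, while the fact carries only dimension references and measures.

```sql
-- Dimension: gets a surrogate key, since reports join and filter on it
CREATE TABLE dim_customer (
    customer_key  INT          NOT NULL PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);

-- Fact: dimension keys plus measures, but no unique key of its own.
-- A near-unique column such as a sales order ID compresses poorly and
-- would bloat the model if loaded into Power BI.
CREATE TABLE fact_sales (
    customer_key INT            NOT NULL REFERENCES dim_customer (customer_key),
    date_key     INT            NOT NULL,
    quantity     INT            NOT NULL,
    sales_amount DECIMAL(18, 2) NOT NULL
);
```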


Oracle Connectors for SSIS

Meagan Longoria talks to Oracle:

We needed to make sure we had the most current version of the drivers required to make the package work.

This turned out to be an adventure involving 3 different drivers and plans to refactor, so I’m documenting some of it here in case it helps someone else.

Read on for your options.
