Press "Enter" to skip to content

Curated SQL Posts

Q&A on Data Engineering

Dustin Vannoy talks to the mirror:

An aspiring data engineer recently reached out to me for some guidance on pivoting into the field from a software development background. The questions they asked are similar to what others have asked me in the past, so I decided to capture my responses here. I link to prior posts and other resources when possible to try and keep the responses brief. These are informal thoughts of mine, not something I have sat down to rethink and research for new ideas beyond what is already in my head.

Dustin is one of the best people to talk to about data engineering. Click through for his advice.

Comments closed

Well-Architected Framework for Oracle in Azure

Kellyn Pot’vin-Gorman has a new tool for us:

This invaluable framework provides clear guidance on the recommended practices to assess, architect and migrate Oracle workloads to the Azure cloud.  This should be the first place for answers to success for Oracle on Azure!

A special thanks to my teammate, Jessica Haessler for working so hard to help me get this to the finish line, as I would have never been able to get this done on my own!  

Click through for a link to the guide. There isn’t a Well-Architected Framework assessment for this yet but the WAF articles themselves have quite a bit of detail to them.

Comments closed

Finding a Scalar Function Caller

Matthew McGiffen searches for the root of the problem:

In this post we look at a method using Extended Events (XE) to identify what parent objects are calling a given SQL function and how often.

The background is that I was working with a team where we identified that a certain scalar function was being executed billions of time a day and – although lightweight for a single execution – overall it was consuming significant CPU on the server. We discussed a way of improving things but it required changing the code that called it. The problem was that the function was used in about 700 different places across the database code – both in stored procedures and views – though the views themselves would then be referenced by other stored procedures. Rather than update all the code they’d like to target the objects first that execute the function the most times.

Read on to see how Matthew did it, as well as some caveats along the way.

Comments closed

Roles and Privileges in Postgres

Ryan Booz gives us an introduction to Postgres security:

Recall that in PostgreSQL both users and groups are technically roles. These are always created at the cluster level and granted privileges to databases and other objects therein. Depending on your database background it may surprise you that roles aren’t created as a principal inside of each database. For now, just remember that roles (users and groups) are created as a cluster principal that (may) own objects in a database, and owning an object provides additional privileges, something we’ll explore later in the article.

For the purposes of this article, all example user roles will be created with password authentication. Other authentication methods are available, including GSSPI, SSPI, Kerberos, Certificate, and others. However, setting up these alternative methods is beyond what we need to discuss object ownership and privileges.

Read the whole thing if you’re doing anything with Postgres.

Comments closed

Reading the Data Lake with the Serverless Pool via OPENROWSET

Ryan Adams begins a series on reading data from the data lake:

There are two ways to read data inside Data Lake using the Synapse Serverless engine.  In this article, we’ll look at the first method which uses OPENROWSET to query a path within the lake. 

Synapse is a collection of tools with four different analytical engines (Dedicated PoolSpark PoolServerless PoolData Explorer Pool).  This gives you a lot of options for ingesting, transforming, storing, and querying your data.  The article will focus on how you can use the Synapse Serverless Pool to query the data in your ADLS account.   

Click through for a primer on the topic, as well as a demo video.

Comments closed

Restoring Large Power BI Premium Backups

Gilbert Quevauvilliers has a hefty backup:

When using Power BI Premium or Premium Per user you get the option to backup the database, there can be occasions when you try and restore the backup and it fails.

The reason that it could fail is because when a restore happens it can consume additional memory which would take you up and over the memory limit.

Below I will explain a new option which allow this to restore successfully!

Gilbert includes a copy of the error message and one new option in the post.

Comments closed

Designing Test Code for shinytest2

Russ Hyde wraps up a series on shinytest2:

UI-driven end-to-end tests require a bit more code than unit tests. For example, starting the app and navigating around to set up some initial state will require a few lines of code. But these are things you’ll likely need to do in several tests. As you add more and more test cases and these commonalities reveal themselves, it pays to extract out some helper functions and / or classes. By doing so, your tests will look simpler, the behaviour that you are testing will be more explicit, and you’ll have less code to maintain. We’ll show some software designs that may simplify your {shinytest2} code.

This post builds upon the previous posts in the series, but is quite a bit more technical than either of them. In addition to shiny development, you’ll need to know how to define functions in R and for the last section you’ll need to know about object-oriented programming in R (specifically using R6). The ideas in that section may be of interest even if you aren’t fluent with R6 classes yet.

Click through for the series finale.

Comments closed

Emulating Window Functions in MySQL 5.7

Lukas Eder says, we have window functions at home:

One of MySQL 8’s biggest improvements is the support of window functions. As I always said in conferences, there’s SQL before window functions and SQL after window functions. Once you start using them, you’ll use them everywhere.

Some of you poor souls are unfortunate enough to be stuck on MySQL 5.7, either of your own choosing, or because you’re using a clone / fork that is still 5.7 compatible. While for most people, this blog post is just for your amusement, or nostalgia, for some of you this post will be quite useful.

If you are in a windowless world, read on to see how you can make life a little more manageable.

Comments closed

Performance Comparing DISTINCT to GROUP BY

Reitse Eskens does a performance comparison:

A few days ago, I heard someone stating that Group By was much quicker than Distinct. Less disk impact, less memory etc.
So, I thought I’d find out if it’s true or not because I found it interesting. I always thought there was no difference. I tested a single small table and found no difference in speed, reads or execution plan. But that’s no real world example. Usually the tables contain a lot of data and are joined to other tables.

Click through for the results of Reitse’s analysis.

Comments closed