Press "Enter" to skip to content

Curated SQL Posts

Dropping Data Frame Columns in R

Steven Sanderson performs feature selection:

As an R programmer, one of the fundamental tasks you’ll encounter is manipulating data frames. Whether you’re cleaning messy data or preparing it for analysis, knowing how to drop unnecessary columns is a valuable skill. In this guide, we’ll walk through the process of dropping columns from data frames in R, using simple examples to demystify the process.

Read on for three ways of doing this.

Leave a Comment

Reviewing a Pull Request with the GitHub CLI

Mike Robbins shows off one feature of gh cli:

Checking out someone else’s pull request (PR) on GitHub is essential in collaborative software development, enabling thorough code reviews, local testing, and detailed feedback. This process allows developers to ensure the quality and functionality of the code before integration, identify and resolve potential conflicts, and maintain consistency across the project. By reviewing changes locally, team members can collaborate more effectively, offering insights and improvements that enhance the overall quality of the project. Additionally, local reviews support security audits and performance evaluations, ensuring that the code is not only functional but also optimized and secure.

I don’t use gh cli as much as I should (in part because 95+ percent of the time, I’m committing into one-man repos and use GitHub Desktop), but every time I do use it, I remember that it’s a nice CLI.

Leave a Comment

An Overview of Powershell for DBAs

David Seis shares tips from an upcoming talk:

PowerShell is an essential tool for SQL Server database administrators looking to streamline their workflow and automate repetitive tasks. When it comes to troubleshooting your SQL Server instances, PowerShell is an essential tool for the DBA toolbox.

As a versatile scripting language and command-line shell, PowerShell enables seamless integration of commands, arguments, variables, and modules with an interface and logic that SQL users would find pleasantly familiar. This blog post will guide you through the basics of PowerShell, demonstrate how to implement simple automation for routine database tasks, and explore advanced automation strategies to optimize your database management.

The post starts as a basic primer on Powershell but also includes plenty of notes on good ways to make use of the language as a DBA.

Leave a Comment

Writing Conditional JOIN and WHERE Clauses

Erik Darling has Blondie stuck in my head:

The OR operator is a perfectly valid one to use in SQL statements. If you use an IN clause, there’s a reasonable chance that the optimizer will convert it to a series of OR statements.

For example, IN(1, 2, 3) could end up being = 1 OR = 2 OR = 3 without you doing a darn thing. Optimizers are funny like that. Funny little bunnies.

The problem generally isn’t when asking for IN or OR for a single column, with a list of literal values, the problem is usually when you:

  • Use OR across multiple where clause columns
  • Use OR in a join clause of any variety
  • Use OR to handle NULL parameters or variables

This is an excellent, detailed article and well worth the read.

Leave a Comment

Building a Docker Image with Docker Build Cloud

Andrew Pruski shows off Docker Build Cloud:

In a previous blog post we went through how to build a Docker container image from a remote (Github) repository.

Here we’re going to expand on that by actually building the image itself remotely, using Docker Build Cloud.

What we can do with Docker Build Cloud is instead of building the image locally and then having to push to a remote container registry (for example the Docker Hub), we can build remotely and then immediately push that image to the registry so that it is available for immediate use by say, our team members or deployment/testing pipelines.

Read on to see how it works.

Leave a Comment

Selecting the Top N Values by Group in R

Steven Sanderson searches for subsets:

In data analysis, there often arises a need to extract the top N values within each group of a dataset. Whether you’re dealing with sales data, survey responses, or any other type of grouped data, identifying the top performers or outliers within each group can provide valuable insights. In this tutorial, we’ll explore how to accomplish this task using three popular R packages: dplyr, data.table, and base R. By the end of this guide, you’ll have a solid understanding of various approaches to selecting top N values by group in R.

Read on for the three examples.

Leave a Comment

Updates in R 4.4.0

Russ Hyde shares items of interest:

R 4.4.0 (“Puppy Cup”) was released on the 24th April 2024 and it is a beauty. In time-honoured tradition, here we summarise some of the changes that caught our eyes. R 4.4.0 introduces some cool features (one of which is experimental) and makes one of our favourite {rlang} operators available in base R. There are a few things you might need to be aware of regarding handling NULL and complex values.

The full changelog can be found at the r-release ‘NEWS’ page and if you want to keep up to date with developments in base R, have a look at the r-devel ‘NEWS’ page.

Read on for a big note on tail-call recursion, an operator for coalescing, and a few more neat features.

Leave a Comment

IAsyncEnumerable in C# 8

Camilo Reyes shows off an interface:

The IAsyncEnumerable interface was introduced to address the limitations of the IEnumerable interface and the Task class. This way, you can stream asynchronous data and process it efficiently as soon as it becomes available.

In this take, you will learn how to work with IAsyncEnumerable to asynchronously stream a big table and extract the data in a ETL process. You will also learn the difference between IAsyncEnumerable and IEnumerable and how to use IAsyncEnumerable in your everyday work. Then, you will look at comparisons between the two different approaches and why one is better than the other in certain scenarios.

Read on for a demonstration and dive into how IAsyncEnumerable implementations compare to IEnumerable in terms of memory utilization.

Leave a Comment

Copying Azure SQL Managed Instance Databases

Scott Klein performs a migration:

So, back to our customer. They essentially lifted and shifted their on-premises databases to Azure SQL Managed Instance and have been using it successfully for nearly two years. Again, this is awesome.

Last week they came to us and asked about reporting with Managed Instance. They were looking at data marts and data warehouses, but we needed more information. It turns out they have some people that just want the ability to query the databases, and potentially hook up Excel to these databases for data analysis.

The caveat is that the people I was talking to didn’t want to give the other group direct access to the production environment. Toootally get that. Yeah, like 100% get it. So, what are the options?

Read on for the solution Scott came up with.

Leave a Comment

Troubleshooting sp_getapplock Blocking

Brent Ozar has an epiphany:

I’ll give you an example. A client came to me because they were struggling with sporadic performance problems in Azure SQL DB, and nothing seemed to make sense:

  • sp_BlitzFirst @SinceStartup = 1 showed very clearly that their top wait, by a long shot, was blocking. Hundreds of hours of it in a week.
  • sp_BlitzIndex showed the “Aggressive Indexes” warning on a single table, but… only tens of minutes of locking, nowhere near the level the database was seeing overall.
  • sp_BlitzCache @SortOrder = ‘duration’ showed a couple queries with the “Long Running, Low CPU” warning, and they did indeed have blocking involved, but … pretty minor stuff. Plus, their plan cache was nearly useless due to a ton of unparameterized queries pouring through constantly, overwhelming Azure SQL DB’s limited plan cache.
  • sp_Blitz wasn’t reporting any deadlocks, either. (sp_BlitzLock doesn’t work in Azure SQL DB at the moment because Microsoft’s no longer running the default system health XE session up there. They turned that off in order to save money on hosting costs, and passed the savings on to… wait… hmm)
  • As a last-ditch hail-Mary, I ran sp_BlitzWho repeatedly, trying to catch the blocking happening in action. No dice – the odds that I’d catch it live weren’t great anyway.

Click through for the story and how all the pieces ultimately fit together.

Leave a Comment