
Day: May 16, 2024

Selecting Columns Containing a Specific String in R

Steven Sanderson goes hunting for strings:

Today I want to discuss a common task in data manipulation: selecting columns containing a specific string. Whether you’re working with base R or popular packages like stringr, stringi, or dplyr, I’ll show you how to efficiently achieve this. We’ll cover various methods and provide clear examples to help you understand each approach. Let’s get started!

Click through for five examples across the three methods.
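For readers who want the gist before clicking through, here is a minimal sketch of the idea. It is written in Python rather than the R used in Steven's post, it assumes the string is matched against column names, and the table and the function name are hypothetical stand-ins rather than anything from the original.

```python
# Hypothetical table: column name -> list of values (not from the linked post).
table = {
    "cust_name": ["Ann", "Bob"],
    "cust_id": [1, 2],
    "order_total": [9.5, 12.0],
}


def select_columns_containing(data, needle):
    """Return only the columns whose names contain `needle`, keeping their order."""
    return {name: values for name, values in data.items() if needle in name}


selected = select_columns_containing(table, "cust")
print(list(selected))  # ['cust_name', 'cust_id']
```

A plain substring test is the simplest version of this. A regular expression would be the natural next step when the match needs to be anchored or case-insensitive.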


Making Code Developer Friendly with an Example in R

Mark Niemann-Ross says the rest is commentary:

If you are reading this, you’re a coder and use functions. We write them for ourselves. If someone else writes a function, you can hope it works. If it doesn’t, you can hope to fix it. Hopefully, the return value is obviously correct. But maybe it’s subtly wrong?

If things are amiss, read the name of the function and hope it’s descriptive. I worked with a programmer who omitted all vowels from his function names. So the above code would expand to this…

Read on for the rationale behind commenting your functions appropriately, as well as one way to do it in R. There is a bit of art and a bit of science to writing good comments, but the starting point is simply having them to begin with. And the more clever you feel like you’re being, the more you need to comment your code, because three months from now, you probably won’t be feeling quite as clever. H/T R-Bloggers.


The Joy of Partitioned Views

Rod Edwards talks partitioned views:

This post came around when I was at a loose end one evening, and just started poking at a local sandpit database, and it got me reminiscing and revisiting / testing a few things. The devil makes work for idle thumbs and all that…

Partitioned Views…do they have a place in society anymore?

Rod does a great job of following Betteridge’s Law of Headlines, as well as saving the ‘Yes’ answer for the post itself. Partitioned views come with their own pains, though one use case Rod did not include is using PolyBase and partitioned views to move “cold” data to slower external storage.


Job Threading and Thread Partitioning in SQL Server

Aaron Bertrand continues a series on threading:

In part 2 of this series, I showed an example implementation of distributing a long-running workload in parallel, in order to finish faster. In reality, though, this involves more than just restoring databases. And I have significant skew to deal with: one database that is many times larger than all the rest and has a higher growth rate. So, even though I had spread out my 9-hour job with 400 databases to run faster by having four threads with 100 databases each, one of the threads still took 5 hours, while the others all finished within 1.5 hours.

Read on to learn what Aaron did to make things move faster.
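To make the skew problem concrete, here is a small sketch of one common way to balance uneven work: a greedy "largest first" assignment that always hands the next-biggest item to the least-loaded thread. This is not necessarily what Aaron did. The database names, sizes, and the `assign_to_threads` function are all hypothetical, and it is written in Python purely for illustration.

```python
import heapq


def assign_to_threads(sizes, thread_count):
    """Greedy largest-first assignment: each item goes to the least-loaded thread."""
    # Min-heap of (current_load, thread_index); the lightest thread pops first.
    heap = [(0, i) for i in range(thread_count)]
    heapq.heapify(heap)
    plan = [[] for _ in range(thread_count)]
    # Place the biggest items first so the small ones can even out the rest.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        plan[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return plan


# Hypothetical sizes in GB: one database dwarfs all the others.
sizes = {"big": 900, "db1": 100, "db2": 100, "db3": 100,
         "db4": 90, "db5": 80, "db6": 70}
plan = assign_to_threads(sizes, 4)
loads = [sum(sizes[name] for name in thread) for thread in plan]
print(plan)
print(loads)
```

With these made-up numbers, the oversized database ends up alone on its own thread while the remaining three threads share everything else. That also shows the limit of any assignment scheme: no thread can finish sooner than its single largest item allows.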


MFA Requirement for Azure Users

Erin Chapple opens a can of worms:

This July, Azure teams will begin rolling out additional tenant-level security measures to require multi-factor authentication (MFA). Establishing this security baseline at the tenant level puts in place additional security to protect your cloud investments and company. 

MFA is a security method commonly required among cloud service providers and requires users to provide two or more pieces of evidence to verify their identity before accessing a service or a resource. It adds an extra layer of protection to the standard username and password authentication.

The problem is that people are asking a lot of good questions in the comments and, so far, there are no answers.


Actual Execution Plans and Lock Waits

Erik Darling notices me in a leg cast staring through his window with my telescope:

A long time ago, I complained that wait stats logged by actual execution plans don’t show lock waits. That seemed like a pretty big deal, because if you’re running a query and wondering why sometimes it’s fast and sometimes it’s slow, that could be a pretty huge hint.

Click through for the full story. Getting actual waits is indeed a big deal, and way easier than any of the alternatives like spinning up a special extended events session or yelling at everyone not to use the server for a few minutes while you run your query.


Using F-SKU Power BI Capacity and Microsoft Fabric

Chris Webb has a public service announcement:

Since the announcement in March that Power BI Premium P-SKUs are being retired and that customers will need to migrate to F-SKU capacities instead, I have been asked the same question several times:

Why are you forcing me to migrate to Fabric???

This thread on Reddit is a great example. What I want to make clear in this post is the following:

Moving from P-SKU capacities to F-SKU capacities is not the same thing as enabling Fabric in your tenant

Click through for Chris’s explanation. Also check out the comments section for this one, as there are plenty of questions and responses in there.


Using the pg_repack Extension

Muhammad Ali tries out an extension:

Regular updates and deletions within PostgreSQL tables can lead to various issues such as bloat, fragmentation, and a decline in performance over time. These challenges can significantly impact the efficiency and reliability of the database, potentially affecting critical operations.

To address these concerns, PostgreSQL introduced the pg_repack extension, which provides a robust solution for managing table maintenance without disrupting the production environment. By allowing tables to be rebuilt online, pg_repack tackles bloat and fragmentation issues, ensuring that database storage remains optimized and performance is consistently maintained.

Read on to see why vacuuming might not be enough and what pg_repack does.
