Press "Enter" to skip to content

Curated SQL Posts

Data Compression and Data Type Changes

Bob Pusateri asks the important questions:

A few different times I have been asked one or more forms of the following question:

Can datatypes be changed faster with data compression enabled?

I’ve always replied that I’m pretty sure compression will help in this situation, because based on my understanding, it should. But I’ve never had any actual data to back up this belief. Until now. I recently set up a demonstration to test this, and I’m very happy to share the results.

If you want to see the results, you’re going to have to read Bob’s article.
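
Bob’s actual test setup and results are in his article. Purely to illustrate the kind of comparison involved (this is a hypothetical sketch, not his script), you could time the same data type change against an uncompressed table and a page-compressed copy:

/* Hypothetical comparison, not Bob's actual test: time the same
   data type change on an uncompressed table and a page-compressed one. */
CREATE TABLE dbo.NoCompression (ID int IDENTITY(1,1), Val int NOT NULL);
CREATE TABLE dbo.PageCompression (ID int IDENTITY(1,1), Val int NOT NULL)
    WITH (DATA_COMPRESSION = PAGE);

/* ...load the same few million rows into both tables here... */

SET STATISTICS TIME ON;

/* Widen the column on the uncompressed table, then on the compressed one,
   and compare the elapsed times */
ALTER TABLE dbo.NoCompression ALTER COLUMN Val bigint NOT NULL;
ALTER TABLE dbo.PageCompression ALTER COLUMN Val bigint NOT NULL;

SET STATISTICS TIME OFF;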

Power BI Model Size and Memory Usage

Chris Webb lays out the limitations:

You probably know that semantic models in Power BI can use a fixed amount of memory. This is true of all types of semantic model – Import, Direct Lake and DirectQuery – but it’s not something you usually need to worry about for DirectQuery mode. The amount of memory they can use depends on whether you’re using Shared (aka Pro) or a Premium/Fabric capacity, and if you’re using a capacity how large that capacity is. In Shared/Pro the maximum amount of memory that a semantic model can use is 1GB; if you are using a capacity then the amount of memory available for models in each SKU is documented in the table here in the Max Memory column:

Read on to learn more.

Evenly Spacing Month Charts in ggplot2

Jameson Marriott fixes a spacing issue:

I recently noticed that ggplot2 spaces date axes literally even when grouped by month. I’ve been using ggplot2 extensively for years and I don’t remember noticing before, so this is not really a big deal, but now that I know it bugs me a lot. Take a look below.

I don’t think I had noticed this before either, though now that Jameson has pointed it out, it certainly is annoying. H/T R-Bloggers.
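
Jameson’s post walks through his fix. One common workaround (my own sketch, not necessarily the approach he lands on) is to aggregate to the month level and plot against a discrete axis, so every month gets the same width no matter how many days it contains:

library(ggplot2)
library(dplyr)

# Toy daily data spanning several months (illustrative only)
set.seed(42)
df <- data.frame(
  date  = seq(as.Date("2024-01-01"), as.Date("2024-06-30"), by = "day"),
  value = rnorm(182)
)

# Summarize by month, then map a character/factor month label to the x axis.
# A discrete axis spaces every month evenly, regardless of whether the month
# has 28, 30, or 31 days.
df %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  ggplot(aes(x = month, y = total)) +
  geom_col()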

Monitoring ML Models in Production

Thomas Sobolik and Leopold Boudard talk model drift:

Regardless of how much effort teams put into developing, training, and evaluating ML models before they deploy, their functionality inevitably degrades over time due to several factors. Unlike with conventional applications, even subtle trends in the production environment a model operates in can radically alter its behavior. This is especially true of more advanced models that use deep learning and other non-deterministic techniques. It’s not enough to track the health and throughput of your deployed ML service alone. In order to maintain the accuracy and effectiveness of your model, you need to continuously evaluate its performance and identify regressions so that you can retrain, fine-tune, and redeploy at an optimal cadence.

In this post, we’ll discuss key metrics and strategies for monitoring the functional performance of your ML models in production […]

Click through for the article. There’s a Datadog pitch at the end, but the info is useful regardless of which tool you’re using for monitoring.
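
As a small, generic illustration of the kind of drift check the article describes (my own sketch, not anything from Datadog’s tooling), the Population Stability Index compares a feature’s production distribution against its training-time baseline:

import numpy as np

def population_stability_index(baseline, production, bins=10):
    """Compare a production sample of a numeric feature to its baseline.

    Rough rule of thumb: PSI below ~0.1 reads as stable, 0.1-0.25 as
    moderate drift, and above 0.25 as a shift worth investigating.
    """
    # Bin edges come from the baseline (training-time) distribution
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))

    # Clip production values into the baseline's range so nothing falls
    # outside the outermost bins
    production = np.clip(production, edges[0], edges[-1])

    expected = np.histogram(baseline, bins=edges)[0] / len(baseline)
    actual = np.histogram(production, bins=edges)[0] / len(production)

    # Guard against log(0) on empty bins
    expected = np.clip(expected, 1e-6, None)
    actual = np.clip(actual, 1e-6, None)

    return float(np.sum((actual - expected) * np.log(actual / expected)))

# Example: production scores drifted relative to training scores
rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 10_000)
prod_scores = rng.normal(0.3, 1.1, 10_000)
print(population_stability_index(train_scores, prod_scores))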

Legacy Power BI Apps Going Away

Nicky van Vroenhoven shares a public service announcement:

In case you missed the official blog post 2 months ago, I suggest you read my blog post 🙂

Or if you want you can refer to the official blog here: Announcing the retirement of legacy Power BI Apps (pre-audiences).

Already on March 6, 2023(!), Power BI apps with multiple audiences went Generally Available.

Read on for more information, with the note that these things will disappear soon: May 1, 2024 is the retirement date.

Switching All SQL Server Databases to Simple Recovery Model

Vlad Drumea doesn’t need no steenkin’ transaction log backups:

This brief post contains a script that can help switch a whole SQL Server instance, model and all user databases, to SIMPLE recovery.

The script is useful in case of dev/test/QA/UAT instances that have been left by accident to use the default FULL recovery model, yet do not have or need transaction log backups.

Read on for the script. It also shrinks the transaction log file after the switch-over.
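
Vlad’s script does more than this (checks, the log shrink, and so on), but the core idea, sketched loosely here rather than lifted from his post, is generating an ALTER DATABASE … SET RECOVERY SIMPLE statement for model and every user database still in FULL recovery:

/* Rough sketch only -- Vlad's script is the more complete version.
   Builds ALTER DATABASE statements for model plus every user database
   currently using the FULL recovery model. */
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'ALTER DATABASE ' + QUOTENAME(name)
             + N' SET RECOVERY SIMPLE WITH NO_WAIT;' + CHAR(13)
FROM sys.databases
WHERE recovery_model_desc = N'FULL'
  AND (name = N'model' OR database_id > 4);

PRINT @sql;                    -- review the generated statements first
EXEC sys.sp_executesql @sql;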

Vacuum and Autovacuum in Postgres

Pavel Borisov explains the importance of vacuuming your tables:

By default, all table data in Postgres are physically stored using the “heap” method. So every database is a set of 1 GB files (“segments”) and each file is logically split into 8 KB pages. Actual table rows are put into any page with enough free space.

When the row data is updated, a new version of the whole row is constructed and written (to any free space). The old one remains because, at the time of the update, the transaction is not yet completed and could still be rolled back. Once the transaction completes, we’ll have two or more versions of the same row in the table. Cleaning up the old ones is done by an asynchronous process called vacuum (and autovacuum).

Read on for more information about how this works and what you can (or should) do to help it along.
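
If you want to see how much cleanup is pending on your own instance, the standard statistics views (not anything specific to Pavel’s post) show dead row versions and the last time vacuum or autovacuum touched each table:

-- Dead (outdated, not-yet-cleaned) row versions per table, plus the
-- last time vacuum or autovacuum ran against each one
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Manually reclaim dead row versions in a single table
-- (replace my_table with a real table name)
VACUUM (VERBOSE) my_table;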

Dropping Data Frame Columns in R

Steven Sanderson performs feature selection:

As an R programmer, one of the fundamental tasks you’ll encounter is manipulating data frames. Whether you’re cleaning messy data or preparing it for analysis, knowing how to drop unnecessary columns is a valuable skill. In this guide, we’ll walk through the process of dropping columns from data frames in R, using simple examples to demystify the process.

Read on for three ways of doing this.
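
Steven’s post covers three specific approaches; for quick reference, these are the usual options in base R and dplyr (my own examples, which may or may not line up with his three):

df <- data.frame(a = 1:3, b = letters[1:3], c = c(TRUE, FALSE, TRUE))

# Base R: assign NULL to remove a column in place
df$c <- NULL

# Base R: subset by excluding column names
df2 <- df[, !(names(df) %in% c("b")), drop = FALSE]

# dplyr: select() with a minus sign
library(dplyr)
df3 <- select(df, -b)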

Reviewing a Pull Request with the GitHub CLI

Mike Robbins shows off one feature of gh cli:

Checking out someone else’s pull request (PR) on GitHub is essential in collaborative software development, enabling thorough code reviews, local testing, and detailed feedback. This process allows developers to ensure the quality and functionality of the code before integration, identify and resolve potential conflicts, and maintain consistency across the project. By reviewing changes locally, team members can collaborate more effectively, offering insights and improvements that enhance the overall quality of the project. Additionally, local reviews support security audits and performance evaluations, ensuring that the code is not only functional but also optimized and secure.

I don’t use gh cli as much as I should (in part because 95+ percent of the time, I’m committing into one-man repos and use GitHub Desktop), but every time I do use it, I remember that it’s a nice CLI.
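
Mike’s post has the full walkthrough; the heart of it is gh pr checkout. A typical review session looks roughly like this (the PR number 42 is just a placeholder):

# List open pull requests in the current repository
gh pr list

# Check out PR #42 into a local branch for testing
gh pr checkout 42

# Read the description and the diff without leaving the terminal
gh pr view 42
gh pr diff 42

# Leave a review once you're done
gh pr review 42 --approve --body "Looks good to me"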
