Press "Enter" to skip to content

Month: April 2023

Version Control for Power BI Datasets

Richard Swinbank improves on a prior version control system:

In the previous post, I outlined a possible workflow for Power BI development, and implemented an Azure DevOps pipeline to show how steps in such a workflow could be automated. To build the pipeline I stored an entire .pbix report file – data and all – in version control, which is a problem for at least two reasons:

  • storing large report files in a version control system won’t scale well
  • datasets may contain confidential or sensitive data which must be kept out of version control.

In this post I’ll look at separating a report’s dataset from its visuals, version controlling the standalone dataset (without data), and deploying the dataset automatically to Power BI.

Read on for the process.

Comments closed

CETAS to Parquet Files in Azure SQL Managed Instance

Michael Bourgon gives CETAS a chance:

TL;DR – the below lines will allow you to query a table on your MI, creating Parquet files in Azure blob storage. And you can query it! Next up is partitioning over time, etc., etc. But this is freaking fantastic. I have a python script I wrote that does it, but it’s nowhere near as nice/easy as this.

Why do you care? Because it’s a fantastically easy way to archive older data to blob storage, and I suspect (need to test) that if you do it right, you can then have it go to cool/archive storage via a lifecycle setup, so that if you need it much later, you can.

Yep, this is historically one of the best use cases for PolyBase. Unfortunately, we can’t do this in SQL Server 2022, though you can in pre-2022 versions using the Hadoop process. Given that it’s now available in SQL MI, I wouldn’t be too shocked to see it on-premises at some point, with the big question being whether that lands in SQL Server 2022 or vNext.
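For reference, a CETAS statement on Managed Instance has roughly this shape. This is a hedged sketch rather than Michael’s exact script: the data source, file format, table names, and storage URL are placeholders, and it assumes CETAS is enabled on the instance and that a credential for the storage account is already in place.

  -- Placeholder names and URL; assumes the instance can authenticate to the
  -- storage account (e.g., via a database scoped credential) and that CETAS
  -- is enabled on the Managed Instance.
  CREATE EXTERNAL DATA SOURCE ArchiveBlob
  WITH (LOCATION = 'abs://archive@mystorageaccount.blob.core.windows.net/');

  CREATE EXTERNAL FILE FORMAT ParquetFormat
  WITH (FORMAT_TYPE = PARQUET);

  -- CETAS writes the query results out as Parquet files in blob storage and
  -- registers an external table over them in a single statement.
  CREATE EXTERNAL TABLE dbo.SalesHistoryArchive
  WITH (
      LOCATION    = 'sales-history/2022/',
      DATA_SOURCE = ArchiveBlob,
      FILE_FORMAT = ParquetFormat
  )
  AS
  SELECT *
  FROM   dbo.SalesHistory
  WHERE  SaleDate < '2023-01-01';

  -- The archived data is then queryable like any other table.
  SELECT COUNT(*) FROM dbo.SalesHistoryArchive;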

2 Comments

Updating an Always Encrypted Column

Chad Callihan makes an update:

When recently troubleshooting an issue, I needed to update a database record to test application functionality. Because the table had an Always Encrypted column, some extra steps were needed to make the UPDATE succeed. Let’s look at the error encountered and how it was resolved.

Click through for the error and see how Chad got around the problem. This is definitely one of those head-scratcher solutions, where you can kind of understand why it’s necessary but still think the required process is dumb.
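Without spoiling Chad’s exact fix, the usual shape of the workaround is that a plain literal against an encrypted column fails with an operand type clash, so the value has to arrive as a parameter over a connection with column encryption enabled (in SSMS, that means Column Encryption Setting=Enabled plus the Parameterization for Always Encrypted option). A minimal sketch, with made-up table and column names:

  -- Hypothetical table/column; run over a connection with Column Encryption
  -- Setting=Enabled and, in SSMS, Parameterization for Always Encrypted on.
  DECLARE @NewSSN char(11) = '123-45-6789';  -- SSMS converts this into a parameter
                                             -- and encrypts it client-side

  UPDATE dbo.Customer
  SET    SSN = @NewSSN      -- a raw literal here would throw an operand type clash
  WHERE  CustomerID = 42;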

Comments closed

Optimizing Kafka Infrastructure Costs

Addison Huddy saves some money:

In this first blog, we’re going to run through the infrastructure costs of running Kafka—i.e., compute, storage, networking, and the additional tooling you need to keep Kafka up and running smoothly. We won’t bury the lede—if you’re running Kafka in the cloud across multiple AZs (as most do for high availability), networking likely represents over 50% of your Kafka infrastructure costs. Let’s see how this ends up being the case.

Click through for some thoughts on how to reduce network costs, using AWS as an example.

Comments closed

Removing a Node from Elasticsearch

The Big Data in Real World team spams the delete button:

Shutting down a node abruptly is not the right way to decommission or remove a node from the Elasticsearch cluster. Doing so will leave the shards that node hosted under-replicated, and it could cause disruption to the clients who are currently consuming data from Elasticsearch.

The proper way to decommission or remove a node from Elasticsearch is to add the host to the exclusion list.

Click through to learn how to do this.

Comments closed

Updating Power BI Dataset Compatibility Level

Kurt Buhler wants the newest toys:

In the monthly updates for Power BI, there may be new features that appear for preview. For example, in the April 2023 update, dynamic format strings for measures were released into preview. This feature allows you to specify a DAX format string expression for measures, as you already could with calculation groups.

In Power BI, these features become readily available once enabled from the ‘Preview features’ section of the ‘Options’ menu. However, when you are connected to a dataset or its metadata with Tabular Editor, the properties will not be visible. That’s because, behind the scenes, Power BI upgrades the model compatibility level when using a preview feature for the first time.

Click through to learn how.

Comments closed

ScriptDOM Now Open Source

Drew Skwiers-Koballa has great news for us:

ScriptDOM is a powerful .NET library for code parsing, generating an abstract syntax tree (AST) that can be leveraged to apply code formatting, detect antipatterns, and more. We are thrilled to announce that the source code for ScriptDOM has been released into open source under the MIT license and is available on GitHub.  In addition, ScriptDOM is now distributed by Microsoft as a standalone NuGet package.

This is big and good news. We’ve been able to use ScriptDOM for quite a while, and now that we can extend and improve the library ourselves, it’s even better.

Comments closed

Understanding Log Send Queue, Redo Queue, and Redo Rate

Greg Dodd explains three terms:

A quick description of 3 metrics that SQL tracks in Availability Groups. These metrics are important when evaluating the health of your Availability Group, and knowing what sort of data loss you might face in a failover. Remember: just because you told SQL to make your AAG synchronous doesn’t mean you won’t have data loss.

Click through for the definitions.
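If you want to check those numbers yourself, they are exposed in sys.dm_hadr_database_replica_states. A quick sketch of the kind of query you could run on the primary replica:

  SELECT ar.replica_server_name,
         DB_NAME(drs.database_id) AS database_name,
         drs.log_send_queue_size,  -- KB of log not yet sent to the secondary
         drs.redo_queue_size,      -- KB of log received but not yet redone
         drs.redo_rate             -- KB/sec at which redo is being applied
  FROM   sys.dm_hadr_database_replica_states AS drs
         JOIN sys.availability_replicas AS ar
           ON ar.replica_id = drs.replica_id;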

Comments closed

Loading WhoIsActive Data on Azure SQL DB

Andrea Allred wants to know who’s doing what on this system:

I needed to collect sp_WhoIsActive into a table, but the twist was that it is on my Azure Managed Database, so I had to get creative with how I did it. We needed an Azure Pipeline to run it, but we wanted to record it every minute, and firing a pipeline every minute adds up fast. So we decided that we would kick it off once an hour and have the process wait for a minute and then fire until the hour ended. Then it would fire again at the top of the next hour and the same process would repeat.

That’s an interesting way to do it. Another alternative might have been an Azure function app, which you could schedule to run every minute. I think that’d be a lot less expensive than running an Azure Pipeline, and this goes to show you that there are many ways to solve the same problem in Azure.
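For the curious, the core of the fire-every-minute-for-an-hour pattern can be sketched as a simple T-SQL loop. This isn’t Andrea’s exact code and the destination table name is hypothetical, but sp_WhoIsActive does support logging to a table via @destination_table:

  DECLARE @EndTime datetime2(0) = DATEADD(HOUR, 1, SYSDATETIME());

  WHILE SYSDATETIME() < @EndTime
  BEGIN
      EXEC dbo.sp_WhoIsActive
           @get_plans         = 1,                    -- optional extras
           @destination_table = 'dbo.WhoIsActiveLog'; -- table must already exist

      WAITFOR DELAY '00:01:00';  -- sample once per minute
  END;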

Comments closed

9 Gotchas Working with Postgres

Phil Booth categorizes various mistakes as learning experiences:

Previously on Extreme Learning, I discussed all the ways I’ve broken production using healthchecks. In this post I’ll do the same for PostgreSQL.

The common thread linking most of these gotchas is scalability. They’re things that won’t affect you while your database is small. But if one day you want your database not to be small, it pays to think about them in advance. Otherwise they’ll come back and bite you later, potentially when it’s least convenient. Plus, in many cases it’s less work to do the right thing from the start than it is to change a working system to do the right thing later on.

Click through for the nine lessons learned, eight of which are still relevant as of PostgreSQL version 12. Many of these also have analogues in the SQL Server world, e.g., don’t overuse triggers, use non-recursive methods for path traversal when possible, do add indexes on foreign keys.
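On that last point, PostgreSQL automatically indexes the referenced (parent) side of a foreign key via its primary key or unique constraint, but not the referencing side, so the child-side index is on you. A hypothetical example, assuming a customers table already exists:

  -- Without an index on orders.customer_id, deletes/updates on customers and
  -- joins through the foreign key have to scan the whole orders table.
  CREATE TABLE orders (
      order_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      customer_id bigint NOT NULL REFERENCES customers (customer_id),
      order_date  date   NOT NULL
  );

  CREATE INDEX orders_customer_id_idx ON orders (customer_id);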

Comments closed