Press "Enter" to skip to content

Month: September 2021

Deleting Duplicate Records

Chad Callihan shows one of the best ways to remove duplicate records from a table:

Have you ever needed to delete duplicate records from a table? If there is only one duplicate in a table then simply running something like DELETE FROM Table WHERE ID = @DupRecord will do the trick. What happens when the problem is found after multiple records are duplicated? Will tracking them all down be more time consuming? Here are a few different options for quickly clearing out duplicate records.

There’s the best way, and then there are the other ways.

Comments closed

Type 1 SCDs in Delta Lake

Chris Williams starts a series on slowly changing dimensions in a Delta Lake:

Anyone that has contributed towards a Data Warehouse or a dimensional model in Power BI will know the distinction made between the time-series metrics of a Fact Table and the categorised attributes of a Dimension Table. These dimensions are also affected by the passage of time and require revised descriptions periodically which is why they are known as Slowly Changing Dimensions (SCD). See The Data Warehouse Toolkit – Kimball & Ross for more information.

Here is where the Delta Lake comes in. Using its many features such as support for ACID transactions (Atomicity, Consistency, Isolation and Durability) and schema enforcement we can create the same durable SCD’s. This may have required a series of complicated SQL statements in the past to achieve this. I will now discuss a few of the most common SCD’s and show how they can be easily achieved using a few Databricks Notebooks, which are available from my GitHub repo so you can download and have a go:

https://github.com/cwilliams87/Blog-SCDs

Check out the repo, but be sure to read the whole post.

Comments closed

Adjusting Bar Widths in Excel

Elizabeth Ricks explains how you can change the width of a bar in Excel:

Before we get into the step-by-step, I should mention that there aren’t any strict rules for optimal spacing between bars. Rather, it’s personal preference similar to wearing white after Labor Day (in the U.S., that’s the first weekend in September). As a resident of the muggy Southeast, I’ll be rocking white until fall temperatures arrive in mid-October. However, if you live in cooler climes and consider Labor Day the symbolic end of summer, your preference might be to say sayonara to white until Memorial Day. 

The same gray area goes for optimal spacing between bars. The actual width is not set in stone. Our goal is to enable our audiences to compare the lengths of the bars (instead of the area between them), so general guidance is to thicken the bars to minimize the surrounding white space.

Click through for the process.

Comments closed

Reviewing Encryption Code Snippets from Stack Overflow

Scott Contini explains some problems with popular encryption code snippets on Stack Overflow:

I’m not doing this to shame those who have made mistakes: Instead, I want to do my part to help fix the problem. As an AppSec specialist, I get really tired of having the same discussions over and over. I try real hard to make it easy for people to do the right thing: I point them to code that is safe to use, such as Luke Park’s Secure Compatible Encryption Examples. Despite this, there are the occasional teams who just continue to resist, even before the code has made it to production which is the best time to fix it. This makes everybody’s lives more difficult: it wastes my time to have to explain to them why their code is wrong, and it forces the teams to have to do a lot more work later because once the bad cryptography is in production, they need a migration plan to fix it.

Read on for a great explanation of common problems in encryption code.

Comments closed

Monitoring Azure Data Factory, Integration Runtimes, and Pipelines

Sandeep Arora monitors all the things:

For effective monitoring of ADF pipelines, we are going to use Log Analytics, Azure Monitor and Azure Data Factory Analytics. The above illustration shows the architectural representation of the monitoring setup.

The details of setting up log analytics, alerts and Azure Data Factory Analytics are further discussed in this section.

If you manage Azure Data Factory in your environment, give this a read.

Comments closed

Moving Azure Purview Sources between Collections

Wolfgang Strasser has a future review of past activities:

Yesterday, Andy Cutler (t) asked me on twitter, if the move of a registered source between collections is possible.

My first answer was, yes sure… And I also included a screenshot from one of my Purview accounts (which – this will be important in the following – is a legacy Purview account).

But – tech isn’t tech without some differentiation between versions and available options

Read the whole thing.

Comments closed

The Final Service Pack for SQL Server

Pedro Lopes announces the last service pack ever:

The 3rd and final Service Pack release for SQL Server 2016 is now available for download at the Microsoft Downloads site. This is also the last Service Pack for any SQL Server version, as previously announced in the Modern Servicing Model for SQL Server. Please note that registration is no longer required to download.

The cynic in me says “This is the final service pack ever, at least until they re-introduce them in five years under a slightly different name because people keep waiting for CU10 to drop before thinking about migrating to the latest version of the product.”

1 Comment

Azure Linux VM Agent Vulnerability

Nir Ohfeld finds another vulnerability:

Wiz’s research team recently discovered a series of alarming vulnerabilities that highlight the supply chain risk of open source code, particularly for customers of cloud computing services.

The source of the problem is a ubiquitous but little-known software agent called Open Management Infrastructure (OMI) that’s embedded in many popular Azure services.

When customers set up a Linux virtual machine in their cloud, the OMI agent is automatically deployed without their knowledge when they enable certain Azure services. Unless a patch is applied, attackers can easily exploit these four vulnerabilities to escalate to root privileges and remotely execute malicious code (for instance, encrypting files for ransom).

This has been patched, but it’s really ugly. H/T Ben Stegink.

Comments closed

What is Pandas?

Lina Kovacheva starts a new series on Pandas:

First and foremost – what is Pandas?

Pandas is a popular Python library that allows users to easily analyse and manipulate data. It offers powerful and flexible data structures and is vastly popular among data scientists and analysts. As with any other library to be able to use Pandas you have to import the library. 

Click through to learn more.

Comments closed

Role-Based Access Control in Snowflake

Warner Chaves explains how role-based access controls work in Snowflake:

The data access privilege granularity is the lowest level of securable that you will use to provide data access. This can theoretically go all the way down to rows and all the way up to full databases. 

I usually recommend that people start out with using Schema as their data access securable granularity. Database is usually too broad and you will inevitably have to re-do your roles and table level. Below is too specific to turn it into a general methodology—you would end up with way too many roles. See the FAQ later in this post on how to mix and match granularities if needed.

Once you have the granularity defined, you then create back-end roles at that level.

Read on to see what those roles look like. It’s a pretty standard RBAC setup.

Comments closed