Press "Enter" to skip to content

Curated SQL Posts

Connecting to a Fabric Warehouse via SSMS

Reitse Eskens does some digging:

Whilst working on a blogpost on Fabric Data Warehouse, I started wondering if I could work around the SQL web interface and connect to my OneLake with SSMS and/or ADS. As it turns out, you can!

Specifically, you can connect to see objects in a warehouse or the Tables view of a lakehouse, but not the Files view. There is a built-in web viewer, but Microsoft Fabric is definitely intended to work with normal SQL tools, not just its web interface and Power BI.
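As a hedged sketch of what this looks like once you point SSMS or Azure Data Studio at the SQL connection string from the Fabric portal and authenticate with Microsoft Entra ID (the server name format and object names below are assumptions, not taken from the post):

```sql
-- Connect SSMS/ADS to the warehouse's SQL connection string, e.g.
-- <workspace-id>.datawarehouse.fabric.microsoft.com (format assumed),
-- using Microsoft Entra ID authentication. Then query as usual:
SELECT TOP (10) *
FROM MyLakehouse.dbo.SalesOrders;   -- hypothetical lakehouse table

-- Only warehouse objects and the lakehouse Tables view are exposed;
-- the Files view is not queryable through this SQL endpoint.
```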


Thoughts on Community-Driven Documentation in Postgres

Robert Haas shares some thoughts:

In my opinion, the PostgreSQL documentation is simultaneously excellent and fairly poor, and both its excellence and its shortcomings are direct results of the process by which the documentation is produced. The PostgreSQL documentation is stored in the same git repository as the source code, and anyone who patches the source code so as to change documented behavior must also patch the documentation to match.

This means that nearly all documentation updates are made by the developer who is most familiar with what is changing in the code, or sometimes by another developer who has studied those changes closely. Therefore, the documentation is usually extremely accurate. Sure, there are oversights, but it would be incredible to discover that some PostgreSQL command has a documented option which doesn’t actually exist, or that a parameter which is documented to take a string argument actually takes an integer or a Boolean. Typically, the descriptions of what SQL statements do and how that behavior is changed by parameter settings or options passed to the command itself are crisp and precise.

But read the whole thing, as there are downsides to this approach.


Side Channel Attacks on Dynamic Data Masking

Ben Johnston does some testing:

This is the third part of a series on SQL Server Dynamic Data Masking. The first part in the series was a brief introduction to dynamic data masking, completing solutions, and use cases. The second part covered setting up masking and some examples. This part starts exploring side channel attacks against dynamic data masking.

This article will show that there are plenty of security concerns with using Dynamic Data Masking as a general-purpose security tool, but don’t let that completely keep you from using it. The point of the article is to show you the weaknesses of the tool, particularly for ad hoc use by users who might be keen to snoop around the data in ways you might not expect.

I’m not very big on Dynamic Data Masking at all, and this is a big part of why. That said, if your end users don’t have the ability to send arbitrary queries along (e.g., they can only execute stored procedures and none of those stored procedures let you send in arbitrary T-SQL), then it’s not that bad.
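To make the side-channel idea concrete, here is a minimal sketch (table and values are hypothetical, not from the article): masking applies only to query output, so a user who can run arbitrary queries can still filter on the real values underneath.

```sql
-- Hypothetical table with a masked salary column
CREATE TABLE dbo.Employees
(
    EmployeeID int,
    Salary     money MASKED WITH (FUNCTION = 'default()')
);

-- A user without UNMASK sees 0.00 for Salary in the result set...
SELECT EmployeeID, Salary FROM dbo.Employees;

-- ...but the WHERE clause is evaluated against the unmasked data,
-- so actual salaries can be inferred by narrowing the predicate:
SELECT EmployeeID FROM dbo.Employees WHERE Salary > 100000;
```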


Methods for Monitoring MongoDB

Hadi Fadlallah does some watching:

Several utilities included with the MongoDB distribution provide statistics about instances’ performance and activity. Usually, they are used to diagnose problems and assess the functioning of a system.

The MongoDB database tools are a suite of command-line utilities for working with MongoDB. Starting with MongoDB 4.4, these tools are released separately from the MongoDB Server. To install them, we should follow the guidelines provided in the following documentation.

Click through for information on command-line tools, built-in web services, and one third-party tool for the job.


Simplifying Model Formulas in R

Steven Sanderson shows off a command:

As a programmer, you may come across various scenarios where you need to create complex model formulas in R. However, constructing these formulas can often be challenging and time-consuming. This is where the ‘reformulate()’ function comes to the rescue! In this blog post, we will explore the purpose and usage of the reformulate() function in R, and provide you with simple examples to help you grasp its power.

This is where I want to put on glasses so I can push the glasses up my nose and say “Well, ackshually, the examples here are pre-formulating instead of re-formulating…”

Silliness on my part aside, click through to see how the function works and how it can simplify your regression analyses.
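As a minimal taste of what reformulate() does, here is a sketch using the built-in mtcars dataset:

```r
# Build a model formula programmatically from a character vector
predictors <- c("wt", "hp", "cyl")
f <- reformulate(predictors, response = "mpg")
f
#> mpg ~ wt + hp + cyl

# The generated formula drops straight into a model-fitting call
model <- lm(f, data = mtcars)
coef(model)
```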


Clearing Backup History on SQL Server

Rich Benner reminds us to clear that backup history:

One thing we regularly come across with a new client is large system databases. Something that SQL Server should do but doesn’t is clearing up system tables. One of the more common issues is a very large backup history which causes bloat in the msdb database. Let’s find out how to clear backup history.

These tables store information about backups that you have taken, which is extremely useful … for a time. After that useful time, it’s just unnecessary data that is taking up space on your server. There are a number of related tables that store backup history, and it’s not simple to delete data from them all. Most don’t have dates stored in the table, so you have to join them together and make sure you delete in the correct order.

Read on for a built-in procedure, as well as a warning. It’s interesting to see, sometimes, just how much space msdb is taking up with stuff people don’t realize. Backup history is usually one of the offenders, along with SQL Agent history and (especially on pre-2016 instances) SSIS history.
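The built-in procedure being referenced is presumably msdb.dbo.sp_delete_backuphistory, which handles the joins and deletion order across those related tables for you. A minimal sketch, with an arbitrary 90-day retention window:

```sql
-- Delete backup history older than 90 days from the msdb system tables.
-- sp_delete_backuphistory handles the deletion order internally.
DECLARE @cutoff datetime = DATEADD(DAY, -90, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff;

-- Related cleanup for SQL Agent job history, another common offender:
EXEC msdb.dbo.sp_purge_jobhistory;
```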


Computed Columns in Snowflake

Kevin Wilkie does the math:

Sometimes to make our lives easier, we, as database engineers, can create a table that automatically tells us the answer as we need it – or at least how we tell it we want it. In SQL Server, we create what is called “Computed Columns.”

Read on to see how to create one of these in Snowflake.
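As a point of comparison before you click through, a minimal sketch of the Snowflake syntax, where the expression follows the data type in the column definition (all names below are made up):

```sql
-- Hypothetical Snowflake table with a computed (virtual) column
CREATE OR REPLACE TABLE orders
(
    quantity    NUMBER,
    unit_price  NUMBER(10,2),
    total_price NUMBER(12,2) AS (quantity * unit_price)
);

-- Virtual columns are omitted from inserts and computed on read
INSERT INTO orders (quantity, unit_price) VALUES (3, 19.99);

SELECT quantity, unit_price, total_price   -- total_price returns 59.97
FROM orders;
```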


Running Queries with Always Encrypted

Matthew McGiffen retrieves some data:

In this post we’ll look at how you interact with data that is encrypted using Always Encrypted. The examples here will show how you run queries from SSMS; in later posts we’ll look at stored procedures and application code, as well as what happens in the background when you execute a query.

For these examples we’re going to be using the database we created in the last post.

Always Encrypted definitely changes the way you work with those encrypted columns, and you might run into some frustrating errors along the way, as things you could get away with before are no longer possible.
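As a rough sketch of what querying from SSMS looks like (the table and column names are hypothetical, not from the series): you connect with Column Encryption Setting = Enabled, turn on parameterization for Always Encrypted, and filter deterministically encrypted columns through variables rather than inline literals.

```sql
-- Prerequisites in SSMS (both are connection/query options, not T-SQL):
--   1. Connect with "Column Encryption Setting = Enabled"
--      (Options > Additional Connection Parameters).
--   2. Query > Query Options > Execution > Advanced >
--      Enable Parameterization for Always Encrypted.

-- SSMS converts this variable into a parameter and encrypts the value
-- client-side before comparing it against the deterministically
-- encrypted column; a plain literal in the WHERE clause would error.
DECLARE @SSN char(11) = '123-45-6789';

SELECT CustomerID, FirstName, LastName
FROM dbo.Customers            -- hypothetical encrypted table
WHERE SSN = @SSN;
```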


DirectQuery Data Modeling

Jason Cockington shares some advice:

From my experience, most people who have reports built on a DirectQuery connection into their data source did so because of a lack of understanding of what the DirectQuery connection was designed to achieve.  For the vast majority of reports, Import mode is the best solution for working with data in Power BI.  DirectQuery should really only ever be applied when you are trying to solve one of the following challenges.

  1. Real-time Data – you need to see the latest available data from the source
  2. Huge Datasets – you have many billions of rows of data (more than 10 GB) so you just can’t import it into Power BI
  3. Regulatory Compliance – the data must stay in the source for data security/privacy reasons

Click through for more information.


A Primer on SQL Audit

Deepthi Goguri gives us an overview of Azure SQL Database’s built-in auditing functionality:

As you all know, it is crucial to audit activity on the server for both prod and non-prod environments. Turning on auditing in Azure SQL is pretty simple, and the results we see in the audit log are similar to the logs we see on-prem. The difference is where we save the audit data in Azure.

You can enable auditing at the server level and at the database level, just like the way we can audit SQL Server on-prem. If you enable auditing at the server level in Azure, it will automatically audit all the databases under that server. If we enable auditing at the server level (the logical server for Azure SQL Databases) and also at the database level, we might collect double the amount of audit data, as it contains the same data twice. Always choose the storage account if you want to audit the data at the server level. If you just want to collect audit data on one or a few databases, you can disable the logical server-level audit and enable auditing at the database level.

Read on for more information and to see a bit of it in action.
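One handy trick once the audit data lands in a storage account: you can read it back with T-SQL. A minimal sketch, with a hypothetical storage URL:

```sql
-- Read Azure SQL audit records written to blob storage.
-- The container path below is a made-up example.
SELECT event_time,
       action_id,
       succeeded,
       database_principal_name,
       statement
FROM sys.fn_get_audit_file(
         'https://mystorage.blob.core.windows.net/sqldbauditlogs/',
         DEFAULT,   -- initial file name
         DEFAULT);  -- starting audit record offset
```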
