Press "Enter" to skip to content

Day: June 15, 2023

Schema Optimization and Disk Usage in Cassandra

John Del Castillo has an after-action report for us:

Instaclustr’s automated systems are constantly monitoring the growth of Instametrics, and periodically it reaches a threshold where TechOps determines it requires more storage. 

When this happens, they contact the Engineering team to get permission to add more nodes. 

On one of these occasions, our team of experts in TechOps looked more closely at our usage of Cassandra and how it has changed over the years, to see if there were ways to further optimize it. 

And we found a way to reduce our disk usage by over 20%, with just a change to our data schema. 

Read on to see what they did and how they were able to save a good amount of disk space.

Comments closed

Joins in KQL

Robert Cain picks back up on a series:

I’m still working on my ArcaneBooks project, mostly documentation, so I thought I’d take a quick break and go back to a few posts on KQL (Kusto Query Language). In this post we’ll cover the join operator.

join in KQL operates much as it does in SQL. It will join two datasets together into a single result.

Even so, there’s a little more to joins in KQL than in T-SQL, with innerunique being unique to KQL. The closest T-SQL analog would probably be a CROSS APPLY (SELECT TOP(1) ...) operation. KQL also uses join to combine the concepts of EXISTS and NOT EXISTS in SQL.

Comments closed

Microsoft Now Offering Replacement Hardware for Surface Devices

Laurent Giret shares the good news;

Starting today, Microsoft is now offering replacement components for Surface devices in its Microsoft Store. This should be great news for consumers with Surface devices that are out of warranty, and Microsoft is also offering free repair guides for its various Surface devices.

“While we have always offered world-class warranty and repair services via Microsoft support, we have been working to increase repair options by designing products that are easier to repair and by expanding our network of Authorized Service Providers. As part of this larger initiative, we are excited to offer replacement components to technically inclined consumers for out-of-warranty, self repair,” announced Tim McGuiggan, VP Devices Services & Product Engineering at Microsoft.

I’m big enough on right-to-repair that I’ll even mention it here. That said, the prices are, uh, a bit high. Compare to Framework’s prices for their parts and I guess the best advice is still, “don’t break your stuff.” But I’m happy to see Microsoft make these parts available and put out the guides. For decades, there was a norm that you released the tech specs and made repair parts available, and any step we see toward moving back to that norm is something I’m happy with.

Comments closed

Hard-Coding Values in T-SQL Queries

Matthew McGiffen defends a practice:

If someone else comes along to look at this code they don’t know what the value of zero means. My code’s not clear. Worse I might refer to the same value multiple times in my procedure, so if I need to change it later I have to change it in multiple places.

Good practice from other coding languages would say that I replace it with a meaningfully named constant. As mentioned, in T-SQL we don’t have constants so I’ll compromise and use a variable:

And that good idea is where everything goes off the rails. There are a few places where T-SQL offers you “close enough” to other programming languages but where performance suffers as a result: ‘constant variables” like in this example, nested views and user-defined functions for encapsulation, etc.

Comments closed

Connecting to a Fabric Warehouse via SSMS

Reitse Eskens does some digging:

Whilst working on a blogpost on Fabric Data Warehouse, I started wondering if I could work around the SQL web interface and connect to my OneLake with SSMS and/or ADS. As it turns out, you can!

Specifically, you can connect to see things in a warehouse or the Tables view of a lakehouse, not the Files view. There is a built-in web viewer, but Microsoft Fabric definitely is intended to work with normal SQL tools, not just its web interface and Power BI.

Comments closed

Thoughts on Community-Driven Documentation in Postgres

Robert Haas shares some thoughts:

In my opinion, the PostgreSQL documentation is simultaneously excellent and fairly poor, and both its excellence and its shortcomings are direct results of the process by which the documentation is produced. The PostgreSQL documentation is stored in the same git repository as the source code, and anyone who patches the source code so as to change documented behavior must also patch the documentation to match.

This means that nearly all documentation updates are made by the developer who is most familiar with what is changing in the code, or sometimes by another developer who has studied those changes closely. Therefore, the documentation is usually extremely accurate. Sure, there are oversights, but it would be incredible to discover that some PostgreSQL command has a documented option which doesn’t actually exist, or that a parameter which is documented to take a string argument actually takes an integer or a Boolean. Typically, the descriptions of what SQL statements do and how that behavior is changed by parameter settings or options passed to the command itself are crisp and precise.

But read the whole thing, as there are downsides to this approach.

Comments closed

Side Channel Attacks on Dynamic Data Masking

Ben Johnston does some testing:

This is the third part of a series on SQL Server Dynamic Data Masking. The first part in the series was a brief introduction to dynamic data masking, completing solutions, and use cases. The second part covered setting up masking and some examples. This part starts exploring side channel attacks against dynamic data masking.

This article will show that are plenty of security concerns with using Dynamic Data Masking as a general-purpose security tool, but don’t let it completely keep you from using it. The point of the article is to show you the weakness of the tool, particularly for ad-hoc uses with users who might be keen to snoop around the data in ways you might not expect.

I’m not very big on Dynamic Data Masking at all, and this is a big part of why. That said, if your end users don’t have the ability to send arbitrary queries along (e.g., they can only execute stored procedures and none of those stored procedures let you send in arbitrary T-SQL), then it’s not that bad.

Comments closed

Methods for Monitoring MongoDB

Hadi Fadlallah does some watching:

Several utilities included with the MongoDB distribution provide statistics about instances’ performance and activity. Usually, they are used to diagnose problems and assess the functioning of a system.

The MongoDB database tools are a suite of command-line utilities for working with MongoDB. Starting with MongoDB 4.4, these tools are released separately from the MongoDB Server. To install them, we should follow the guidelines provided in the following documentation.

Click through for information on command-line tools, built-in web services, and one third-party tool for the job.

Comments closed