Press "Enter" to skip to content

Curated SQL Posts

Finding Basic Table Information via T-SQL

Andy Brownsword has a script for us:

In Management Studio we can view object details by hitting F7 in Object Explorer. It gives us basic metrics but I find it very slow to load for the details I typically need.

For that reason I thought I’d share a script to turn to for the metrics I commonly need. This query returns:

  • The table details (schema, name, created date)
  • The primary storage (Heap, Clustered, or Columnstore)
  • The number of Nonclustered / Columnstore Indexes
  • The number of records and rough size for data / indexes

Click through for the script and an example of what it looks like.
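
In the meantime, here is a generic sketch against the system catalog views (not Andy's actual script) that returns similar columns. Sizes are approximate, counted in allocated 8 KB pages:

SELECT
    s.name        AS SchemaName,
    t.name        AS TableName,
    t.create_date AS CreatedDate,
    i.type_desc   AS PrimaryStorage,  -- HEAP, CLUSTERED, or CLUSTERED COLUMNSTORE
    ix.SecondaryIndexes,
    rc.TotalRows,
    sz.TotalSizeMB
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
JOIN sys.indexes AS i                 -- index_id 0 = heap, 1 = clustered
    ON i.object_id = t.object_id
   AND i.index_id IN (0, 1)
CROSS APPLY (SELECT COUNT(*) AS SecondaryIndexes
               FROM sys.indexes AS nc
              WHERE nc.object_id = t.object_id
                AND nc.index_id > 1) AS ix
CROSS APPLY (SELECT SUM(p.rows) AS TotalRows
               FROM sys.partitions AS p
              WHERE p.object_id = t.object_id
                AND p.index_id IN (0, 1)) AS rc
CROSS APPLY (SELECT SUM(au.total_pages) * 8 / 1024 AS TotalSizeMB
               FROM sys.partitions AS p
               JOIN sys.allocation_units AS au
                   ON au.container_id = p.partition_id
              WHERE p.object_id = t.object_id) AS sz
ORDER BY s.name, t.name;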

Comments closed

SQL Server Msg 3023 during DBCC SHRINKFILE

Tom Collins gets an error:

Question: Executing the following database shrinkfile activity and getting the error message:

USE myDatabase;
DBCC SHRINKFILE (N'myDatabase_log', 0, TRUNCATEONLY);

Msg 3023, Level 16, State 2, Line 4
Backup, file manipulation operations (such as ALTER DATABASE ADD FILE) and encryption changes on a database must be serialized. Reissue the statement after the current backup or file manipulation operation is completed.

Read on for the answer.
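
The message itself hints at the usual culprit: a backup or other file-manipulation operation already running against the database. A quick, generic check (not necessarily the resolution Tom describes) is to look for one in sys.dm_exec_requests:

SELECT session_id,
       command,            -- e.g. BACKUP DATABASE or BACKUP LOG
       percent_complete,
       start_time
FROM sys.dm_exec_requests
WHERE command LIKE 'BACKUP%';

If a row comes back, the shrink will keep failing until that operation finishes.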

Comments closed

Shrinking Large Tables in Postgres

Andrew Atkinson shrinks a table:

In this post, you’ll learn a recipe that you can use to “shrink” a large table. This is a good fit when only a portion of the data is accessed, the big table has become unwieldy, and you don’t want a heavier solution like table partitioning.

This recipe has been used on tables with billions of rows, and without taking Postgres offline. How does it work?

Click through to find out.
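
For a sense of the general shape, one common pattern for this kind of online shrink is copy-and-swap. The sketch below uses hypothetical names and is not Andrew's recipe; notably, it glosses over capturing writes that arrive during the copy, which any truly online version must handle:

-- Build a replacement table with the same structure.
CREATE TABLE events_new (LIKE events INCLUDING ALL);

-- Copy only the rows still being accessed (here, the last 90 days).
INSERT INTO events_new
SELECT *
FROM events
WHERE created_at >= now() - interval '90 days';

-- Swap the tables inside a single transaction.
BEGIN;
ALTER TABLE events RENAME TO events_old;
ALTER TABLE events_new RENAME TO events;
COMMIT;

-- Once satisfied, drop (or archive) the old table to reclaim space.
DROP TABLE events_old;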

Comments closed

Real-Time Streaming in Azure

Temidayo Omoniyi takes us through an architecture:

In today’s world, billions of data points are generated daily from messaging applications like WhatsApp, financial systems like the New York Stock Exchange, or video streaming platforms like YouTube. As a data engineer or solution architect, you are tasked with designing a real-time streaming platform that captures the data as they are generated and stores them in the appropriate storage for decision-making.

This does a great job of going into detail, not only at the architectural level, but also setup and practical implementation.

Comments closed

Vacuum, MERGE, and ON CONFLICT Directives

Shane Borden looks back at a directive:

I previously blogged about ensuring that the “ON CONFLICT” directive is used in order to keep vacuum from having to do additional work. You can read the original blog here: Reduce Vacuum by Using “ON CONFLICT” Directive

Now that the “MERGE” functionality has been incorporated into Postgres 15 and above, I wanted to ensure that there was no “strange” behavior as it relates to vacuum when using merge.

Read on to see how things look now.
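
For reference, the two upsert forms look like this against a hypothetical table:

-- INSERT ... ON CONFLICT, the pre-15 upsert idiom:
INSERT INTO account_balance (account_id, balance)
VALUES (42, 100.00)
ON CONFLICT (account_id) DO UPDATE
    SET balance = EXCLUDED.balance;

-- The equivalent MERGE, available in Postgres 15 and above:
MERGE INTO account_balance AS t
USING (VALUES (42, 100.00)) AS src (account_id, balance)
    ON t.account_id = src.account_id
WHEN MATCHED THEN
    UPDATE SET balance = src.balance
WHEN NOT MATCHED THEN
    INSERT (account_id, balance)
    VALUES (src.account_id, src.balance);

Beyond vacuum behavior, one known difference: ON CONFLICT is safe under concurrent inserts, while MERGE can raise a unique-violation error when two sessions race on the same key.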

Comments closed

Querying a Fabric SQL Endpoint via Notebook and T-SQL

Sandeep Pawar talks about a Spark connector:

I am not sharing anything new. The Spark data warehouse connector has been available for a couple of months now. It had some bugs, but it seems to be stable now. This connector allows you to query the lakehouse or warehouse endpoint in a Fabric notebook using Spark. You can read the documentation for details, but below is a quick pattern that you may find handy.

Despite it not being anything new, it is still interesting to see the use case of writing T-SQL instead of Spark SQL.

Comments closed

Data Type Changes in Snowflake

Kevin Wilkie makes some changes:

When working with data, I usually have an idea of what type of data I will push into a field. Sometimes, for whatever reason, it is decided to change the type of data allowed in the field. Today, I want to show how that’s done in Snowflake.

Click through to learn how, and how it’s not quite the same as SQL Server.
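
To sketch the Snowflake side with hypothetical table and column names: compatible changes, such as widening a column, work in place, while incompatible changes require adding, backfilling, and swapping in a new column.

-- Widening a VARCHAR works in place:
ALTER TABLE customers ALTER COLUMN full_name SET DATA TYPE VARCHAR(200);

-- An incompatible change (e.g. VARCHAR to DATE) takes a new column instead:
ALTER TABLE customers ADD COLUMN signup_date_new DATE;
UPDATE customers SET signup_date_new = TRY_TO_DATE(signup_date);
ALTER TABLE customers DROP COLUMN signup_date;
ALTER TABLE customers RENAME COLUMN signup_date_new TO signup_date;

SQL Server, by contrast, attempts the conversion in place with ALTER TABLE ... ALTER COLUMN.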

Comments closed

Exploring Azure Data Storage Options

Anurag K talks about data storage:

Choosing the right data storage option in Azure is critical for ensuring your applications run efficiently, securely, and cost-effectively. In this blog post, we’ll dive deeper into three key Azure storage services: Blob Storage, File Storage, and Disk Storage. We’ll explore their features, use cases, and provide specific examples to help you understand how to best leverage these services in your cloud environment.

I would note here that within the Blob storage section, you could carve out an explanation of Data Lake Storage Gen2, which starts from Blob storage and then adds hierarchical namespaces (i.e., folders).

Comments closed

Testing Python Code with pytest

Aida Gjoka builds some tests:

Testing code using automated tools is common throughout the software development industry. This technique can improve the quality of the code you write as a data scientist. Testing helps refine your code, supports redesign, prevents errors, and makes it harder to write single-use code.

Here, we introduce the pytest framework and show how it can be used to test Python functions. If you don’t use a testing framework as part of your daily workflow, try experimenting with the techniques presented here the next time you write or extend a function.

I am a big fan of pytest because it strikes what I consider to be a great balance between convention and customization. There’s very little administrative overhead to creating test classes and test cases, so tests are easy to build, and it’s trivial to run a test suite or a specific part of one.
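
As a tiny, hypothetical illustration of that low overhead: put functions named test_* in a file named test_*.py and pytest discovers them with no registration or class ceremony.

# test_slugs.py -- a made-up function and its tests
def slugify(title: str) -> str:
    """Lowercase a title and hyphenate its spaces."""
    return title.strip().lower().replace(" ", "-")

def test_slugify_lowercases_and_hyphenates():
    assert slugify("Curated SQL Posts") == "curated-sql-posts"

def test_slugify_strips_surrounding_whitespace():
    assert slugify("  Hello World ") == "hello-world"

Running pytest alone executes the whole suite; pytest test_slugs.py or pytest -k whitespace narrows it to one file or to tests whose names match a pattern.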

Comments closed