Storage – Page 6 – Curated SQL

Measuring Write Speeds in SQL Server

Published 2024-03-05 by Kevin Feasel

In this post I cover a script I’ve put together for measuring storage write speeds in SQL Server, namely against database data files.

This is meant to help get an idea of how the underlying storage performs when SQL Server is writing 1GB of data to a database.

At this point, you might be asking yourself: “Why not use CrystalDiskMark instead?”.
The answer is simple: you might not always be able to install/run additional software in an environment. Even more so if you work with external customers or you’re a consultant. It’s a lot simpler to ask a customer to run a script and send you the output, than it is to ask them to install and run some 3rd party software.

Click through for the script, what it does, and how to run it, as well as a note on limitations and example based on three drives.

Comments closed

The Importance of Data Retention Policies

Published 2024-01-22 by Kevin Feasel

Ed Pollack shares some great advice:

It is always an afterthought. New objects are created that start off small and current. New feature development takes over and the recently architected data structures become old news. Over time, data grows and suddenly a previously small table contains millions or billions of rows.

Is all that data necessary? How long should it be retained for? If there is no answer to this question, then the actuality may be “Forever”, or more honestly “No one knows for sure.”

Retention takes on many forms and this article dives into ways in which data can be managed over time to ensure that it is fast, accurate, and readily available.

We don’t tend to think about data retention in the development phase, but it’s an important consideration and thinking about it up-front might save you disk space headaches later.

Comments closed

Concatenating Many Files in Azure Blob Storage

Published 2023-12-20 by Kevin Feasel

Drew Furgiuele concatenates a lot of files:

Lately, I’ve found myself with a few requests from friends and users that have a particular problem: they’ve got themselves a data lake in Azure, and they can read and write files just fine to it. The problem, though, is that sometimes they need to take a series of files and mash them all together, or as the cool kids call it: concatenate them. And when it comes to third party tools and methods that can do the trick, you’re spoiled for choice: Azure Data Factory, Spark via Databricks, or even PowerShell.

Case in point: I was working with someone who had tens of thousands of CSV files that they needed to merge together into one big file, but they were already out in their Azure storage account. That doesn’t sound so bad, does it?

Drew explains why it is, but also why it isn’t. So click through and check that out.

Comments closed

Diagnosing SQL Server Storage Performance

Published 2023-12-01 by Kevin Feasel

Andy Yun has a script for us:

As part of my PASS Summit 2023 community session, I created a hybrid of two of my favorite sys.dm_io_virtual_file_stats scripts. It captures data from the DMV at two different intervals, to help you diagnose SQL Server I/O related performance data.

Click through for more details and to get a copy of the script.

Comments closed

Differentiating Physical and Logical Reads in SQL Server

Published 2023-11-13 by Kevin Feasel

Jose Manuel Jurado Diaz explains a concept:

In the realm of Azure SQL Database, query performance is a paramount concern for database administrators and developers alike. A critical aspect of this performance is understanding how SQL Server interacts with data, particularly through physical and logical reads. This article delves into these two fundamental concepts, providing insights into their impact on database performance and a practical lab to observe these metrics in action.

Read on for the difference, as well as a demonstration. With slow disks and insufficient RAM, it’s really important to know this difference. But as you have more RAM and move to formats like NVMe for storage, I’d argue that it’s less of an issue. The additional RAM, in particular, is important because the idea is that data access frequently will remain in the buffer pool for longer, so you’re more likely to see logical reads in action. Of course, poor indexing and bad decisions can ruin that idea, so don’t do that, okay?

Comments closed

Whitepapers for Oracle and SQL Server in Azure

Published 2023-10-19 by Kevin Feasel

Kellyn Gorman has been busy:

I’ve been pretty busy with work and travel, but I finally got an official Silk Github repository to publish a couple new white papers and sizing assessment worksheets for customer access. These are primarily Oracle and SQL Server to Azure focused white papers, but I will be publishing ones on GCP next, to be followed by AI and other database platforms soon.

Click through for links to the documents.

Comments closed

Common SQL Server Mistakes: Default Auto-Growth

Published 2023-10-10 by Kevin Feasel

Hemantgiri Goswami takes a look at auto-growth:

Auto Growth is a feature that allows database files (primary, secondary, and log) to expand when the database file becomes full – without manual intervention.

Auto Growth feature is handy when we do not want to increase the size of database files manually. There are two ways you can set auto growth – using SQL Server Management Studio (SSMS hereafter) and T-SQL. Auto Growth can be configured – In Percent and Megabytes.

Auto-growth isn’t a problem on its own, though growth sizes, especially in older versions of SQL Server, were far too low for medium- and large-sized databases.

I don’t particularly like the 2.5 MB example Hemantgiri shows. I have a quick rule of thumb which is 64MB for small databases, 256-512 for medium-sized databases, and 1GB for large databases (assuming my underlying disk is fast). This limits the number of auto-growth events and, for log files in particular, keeps virtual log file counts more reasonable.

Comments closed

Tiered Storage in Apache Kafka

Published 2023-10-02 by Kevin Feasel

Matthew de Detrich explains the value behind tiered storage in Apache Kafka:

Tiered Storage is arguably one of the most sought-after features of Kafka 3.6, allowing Kafka’s core data to be stored in other locations, such as object storage, in addition to hard disks in a transparent manner, without any changes to Kafka’s producers or consumers. The Kafka brokers control whether the data is stored on local disks, fast but expensive and limited, or in alternative storage places, such as Amazon S3. When Tiered Storage is properly configured, it means you can have the best of both worlds: recent data is stored on local fast disks (as is currently), and older, less frequently accessed data can be stored elsewhere where it’s cheaper and space requirements are less of a concern (sometimes unlimited!)

Read on to learn more about the official version of tiered storage, as well as a forward port of two prior implementation attempts to Kafka version 3.3.

Comments closed

Azure Blob Storage Operating System Error 86

Published 2023-09-20 by Kevin Feasel

Jose Manuel Jurado Diaz 86’d that option:

Today, I worked on a service request that our customer got the following error message: Cannot open backup device ‘https://XXX.blob.core.windows.net/NNN/YYY.bak‘. Operating system error 86(The specified network password is not correct.). RESTORE HEADERONLY is terminating abnormally. (Microsoft SQL Server, Error: 3201). Following I would like to share with you some details why this issue and the activities done to resolve it.

Read on to get a better understanding of what this error actually means and how you can fix it.

Comments closed

Choosing a Data Serialization Format

Published 2023-09-08 by Kevin Feasel

Rathish Kumar says more than “JSON and Parquet”:

In the world of software, we often work with different types of data like lists, tables, and more. These data structures are designed to be fast and efficient when our computer programs use them. However, sometimes we need to move this data out of our computer’s memory, like when we want to save it to a file or send it over the internet. To do this, we have to change the data into a special format made up of 0s and 1s, which is quite different from data structures. This process is what we call encoding or serialization.

In this article, we’ll explore the world of encoding and decoding, which is the reverse process of turning that special format back into usable data. We’ll also take a look at different ways to do encoding and decoding, as well as important things to think about when choosing the right method for your software projects.

Sadly, ORC (Optimized Row Columnar) doesn’t make the cut, as Parquet ended up taking over that market.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Category: Storage