Press "Enter" to skip to content

Category: Storage

Querying Database and Log File Sizes with T-SQL

Allen White takes us through an easy technique to check database and log file sizes:

As a consultant, I have to be able to quickly spot problems, and one of the problems I frequently find is transaction log files that are incorrectly sized.

There are two catalog views in the master database which make this easy to do – sys.master_files and sys.databases. The sys.master_files view contains the database and individual file names, and the data_space_id column always has a value of 0 for the log file. The size column returns the value in 8KB pages, so we have to multiply the column by 8, then divide by 1024 to get the size in megabytes (MB).

Click through for the demo.
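
For a sense of what the query can look like, here is a minimal sketch (not Allen's exact script): it joins the two catalog views and converts the size column from 8 KB pages into megabytes.

SELECT  d.name AS database_name,
        mf.name AS file_name,
        CASE WHEN mf.data_space_id = 0 THEN 'LOG' ELSE 'DATA' END AS file_type,
        mf.size * 8 / 1024.0 AS size_mb   -- size is stored in 8 KB pages
FROM sys.master_files AS mf
    INNER JOIN sys.databases AS d
        ON d.database_id = mf.database_id
ORDER BY d.name, mf.data_space_id;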

Bulk Migration of Data and Log Files in SQL Server

David Fowler shows how you can change file paths for all of your databases in one fell swoop:

You’ve got a SQL Server with a few hundred databases on it (to be honest it doesn’t even need to be quite that many) and you need to move all the data and log files to a new location. Perhaps you’re going to be migrating onto a new, shiny SAN or maybe your disks are just about full and you need to shift a bunch of the files off somewhere else.

The first thing that you’re going to need to do is change the paths of the files in SQL. That’s easy enough to do with an ALTER DATABASE statement.

ALTER DATABASE SQLUndercover MODIFY FILE (NAME = SQLUndercover_Log, FILENAME = 'F:\SQLLogs\SQLUndercover_Log.ldf');

But that’s going to get very tedious very quickly if you’ve got to do that for a whole lotta databases. So to help out, I thought I’d share a little script that I’ve been using for a while (or a variation on it at least) to make the process far easier and generate all the ALTER statements for you.

Click through for the script as well as a bit of advice around the actual moving of the files.
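
As a rough illustration of the approach (not the script from the post), something along these lines generates the ALTER DATABASE statements for every user database; the target paths F:\SQLData and F:\SQLLogs are placeholders.

DECLARE @NewDataPath nvarchar(260) = N'F:\SQLData\';  -- placeholder target paths
DECLARE @NewLogPath  nvarchar(260) = N'F:\SQLLogs\';

SELECT 'ALTER DATABASE ' + QUOTENAME(d.name)
     + ' MODIFY FILE (NAME = ' + QUOTENAME(mf.name)
     + ', FILENAME = '''
     + CASE WHEN mf.data_space_id = 0 THEN @NewLogPath ELSE @NewDataPath END
     + RIGHT(mf.physical_name, CHARINDEX('\', REVERSE(mf.physical_name)) - 1)
     + ''');' AS alter_statement
FROM sys.master_files AS mf
    INNER JOIN sys.databases AS d
        ON d.database_id = mf.database_id
WHERE d.database_id > 4;  -- skip the system databases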

Loading Data into Delta Lake

Prakash Chockalingam takes us through auto-loading Delta Lake from various sources:

Auto Loader is an optimized file source that overcomes all the above limitations and provides a seamless way for data teams to load the raw data at low cost and latency with minimal DevOps effort. You just need to provide a source directory path and start a streaming job. The new structured streaming source, called “cloudFiles”, will automatically set up file notification services that subscribe to file events from the input directory and process new files as they arrive, with the option of also processing existing files in that directory.

This does look interesting.

Storing Power BI Audit Logs in Blob Storage

Gilbert Quevauvilliers works around a built-in constraint with Power BI Audit Logs:

With the new Power BI Get-PowerBIActivityEvent cmdlet I wanted to find a way to automate the entire process so that it all runs in the cloud.

One of the current challenges with the audit logs is that they only store 90 days of history, so if you want to do analysis for longer than 90 days the log files have to be stored somewhere. Why not use Azure Blob Storage?

Whilst these steps might appear to be rather technical, if you follow them and you have access to an Azure subscription, you can do this too.

Gilbert warns us up front that this will be a lengthy post, and that is quite true. But if you need to hold those audit logs more than 90 days, this is a great way of doing so.

Finding the Right Disk and Data Node Sizes in HDFS

Lokesh Jain has some advice when it comes to disk and data node size:

There are two factors to keep in mind when choosing node capacity. These will be discussed in detail in the next sections.

1. Large Disks – total node capacity being the same, using more disks is better as it yields higher aggregate IO bandwidth.
2. Dense Nodes – as nodes get denser, recovery after node failure takes longer.

These factors are not HDFS-specific and will impact any distributed storage service that replicates data for redundancy and serves live workloads.

Click through for specific advice on maximum disk and node sizes.

Tips for Using Azure Storage

James Serra takes us through Azure Data Lake Store Gen2 and Azure Blob Storage:

Azure Data Lake Store (ADLS) Gen2 should be used instead of Azure Blob Storage unless there is a needed feature that is not yet GA’d in ADLS Gen2.

The major features that are missing from ADLS Gen2 are premium tier, soft delete, page blobs, append blobs, and snapshots. The major features that are in preview are archive tier, lifecycle management, and diagnostic logs. Check out all the missing features at Known issues with Azure Data Lake Storage Gen2.

Note that underneath the covers, ADLS Gen2 uses Azure Blob Storage and is simply a layer over blob storage providing additional features (i.e. hierarchical file system, better performance, enhanced security, Hadoop compatible access).

Click through for a bullet point list of useful information.

Creating a Gen-2 Azure Data Lake Store

Cecilia Brusatori shares how to build a generation-2 data lake in Azure:

Finally, you’ve decided that Data Lake Gen 2 is good for your Data Analytics Scenario and you’ve started the journey, went to the Azure Portal and searched for it. Mhh you don’t see it in the options to create it, let’s try the search bar [typing Data Lake Gen2….] Nothing… Ok maybe you’ve missed something…. nope!
So what is, in fact, a Data Lake Gen 2? It is a blob storage account, optimized for Data Analytics.
Let’s take a look at how you are able to create it!

If you’re used to the first generation, where Azure Data Lake Storage was its own thing, it might take a minute to realize where it went.

Testing SQL Server Storage Performance

Brent Ozar walks us through CrystalDiskMark 7 to check whether storage speed is up to snuff:

The Peak Performance + Mix setting runs a pretty cool mix of tests that will push your storage hard. Note that I don’t try to get CDM to replicate exactly how SQL Server does IO: I’m just trying to get a quick 5-minute idea of whether my storage is hot or not.

Click through for the demo. Mind you, this is something you want to do before setting up SQL Server…

SQL Server on Azure: Performance Optimized Storage Config

Mine Tokus announces a new feature when using Azure to host IaaS SQL Server instances:

Today, we are excited to announce Performance Optimized Storage Configuration capabilities for the VMs registered with SQL VM RP. This feature automates storage configuration according to performance best practices for SQL Server on Azure virtual machines through Azure Portal or Azure Quickstart Templates when creating a SQL VM. Automated performance best practices include separating data and log files, cache configuration for premium disks hosting data and log files, support for tempdb on local disk, support for Ultra disks to host data, log, or tempdb files, and database engine only images. In this article, we will discuss each automated performance best practice in detail.

Read on for the description and check out those links for additional information.

ADLS Gen2 Navigation in Power Query

Chris Webb shows off hierarchical navigation in Power Query against Azure Data Lake Storage Gen2:

While the documentation on how to import data from Azure Data Lake Gen2 Storage into Power BI is pretty detailed, the connector (which at the time of writing is in beta) that supports this functionality in the Power Query engine has some useful functionality that isn’t so obvious. If you look at the built-in documentation on the AzureStorage.DataLake M function in the Power Query Editor you’ll see there are a lot of options that aren’t in the documentation on the web yet:

Click through for an example.
