Press "Enter" to skip to content

Category: Storage

Blob Storage Account Lifecycle Maintenance

Andy Brownsword deletes some files but wants to keep other files:

A hierarchy of directories which contain files. That’s how we typically think about file storage, but it doesn’t work quite the same everywhere. In Blob Storage a file can appear to be in a directory, but when that file is removed, so is the directory.

This can occur when using Lifecycle Management to help purge legacy blobs, which can be unexpected. Let’s look at a way we can help remediate this.

One important thing to remember about Azure blob storage accounts and S3 buckets is that there’s really no concept of a directory structure. It’s all keys, where your key might be dir1/dir2/dir3/file.txt. This is a bit different for Azure Data Lake Storage Gen2 and its notion of hierarchical namespaces (i.e., folders). But Andy does walk through some of the consequences of this and how to work with lifecycle maintenance policies to delete only certain sets of files.
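You can see the key-prefix behavior directly. Here’s a minimal sketch (mine, not Andy’s), assuming the azure-storage-blob package and placeholder connection string and container names:

```python
# Demonstrates that Blob Storage "directories" are just key prefixes.
# Requires azure-storage-blob; connection string and container names
# below are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("demo")

# Upload a blob whose name merely looks like a nested path.
container.upload_blob("dir1/dir2/file.txt", b"hello", overwrite=True)

# "dir1/dir2" now shows up as a directory in listings and tooling...
print([b.name for b in container.list_blobs(name_starts_with="dir1/")])

# ...but delete the only blob under that prefix and the "directory"
# vanishes with it -- there was never a directory object to begin with.
container.delete_blob("dir1/dir2/file.txt")
print([b.name for b in container.list_blobs(name_starts_with="dir1/")])  # []
```

This is exactly why a lifecycle policy that deletes the last blob under a prefix makes the whole "folder" disappear.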

Comments closed

Filesystem Access for Database Restoration via dbatools

Andy Levy shares a lesson learned:

While performing an instance migration this spring, I happened upon something I didn’t expect in [dbatools](https://dbatools.io/). It should have been a simple backup/restore copy of the databases, with the backup files residing on a fileshare on the destination server after being copied there. I kept getting a warning that the backup files I was attempting to restore couldn’t be read, and the restores (via Restore-DbaDatabase) wouldn’t execute.

I checked permissions on the server over and over again. Both on the filesystem and for the share that I was attempting to read from. Even more curious, if I executed the restore database statements directly from within Management Studio, the databases restored without issue.

After doing quite a bit of digging, I managed to find the reason.

Read on to learn more about the necessary permissions, the issue Andy hit, and the solution.

Comments closed

Parquet Files in Pandas

Chris LaGreca works with Parquet files:

Apache Parquet has become one of the de facto standards in modern data architecture. This open source, columnar data format serves as the backbone of many high-powered analytics and machine learning pipelines, supported by many of the world’s most sophisticated platforms and services. AWS, Azure, and Google Cloud all offer built-in support for Parquet, while big data tools like Hadoop, Spark, Hive, and Databricks natively support it, allowing seamless data processing and analytics. Parquet is also foundational in data lakehouse formats like Delta Lake, Iceberg, and Hudi, where its features are further enhanced.

Parquet is efficient and has broad industry support. In this post, I will showcase a few simple techniques to demonstrate working with Parquet and leveraging its special features using Pandas.

Pandas does make this rather easy, as Chris shows.
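As a flavor of the kind of thing Chris covers, here’s a quick sketch assuming pandas with the pyarrow engine installed:

```python
# Write and read Parquet with pandas; requires pandas and pyarrow.
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "east"],
    "sales": [100, 250, 175],
})

# Write with compression; snappy is the default for the pyarrow engine.
df.to_parquet("sales.parquet", engine="pyarrow", compression="snappy")

# Columnar reads: pull back only the columns you need...
subset = pd.read_parquet("sales.parquet", columns=["sales"])

# ...and push filters down so unneeded row groups are skipped at read time.
east = pd.read_parquet(
    "sales.parquet",
    engine="pyarrow",
    filters=[("region", "==", "east")],
)
print(east)
```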

Comments closed

RIP Stretch DB

Debbi Lyons calls it:

Ever since Microsoft introduced SQL Server Stretch Database in 2016, our guiding principles for such hybrid data storage solutions have always been affordability, security, and native Azure integration. Customers have indicated that they want to reduce maintenance and storage costs for on-premises data, with options to scale up or down as needed, greater peace of mind from advanced security features such as Always Encrypted and row-level security, and they seek to unlock value from warm and cold data stretched to the cloud using Microsoft Azure analytics services.     

During recent years, Azure has undergone significant evolution, marked by groundbreaking innovations like Microsoft Fabric and Azure Data Lake Storage. As we continue this journey, it remains imperative to keep evolving our approach on hybrid data storage, ensuring optimal empowerment for our SQL Server customers in leveraging the best from Azure.

This is not surprising at all, considering that the premise of Stretch DB was that you could off-load old and less-important data from your local SQL Server instances and expensive local disk into Azure, querying it when you need that data. The problem was that you couldn’t use cheap storage and pay a few cents per gigabyte per month. Instead, you were effectively spinning up Azure Synapse Analytics and paying a marked premium for your least important data. The price alone made this an untenable idea, but there were other holes in the plan as well that doomed it as a product.

Comments closed

Parallel Download in Oracle Object Storage

Brendan Tierney continues a series on Oracle Object Storage:

In previous posts, I’ve given example Python code (and functions) for processing files into and out of OCI Object and Bucket Storage. One of these previous posts includes code and a demonstration of uploading files to an OCI Bucket using the multiprocessing package in Python.

Building upon these previous examples, the code below will download a Bucket using parallel processing. Like my last example, this code is based on the example code I gave in an earlier post on functions within a Jupyter Notebook.

Click through for the code.
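The general shape of a parallel download looks something like this sketch (not Brendan’s code; bucket and path names are placeholders, and it assumes the oci package and a standard ~/.oci/config):

```python
# Parallel download of a bucket's contents with the OCI Python SDK.
import os
import oci
from multiprocessing import Pool

BUCKET = "my-bucket"
DEST = "downloads"

def download_one(object_name: str) -> str:
    # Build the client inside the worker: OCI clients don't pickle
    # cleanly across process boundaries.
    config = oci.config.from_file()  # reads ~/.oci/config by default
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data
    resp = client.get_object(namespace, BUCKET, object_name)
    path = os.path.join(DEST, object_name.replace("/", "_"))
    with open(path, "wb") as f:
        # Stream in 1 MB chunks rather than buffering the whole object.
        for chunk in resp.data.raw.stream(1024 * 1024, decode_content=False):
            f.write(chunk)
    return object_name

if __name__ == "__main__":
    os.makedirs(DEST, exist_ok=True)
    config = oci.config.from_file()
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data
    names = [o.name for o in client.list_objects(namespace, BUCKET).data.objects]
    with Pool(processes=4) as pool:
        for done in pool.imap_unordered(download_one, names):
            print(f"downloaded {done}")
```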

Comments closed

Export Azure SQL DB to Blob Storage

Josephine Bush runs an import-export business and wants a database to “fall off a truck”:

After a data migration, we needed to decommission the old Azure SQL DBs, but we wanted to keep a copy in case we needed anything later. Enter exporting an Azure SQL DB to storage!

Click through for an example of how it works. Given that we’re getting bacpac files out, I wonder what it would look like with a really large database.
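If you’d rather drive the export from Python than the portal, a rough sketch with azure-mgmt-sql looks like this (all names are placeholders, and this assumes current azure-mgmt-sql and azure-identity packages):

```python
# Kick off an Azure SQL DB export to a bacpac in blob storage.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import ExportDatabaseDefinition

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.databases.begin_export(
    resource_group_name="my-rg",
    server_name="my-sqlserver",
    database_name="my-db",
    parameters=ExportDatabaseDefinition(
        storage_key_type="StorageAccessKey",
        storage_key="<storage-account-key>",
        storage_uri="https://mystorage.blob.core.windows.net/backups/my-db.bacpac",
        administrator_login="sqladmin",
        administrator_login_password="<password>",
    ),
)
poller.result()  # blocks until the bacpac lands in blob storage
```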

Comments closed

Downloading and Deleting Files in Oracle Object Storage

Brendan Tierney continues a series on Oracle Object Storage:

In my previous posts on using Object Storage I illustrated what you needed to do to set up your connection, explore Object Storage, create Buckets, and add files. In this post, I’ll show you how to download files from a Bucket, and how to delete Buckets.

Click through for scripts to perform both. If you just want to delete an item from a bucket without deleting the bucket as a whole, you can do so with a quick modification to Brendan’s script.
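For reference, the underlying SDK calls (this is a sketch against the oci package with placeholder names, not Brendan’s script) look like this — note that delete_object removes a single file, while delete_bucket only succeeds once the bucket is empty:

```python
# Download a file, delete an object, and delete a bucket with the OCI SDK.
import oci

config = oci.config.from_file()  # reads ~/.oci/config by default
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data

# Download a single file from the bucket.
resp = client.get_object(namespace, "my-bucket", "reports/file.csv")
with open("file.csv", "wb") as f:
    f.write(resp.data.content)

# Delete just that one object, leaving the bucket intact...
client.delete_object(namespace, "my-bucket", "reports/file.csv")

# ...or, once every object is gone, delete the bucket itself.
client.delete_bucket(namespace, "my-bucket")
```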

Comments closed

Bucket Operations in Oracle Object Storage

Brendan Tierney continues a series on working with Oracle Object Storage:

In a previous post, I showed what you need to do to set up your local PC/laptop to be able to connect to OCI. I also showed how to perform some simple queries on your Object Storage environment. Go check out that post before proceeding with the examples in this blog.

In this post, I’ll build upon my previous post by giving some Python functions to:

  • Check if a Bucket exists
  • Create a Bucket
  • Delete a Bucket
  • Upload an individual file
  • Upload an entire directory

Read on for those examples.
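To give a sense of what Brendan’s functions wrap, here’s a sketch of the same operations against the OCI SDK directly (names are placeholders; it assumes the oci package and a standard ~/.oci/config):

```python
# Bucket existence check, creation, and uploads with the OCI Python SDK.
import os
import oci

config = oci.config.from_file()
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data
compartment_id = config["tenancy"]  # or a specific compartment OCID

# Check if a Bucket exists: head_bucket raises a 404 ServiceError if not.
def bucket_exists(name: str) -> bool:
    try:
        client.head_bucket(namespace, name)
        return True
    except oci.exceptions.ServiceError as e:
        if e.status == 404:
            return False
        raise

# Create a Bucket if it doesn't already exist.
if not bucket_exists("my-bucket"):
    details = oci.object_storage.models.CreateBucketDetails(
        name="my-bucket", compartment_id=compartment_id
    )
    client.create_bucket(namespace, details)

# Upload an individual file...
with open("local/file.txt", "rb") as f:
    client.put_object(namespace, "my-bucket", "file.txt", f)

# ...or walk a directory and upload everything in it.
for root, _, files in os.walk("local"):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, "local")
        with open(path, "rb") as f:
            client.put_object(namespace, "my-bucket", key, f)
```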

Comments closed

Working with Oracle OCI Object Storage

Brendan Tierney peruses buckets:

This blog post will walk you through how to access Oracle OCI Object Storage and explore what buckets and files you have there, using Python and the OCI Python library. There will be additional posts which will walk through some of the other typical tasks you’ll need to perform with moving files into and out of OCI Object Storage.

It looks like the interface for this is substantially similar to AWS’s S3.
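The basic connect-and-explore pattern is short enough to sketch here (assuming the oci package, credentials in ~/.oci/config, and a placeholder bucket name):

```python
# Connect to OCI Object Storage and list buckets and objects.
import oci

config = oci.config.from_file()  # DEFAULT profile in ~/.oci/config
client = oci.object_storage.ObjectStorageClient(config)

# Every tenancy has a single Object Storage namespace.
namespace = client.get_namespace().data

# List the buckets in a compartment (here, the root compartment).
for bucket in client.list_buckets(namespace, config["tenancy"]).data:
    print(bucket.name)

# List the objects inside one bucket.
for obj in client.list_objects(namespace, "my-bucket").data.objects:
    print(obj.name)
```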

Comments closed