Storage – Page 12 – Curated SQL

Creating a Gen-2 Azure Data Lake Store

Published 2019-12-10 by Kevin Feasel

Cecilia Brusatori shares how to build a generation-2 data lake in Azure:

Finally, you’ve decided that Data Lake Gen 2 is good for your Data Analytics Scenario and you’ve started the journey, went to the Azure Portal and searched for it. Mhh you don’t see it in the options to create it, let’s try the search bar [typing Data Lake Gen2….] Nothing… Ok maybe you’ve missed something…. nope!
So what is in fact a Data Lake Gen 2? it is a blob storage account, optimized for Data Analytics.
Let’s take a look at how you are able to create it!

If you’re used to the first generation, where Azure Data Lake Storage was its own thing, it might take a minute to realize where it went.
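In other words, Gen2 is a general-purpose v2 storage account with the hierarchical namespace switched on. As a rough sketch, assuming an ARM template deployment, the resource looks something like the following. The account name `mydatalakegen2` and the `eastus` region are placeholders, and `2019-06-01` is just one `apiVersion` that accepts the property.

```python
import json


def adls_gen2_account_resource(name, location="eastus"):
    """ARM template resource for a storage account with hierarchical namespace enabled."""
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2019-06-01",
        "name": name,                     # placeholder: 3-24 lowercase letters/digits
        "location": location,
        "kind": "StorageV2",              # Gen2 sits on a general-purpose v2 account
        "sku": {"name": "Standard_LRS"},
        "properties": {
            "isHnsEnabled": True,         # the switch that makes blob storage "Gen2"
        },
    }


template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [adls_gen2_account_resource("mydatalakegen2")],
}

if __name__ == "__main__":
    print(json.dumps(template, indent=2))
```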

Testing SQL Server Storage Performance

Published 2019-12-02 by Kevin Feasel

Brent Ozar walks us through CrystalDiskMark 7 to check whether storage speed is up to snuff:

The Peak Performance + Mix setting runs a pretty cool mix of tests that will push your storage hard. Note that I don’t try to get CDM to replicate exactly how SQL Server does IO: I’m just trying to get a quick 5-minute idea of whether my storage is hot or not.

Click through for the demo. Mind you, this is something you want to do before setting up SQL Server…
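CrystalDiskMark is the tool Brent uses, and this does not replace it or model SQL Server's actual I/O patterns. Purely as a stdlib-only illustration of the same "is this storage hot or not" question, a crude sequential-write check might look like this. The function name and defaults are hypothetical.

```python
import os
import tempfile
import time


def sequential_write_mb_per_sec(total_mb=64, block_kb=1024, directory=None):
    """Write total_mb of random data in block_kb chunks, fsync, and return MB/s.

    Crude smoke test only: writes are buffered until the final fsync, and
    nothing here models SQL Server's real I/O mix.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    # Create the scratch file on the volume you want to test (default: temp dir).
    fd, path = tempfile.mkstemp(prefix="iotest_", dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (blocks * block_kb / 1024) / elapsed


if __name__ == "__main__":
    print(f"{sequential_write_mb_per_sec():.1f} MB/s sequential write")
```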

SQL Server on Azure: Performance Optimized Storage Config

Published 2019-10-04 by Kevin Feasel

Mine Tokus announces a new feature when using Azure to host IaaS SQL Server instances:

Today, we are excited to announce Performance Optimized Storage Configuration capabilities for the VMs registered with SQL VM RP. This feature automates storage configuration according to performance best practices for SQL Server on Azure virtual machines through Azure Portal or Azure Quick start Templates when creating a SQL VM. Automated performance best practices include separating Data and Log files, cache configuration for premium disks hosting data and log files, support for Temp DB on local disk, support for Ultra disks to host data, log or Temp DB files and database engine only images. In this article, we will discuss each automated performance best practice in detail.

Read on for the description and check out those links for additional information.
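To make the list concrete, here is a sketch of the layout those practices describe, expressed as a small Python data structure. The drive letters are hypothetical. The caching values (ReadOnly for data disks, None for log disks) follow Microsoft's general guidance for premium disks, though the feature itself may choose differently for a given VM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    drive: str       # hypothetical drive letter
    host_cache: str  # Azure disk host caching setting


def sql_vm_storage_plan(tempdb_on_local_ssd=True):
    """Sketch of the data/log/tempdb split described in the announcement."""
    plan = {
        "data": Placement("F:", "ReadOnly"),  # data files on their own premium disk(s)
        "log": Placement("G:", "None"),       # log files separated; no host caching
    }
    # tempdb is rebuilt at every startup, so the ephemeral local SSD is acceptable
    plan["tempdb"] = (
        Placement("D:", "local SSD") if tempdb_on_local_ssd else plan["data"]
    )
    return plan


for role, placement in sql_vm_storage_plan().items():
    print(f"{role:7} -> {placement.drive} (cache: {placement.host_cache})")
```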

ADLS Gen2 Navigation in Power Query

Published 2019-10-01 by Kevin Feasel

Chris Webb shows off hierarchical navigation in Power Query against Azure Data Lake Storage Gen2:

While the documentation on how to import data from Azure Data Lake Gen2 Storage into Power BI is pretty detailed, the connector (which at the time of writing is in beta) that supports this functionality in the Power Query engine has some useful functionality that isn’t so obvious. If you look at the built-in documentation on the AzureStorage.DataLake M function in the Power Query Editor you’ll see there are a lot of options that aren’t in the documentation on the web yet:

Click through for an example.
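The gist of hierarchical navigation is showing folders rather than one flat list of every file path. As a language-neutral illustration only, here is how flat paths fold into a folder tree. This is Python, not M, and `build_tree` is a hypothetical helper, not part of the connector.

```python
def build_tree(paths):
    """Fold flat 'folder/sub/file' paths into nested dicts; files map to None."""
    root = {}
    for path in paths:
        *folders, name = path.strip("/").split("/")
        node = root
        for folder in folders:
            node = node.setdefault(folder, {})  # create the folder level if missing
        node[name] = None
    return root


flat = ["sales/2019/jan.csv", "sales/2019/feb.csv", "readme.txt"]
print(build_tree(flat))
# {'sales': {'2019': {'jan.csv': None, 'feb.csv': None}}, 'readme.txt': None}
```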

HBase and S3

Published 2019-09-27 by Kevin Feasel

Krishna Maheshwari, et al., explain how we can allow Apache HBase to use S3 for storage:

Cloudera Data Platform (CDP) provides an out-of-the-box solution that allows Apache HBase deployments to use Amazon Simple Storage Service (S3) as its main persistence layer for saving table data. Amazon S3 is an object store which offers a high degree of durability with a pay-per-use cost structure. There is no server-side component to run or manage for S3 — all that is needed is the S3 client library and AWS credentials. However, HBase requires a consistent and atomic filesystem which means that it cannot directly use S3 because it is an eventually consistent object store. Both CDH and HDP have only provided HBase solely using HDFS because there have been long-standing impediments that prevented HBase from natively using S3. To address these issues, we’ve built an out-of-the-box solution which we are delivering for the first time via CDP. When you launch an Operational Database (HBase) cluster on CDP, HBase StoreFiles (the backing files for HBase tables) are stored in S3 and HBase write-ahead-logs (WAL) are stored in an HDFS instance run alongside HBase per usual.

I hadn’t thought of using S3, but it’s an interesting post.
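The interesting part is the split: StoreFiles in S3, write-ahead logs in HDFS. Stock HBase 2.x exposes that split through the `hbase.rootdir` and `hbase.wal.dir` properties. The sketch below only generates an `hbase-site.xml` fragment showing the shape of it. The bucket and NameNode address are placeholders. Per the quote, pointing plain HBase at S3 this way does not solve the consistency problem CDP addresses, so treat it as illustration rather than a working setup.

```python
import xml.etree.ElementTree as ET


def hbase_site(storefile_root, wal_dir):
    """Return an hbase-site.xml <configuration> fragment as a string."""
    config = ET.Element("configuration")
    for name, value in (
        ("hbase.rootdir", storefile_root),  # HBase data (StoreFiles) location
        ("hbase.wal.dir", wal_dir),         # write-ahead logs kept separately
    ):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Placeholder bucket and NameNode address
    print(hbase_site("s3a://my-bucket/hbase", "hdfs://namenode:8020/hbase-wal"))
```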

VM Storage Performance in the Cloud

Published 2019-09-19 by Kevin Feasel

Joey D’Antoni explains how storage architecture has changed from on-prem to the cloud:

This architecture design dates back to when a storage LUN was literally built of a few disks, and we wanted to ensure that there were enough I/O operations per second to service the needs of the SQL Server, because we only had the available IO of a few disks.
As virtualization became popular, storage architectures changed and a SAN LUN was carved out into many small extents (typically 512k-1MB depending on vendor) across the entire array. What this meant was that with modern storage there was no need to separate logs and data files. Some DBAs still did, but in an on-premises world there was no penalty for this.

It’s important to keep up on these changes.
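A toy model helps show why separating files stopped mattering. Assume a simplified array that rotates fixed-size extents across every disk. Real vendor layouts are more complicated, and the 1 MB extent and 24-disk figures are hypothetical. Under that model, a "data" region and a "log" region carved from the same pool end up on exactly the same disks:

```python
def disks_touched(start, length, extent_bytes, disk_count):
    """Disks holding bytes [start, start + length) when extents rotate across all disks."""
    first = start // extent_bytes
    last = (start + length - 1) // extent_bytes
    return {extent % disk_count for extent in range(first, last + 1)}


MB, GB = 1024 ** 2, 1024 ** 3
EXTENT, DISKS = 1 * MB, 24  # hypothetical array geometry

data_disks = disks_touched(0, 10 * GB, EXTENT, DISKS)      # "data" LUN region
log_disks = disks_touched(10 * GB, 1 * GB, EXTENT, DISKS)  # "log" LUN region
print(len(data_disks), len(log_disks), data_disks == log_disks)  # 24 24 True
```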

Uploading to Blob Storage Archive Tier

Published 2019-09-11 by Kevin Feasel

Bob Pusateri has a helpful script for us:

Last year I wrote about how to upload data to Azure Blob Storage Archive Tier, and included a PowerShell script to do so. It’s something I use regularly, as I have hundreds of gigabytes of photos and videos safely (and cheaply!) stored in Azure Blob Storage using Archive Tier.

But read on for a recent announcement to make the process easier.
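At the REST level, setting the tier at upload time comes down to one header on Put Blob. Here is a minimal sketch that builds, but does not send, such a request using only the standard library. The SAS URL is a placeholder, and this is not Bob's script.

```python
import urllib.request


def archive_upload_request(blob_sas_url, data):
    """Build (but do not send) a Put Blob request that writes straight to Archive."""
    return urllib.request.Request(
        blob_sas_url,
        data=data,
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",
            "x-ms-access-tier": "Archive",  # set the access tier at upload time
            "x-ms-version": "2019-02-02",   # tier-on-upload needs a recent service version
        },
    )
```

Sending it with `urllib.request.urlopen(req)` should return 201 Created if the SAS token allows writes. For large photo and video files you would switch to block uploads (Put Block / Put Block List) rather than a single Put Blob.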

Using Azure Storage Explorer

Published 2019-09-06 by Kevin Feasel

Arun Sirpal takes us through Azure Storage Explorer:

I only ever use the storage explorer when managing my blobs, files, queues within storage accounts. It is your single view access point for all your storage needs and I totally recommend downloading it and using it (https://azure.microsoft.com/en-gb/features/storage-explorer/).

Why do I like using it? I am sure there are more reasons, but these are personal to me.

Click through for Arun’s reasons as well as installation basics.

Saving Data in Docker Containers

Published 2019-09-04 by Kevin Feasel

Anthony Nocentino has a three-part series on persisting SQL Server data in Docker containers. Part 1 takes us through volumes:

Let’s talk about how we can use Docker Volumes and SQL Server to persist data. If we want to run SQL Server in a container we will want to decouple our data from the container itself. Doing so will enable us to delete the container, replace it and start up a new one pointing at our existing data. When running in a container, SQL Server will store data in /var/opt/mssql by default. When the container starts up for the first time it will put the system databases in that location and any user databases created will also be placed at this location by default.
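For reference, the basic pattern is a named volume mounted at `/var/opt/mssql`. This sketch only assembles the `docker run` arguments, so you can print them or hand them to `subprocess.run`. The container name, volume name, image tag, and password placeholder are all assumptions.

```python
import shlex


def sql_container_command(volume="sqldata1", name="sql1",
                          image="mcr.microsoft.com/mssql/server:2019-latest"):
    """Build a docker run command that keeps SQL Server's files in a named volume."""
    return [
        "docker", "run", "--name", name,
        "-e", "ACCEPT_EULA=Y",
        "-e", "MSSQL_SA_PASSWORD=<YourStrong!Passw0rd>",  # placeholder: use your own
        "-p", "1433:1433",
        "-v", f"{volume}:/var/opt/mssql",  # system and user databases live here
        "-d", image,
    ]


if __name__ == "__main__":
    print(shlex.join(sql_container_command()))
```

Remove the container with `docker rm -f sql1`, run the same command again, and the new container should start up against the databases already sitting in `sqldata1`.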

Part 2 looks at how volumes differ between the Linux and Mac/Windows versions of Docker:

So in my previous post, we discussed Docker Volumes and how they have a lifecycle independent of the container enabling us to service the container image independent of the data inside the container. Now let’s dig into Volumes a little bit more and learn where Docker actually stores that data on the underlying operating system.

Part 3 ties it in with SQL Server:

Makes sense…we changed where SQL Server is reading/writing data. macOS doesn’t support a file mode called O_DIRECT which allows for unbuffered read/write access to the file opened using the open system call. O_DIRECT is used by systems that manage their own file caching, like relational database management systems (RDBMS). So as SQL starts up and tries to open the master database with O_DIRECT the files can’t be opened because the macOS kernel doesn’t support this mode. And this is the reason why we have to have that Linux VM around. That Linux VM will support O_DIRECT option on the file opened. See more about this at the GitHub issue here.
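If you want to see the difference for yourself, a quick probe is to try opening a file with the flag. Here is a minimal sketch in Python. It treats two cases as "mode unavailable": a missing `os.O_DIRECT` (as on macOS, whose C library does not define it) and an `OSError` on open.

```python
import os
import tempfile


def supports_o_direct(directory=None):
    """Return True if a file in `directory` can be opened with O_DIRECT."""
    flag = getattr(os, "O_DIRECT", None)
    if flag is None:  # e.g. macOS: the flag is not defined, so Python lacks it
        return False
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        direct_fd = os.open(path, os.O_RDONLY | flag)
    except OSError:  # e.g. EINVAL on a filesystem that rejects O_DIRECT
        return False
    else:
        os.close(direct_fd)
        return True
    finally:
        os.remove(path)  # clean up the scratch file in every case


if __name__ == "__main__":
    print("O_DIRECT supported here:", supports_o_direct())
```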

Definitely worth getting a handle on this if you’re interested in containers.

The Databricks File System

Published 2019-08-14 by Kevin Feasel

Brad Llewellyn takes us through the Azure Databricks File System:

Today, we’re going to talk about the Databricks File System (DBFS) in Azure Databricks. If you haven’t read the previous posts in this series, Introduction, Cluster Creation and Notebooks, they may provide some useful context. You can find the files from this post in our GitHub Repository. Let’s move on to the core of this post, DBFS.
As we mentioned in the previous post, there are three major concepts for us to understand about Azure Databricks, Clusters, Code and Data. For this post, we’re going to talk about the storage layer underneath Azure Databricks, DBFS. Since Azure Databricks manages Spark clusters, it requires an underlying Hadoop Distributed File System (HDFS). This is exactly what DBFS is. Basically, HDFS is the low cost, fault-tolerant, distributed file system that makes the entire Hadoop ecosystem work. We may dig deeper into HDFS in a later post. For now, you can read more about HDFS here and here.

Click through for more detail on DBFS.
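One practical detail worth knowing: Spark APIs refer to DBFS with `dbfs:/` URIs, while on cluster nodes with the DBFS FUSE mount, ordinary file APIs see the same files under `/dbfs`. Here is a sketch of a hypothetical helper that translates between the two (the example path is also hypothetical):

```python
def dbfs_to_fuse(uri):
    """Translate a dbfs:/ URI into the equivalent /dbfs local path."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a DBFS URI: {uri!r}")
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


print(dbfs_to_fuse("dbfs:/FileStore/tables/sales.csv"))
# /dbfs/FileStore/tables/sales.csv
```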

Category: Storage