With the CTP2 build of SQL Server 2017, you now have the ability to do a SELECT INTO operation into a specific filegroup.
The syntax looks like this: SELECT * INTO TableName ON FileGroup FROM SomeQuery
What is the use of this, you might ask? Well, maybe you have some ad-hoc queries where you save some data, but you don’t want it sitting on your expensive SAN. Or maybe you populate staging tables on the fly and you want them to end up on a specific SSD because you need the speed of the SSD disk for these operations.
This might also be helpful in migrating tables to different storage.
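As a minimal sketch of the feature in action — the database, filegroup, and file names here are illustrative, not from the linked post:

```sql
-- Add a filegroup backed by a file on the fast disk
ALTER DATABASE StagingDB ADD FILEGROUP FG_SSD;
ALTER DATABASE StagingDB ADD FILE
    (NAME = N'StagingFast', FILENAME = N'E:\SQLData\StagingFast.ndf')
TO FILEGROUP FG_SSD;
GO

-- SQL Server 2017: land the results of an ad-hoc query on that filegroup
SELECT *
INTO dbo.OrdersStaging ON FG_SSD
FROM dbo.Orders
WHERE OrderDate >= '20170101';
```

Before SQL Server 2017, SELECT INTO always created the table on the default filegroup, so this kind of placement required CREATE TABLE plus INSERT.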
Recently a client found this article on “Best Practices for SQL Server in Azure Virtual Machines” and wanted to re-provision his volumes to adhere to them.
Now, my first thought was: wait, I’m a DBA, not a System Admin; that’s not my role! But thinking about it more, I realized that the client views this as a SQL Server issue, that I am the SQL Server Consultant, and that it is my job to remedy this problem.
Not being 100% confident in Azure, I spun up a SQL Server VM and attempted to add some volumes. To my surprise, this was way too easy.
Click through for the steps.
All communication with Azure Storage via connection strings and blob URLs enforces the use of HTTPS, which provides encryption in transit. You can enforce the use of “always HTTPS” by setting the connection string like this: “DefaultEndpointsProtocol=https;AccountName=myblob1…”, or in SAS signatures, as in the example below:
To protect data at rest, the service provides an option to encrypt the data as it is stored in the account. There’s no additional cost associated with encrypting the data at rest, and it’s a good idea to switch it on as soon as the account is created. There is a one-click setting at the storage account level to enable it, and the encryption is applied to both new and existing storage accounts. The data is encrypted with the AES-256 cipher, and the feature is now generally available in all Azure regions and Azure clouds (public, government, etc.).
There’s some good information here, making it worth the read.
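For reference, here is a rough sketch of what an always-HTTPS connection string and an HTTPS-only SAS URL look like; the account name, container, file, and signature values are placeholders, not real credentials:

```
DefaultEndpointsProtocol=https;AccountName=myblob1;AccountKey=<base64-key>;EndpointSuffix=core.windows.net

https://myblob1.blob.core.windows.net/mycontainer/myfile.txt?sv=2017-04-17&spr=https&sig=<signature>
```

In a SAS token, the spr (signed protocol) parameter set to https restricts that signature to HTTPS-only requests.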
I made a mistake with a script today. I created three new tempdb files sized at 10GB each that filled up a hard drive.
Luckily it was in one of my own testing VMs, so it wasn’t awful. Fixing it, however, was a fun one.
NOTE: All work was done in a test environment. Proceed with caution if you’re running these commands in Production, and make sure you understand the ramifications.
It’s a good opportunity to learn from Erin’s experience.
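A sketch of the kind of cleanup involved, assuming an accidentally oversized file named tempdev2 — the file name and sizes are illustrative:

```sql
USE tempdb;
GO
-- Try to release the space on disk right away; target size is in MB
DBCC SHRINKFILE (tempdev2, 1024);
GO
-- Also fix the defined size so the file doesn't come back
-- at 10GB after the next restart
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev2, SIZE = 1GB);
```

If the shrink can’t release the space because of active workload, the MODIFY FILE change still takes effect the next time the instance restarts.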
This is working on HDInsight v3.5 with Spark 2.0 and Azure Data Lake Store as the underlying storage system. What is nice about this is that my cluster only has access to its own section of the folder structure. I have the structure root/clusters/dasciencecluster. This particular cluster starts at dasciencecluster, while other clusters may start somewhere else. Therefore, my data is saved to root/clusters/dasciencecluster/data/open_data/RF_Model.txt
It’s pretty easy to do, and the Scala code would look suspiciously similar. The Java version of the code would be seven pages long.
Implicit conversions often happen when a query is comparing two or more columns with different data types. In the below example, the system has to perform extra I/O in order to compare a varchar(max) column to an nvarchar(4000) column, which leads to an implicit conversion and, ultimately, a scan instead of a seek. By fixing the tables to have matching data types, or simply converting the value before evaluation, you can greatly reduce I/O and improve cardinality estimates (the number of rows the optimizer expects).
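A contrived repro of the pattern, with made-up table and column names: joining an nvarchar column against an indexed varchar column makes SQL Server convert the varchar side (nvarchar has higher data type precedence), which, depending on collation, can turn an index seek into a scan.

```sql
-- Mismatched types: the same logical value stored two different ways
CREATE TABLE dbo.Accounts (AccountNumber varchar(20) PRIMARY KEY);
CREATE TABLE dbo.Payments (AccountNumber nvarchar(20) NOT NULL, Amount money);

-- Accounts.AccountNumber gets a CONVERT_IMPLICIT to nvarchar,
-- so the primary key index may be scanned rather than seeked
SELECT p.Amount
FROM dbo.Payments AS p
JOIN dbo.Accounts AS a
    ON a.AccountNumber = p.AccountNumber;

-- One fix: make the data types match
ALTER TABLE dbo.Payments ALTER COLUMN AccountNumber varchar(20) NOT NULL;
```

The execution plan’s warning triangle on the join, plus CONVERT_IMPLICIT in the predicate, is the usual giveaway.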
There’s some good advice here if your main hardware constraint is being I/O bound.
The format of the file has huge implications for storage and parallelization. Splittable formats (row-oriented files, such as CSV) are parallelizable, as data does not span extents. Non-splittable formats (files that are not row-oriented, where data is often delivered in blocks, such as XML or JSON) cannot be parallelized, as data spans extents and can only be processed by a single vertex.
In addition to the storage of unstructured data, Azure Data Lake Store also stores structured data in the form of row-oriented, distributed clustered index storage, which can also be partitioned. The data itself is held within the “Catalog” folder of the data lake store, but the metadata is contained in Data Lake Analytics. For many, working with the structured data in the data lake is very similar to working with SQL databases.
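The structured side can be sketched in U-SQL; the database, table, and column names below are invented for illustration:

```sql
// U-SQL sketch: a distributed table with a clustered index,
// stored under the Catalog folder of the data lake store
CREATE DATABASE IF NOT EXISTS SensorDb;

CREATE TABLE IF NOT EXISTS SensorDb.dbo.Readings
(
    DeviceId int,
    ReadingTime DateTime,
    Reading double,
    INDEX idx_Readings CLUSTERED (DeviceId ASC)
    DISTRIBUTED BY HASH (DeviceId)
);
```

The DISTRIBUTED BY clause is what spreads rows across storage for parallel processing, much as the excerpt describes.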
This is the type of thing that you can easily forget about, but it makes a huge difference down the line.
From the above screenshots, you can clearly see that the disks are not aligned.
So, what’s the big deal about this? When the disks for the primary and secondary replicas are not aligned, the AG synchronization process can run slowly. This is not something you want to see on a Production server.
Read the whole thing.
What we need to do is offset the beginning of the data stored on disk to a location more conducive to how the program operates. This offset is known as the “Partition Alignment Offset”. To be in tune with SQL Server, this value should be a multiple of 64KB. However, you also need to consider the entire storage subsystem: the disks, controllers, and memory. Starting with Windows Server 2008, this offset is 1024KB, a nice multiple of 64KB that also works very nicely with most RAID disks/controllers. Prior to Windows Server 2008, partition alignment was not performed automatically, so on older volumes it needs to be done explicitly.
If you’ve migrated disk from server to server to server over the years, this is worth checking out.
The access tiers available for blob storage accounts are “hot” and “cool”. In general, hot data is classified as data that is accessed very frequently and needs to be highly durable and available. On the other hand, cool data is data that is infrequently accessed and long-lived. Cool data can tolerate a slightly lower availability, but still requires high durability and similar time-to-access and throughput characteristics as hot data. For cool data, a slightly lower availability SLA and higher access costs are acceptable tradeoffs for much lower storage costs.

Azure Blob storage now addresses this need for differentiated storage tiers for data with different access patterns and pricing models. You can choose between the Cool and Hot access tiers to store your less frequently accessed cool data at a lower storage cost, and store more frequently accessed hot data at a lower access cost. The Access Tier attribute of hot or cool is set at the account level and applies to all objects in that account. So if you want to have both a hot access tier and a cool access tier, you will need two accounts. If there is a change in the usage pattern of your data, you can also switch between these access tiers at any time.
It looks like there shouldn’t be a performance difference between the two; it’s more of a cost difference, where you might be able to save money by choosing your tier wisely.