Using a query label in Azure SQL DW (Data Warehouse) can be a really handy technique to track queries via DMVs. You might want to do this to see what problematic queries are doing under the covers.
Let’s check out an example. First I will show you how things would look without using a query label. I connect to SQL DW and issue the following basic example query.
It’s an interesting approach and solves a problem I saw in PolyBase around figuring out which session details were yours after the fact.
The question is: how are backups done with Azure SQL DW?
It is very different from Azure SQL DB (which you would expect). Azure SQL DW has a totally different architecture from its classic database counterpart. Restore points are the key here. Automatic ones are taken throughout the day and are kept for seven days only. In the worst case, the time between restore points is eight hours, which gives an eight-hour RPO (Recovery Point Objective).
You can also create manual restore points, as Arun shows.
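To put numbers on the RPO point, here is a small Python sketch. It assumes nothing more than a list of restore point timestamps; the dates, the failure time, and the function names are all made up for illustration and do not come from any Azure API. It shows how a manual restore point shrinks the window of changes you could lose.

```python
from datetime import datetime, timedelta

def max_gap(restore_points):
    """Largest interval between consecutive restore points (the worst-case RPO)."""
    pts = sorted(restore_points)
    return max((b - a for a, b in zip(pts, pts[1:])), default=timedelta(0))

def data_loss_window(restore_points, failure_time):
    """Time between the failure and the most recent restore point before it."""
    earlier = [p for p in restore_points if p <= failure_time]
    if not earlier:
        raise ValueError("no restore point precedes the failure")
    return failure_time - max(earlier)

# Hypothetical timestamps: automatic restore points eight hours apart,
# plus one manual restore point taken after an afternoon load.
automatic = [
    datetime(2018, 1, 1, 0, 0),
    datetime(2018, 1, 1, 8, 0),
    datetime(2018, 1, 1, 16, 0),
]
manual = [datetime(2018, 1, 1, 13, 30)]
failure = datetime(2018, 1, 1, 15, 45)

loss_auto = data_loss_window(automatic, failure)
loss_with_manual = data_loss_window(automatic + manual, failure)

print("Worst-case gap (automatic only):", max_gap(automatic))
print("Loss window without manual point:", loss_auto)
print("Loss window with manual point:   ", loss_with_manual)
```

With these made-up times, a failure just before the next automatic restore point costs almost the full eight hours, while the manual restore point cuts that to the time since the load finished.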
I wrote the updated content from a practical point of view, totally hype-free. The table of contents:
- Modern Data Architecture
- Business Needs Driving Data Architectures to Evolve and Adapt
- Principles of a Modern Data Architecture
- Data Lake + Data Warehouse: Complementary Solutions
- Tips for Designing a Data Lake
- Azure Technologies for Implementing a Data Lake
- Considerations for a Successful Data Lake in the Cloud
- Getting Started with a Data Lake
To download the ebook, BlueGranite will ask you to register your information. That’s common for premium content like this. We take a low-key approach to sales, so I can assure you that registration only means you’ll receive notifications of new content that you may find interesting.
It’s the length of a good-sized paper, so you won’t have to invest dozens of hours of time to get the story.
Access to the table columns can be controlled based on the user’s execution context or their group membership with the standard GRANT T-SQL statement. To secure your data, you simply define a security policy via the GRANT statement to your table columns. For example, if you would like to limit access to PII data in your customers table, you can simply GRANT SELECT permissions on specific columns to the ContractEmp role:
GRANT SELECT ON dbo.Customers (CustomerId, FirstName, LastName) TO ContractEmp;
This capability is available now in all Azure regions with no additional charge.
This has been in regular SQL Server for a long time, so it’s good to see it make its way into Azure SQL Data Warehouse, and in a manner which doesn’t involve creating user-defined functions for predicates like Row-Level Security.
Auditing is disabled by default and the UI experience depends on the region to which the logical server is deployed. For instance, in UK South, the portal offers no options to manage auditing:
In North Europe, the portal allows Table Auditing (table-storage based) to be enabled on the SQL Data Warehouse scope, but it isn’t possible to enable Blob Auditing:
On top of that, Blob Auditing behaves differently when enabled on a logical server level in different regions. In locations that support Table Auditing, turning on Blob Auditing automatically enables it in all databases, including SQL Data Warehouses, and that’s expected. In other regions, Blob Auditing is not automatically enabled and has to be turned on programmatically by calling the ARM REST API.
I imagine the plan is to support this across the board but it’s rolling out region by region.
Previously, SQL DW supported only automated snapshots guaranteeing an eight-hour recovery point objective (RPO). While this snapshot policy provided high levels of protection, customers asked for more control over restore points to enable more efficient data warehouse management capabilities leading to quicker times of recovery in the event of any workload interruptions or user errors.
Now, with user-defined restore points, in addition to the automated snapshots, you can initiate snapshots before and after significant operations on your data warehouse. With more granular restore points, you ensure that each restore point is logically consistent and limit the impact and reduce recovery time of restoring the data warehouse should this be needed. User-defined restore points can also be labeled so they are easy to identify afterwards.
Creating a user-defined restore point is a one-liner in PowerShell, and it’s something you could do after each warehouse load, for example.
For the purposes of this post the TSQL shown is elementary (don’t be surprised by that), the point is really about SHUFFLE. So, I select the estimated plan for the following code.
SELECT SOD.[SalesOrderID], SOD.[ProductID], SOH.[TotalDue]
FROM [SalesLT].[SalesOrderDetail] SOD
JOIN [SalesLT].[SalesOrderHeader] SOH ON SOH.[SalesOrderID] = SOD.[SalesOrderID]
WHERE SOH.[TotalDue] > 1000
Shuffle me once, why not shuffle me twice. If you REALLY want to see the EXPLAIN command output, then it looks like this snippet below.
The DSQL operation clearly states SHUFFLE_MOVE. Why am I getting this? What does it mean?
Shuffling data isn’t the worst thing in the world, but it is a fairly expensive operation all things considered. Ideally, your warehouse architecture limits the number of shuffle operations, but considering that you can only hash on one key, sometimes it’s inevitable.
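To make the cost of a shuffle concrete, here is a rough Python sketch. It is a toy model, not T-SQL and not how the engine is implemented. It assumes a simple modulo function in place of the real internal hash, and it assumes the detail table is distributed on ProductID, which the original post does not state. The one grounded number is 60, the count of distributions Azure SQL DW spreads each table across.

```python
DISTRIBUTIONS = 60  # Azure SQL DW spreads each table over 60 distributions

def distribution_of(key):
    """Stand-in for the engine's internal hash function (an assumption)."""
    return key % DISTRIBUTIONS

def rows_to_move(rows, current_key, join_key):
    """Count rows that sit in a different distribution than the join key needs."""
    return sum(
        1
        for row in rows
        if distribution_of(row[current_key]) != distribution_of(row[join_key])
    )

# Hypothetical SalesOrderDetail rows: three detail lines per order.
details = [
    {"SalesOrderID": 1000 + i // 3, "ProductID": 700 + i}
    for i in range(300)
]

# Table distributed on ProductID but joined on SalesOrderID: rows must move.
moved_when_misaligned = rows_to_move(details, "ProductID", "SalesOrderID")

# Table distributed on the join key itself: every row is already in place.
moved_when_aligned = rows_to_move(details, "SalesOrderID", "SalesOrderID")

print("Rows shuffled, distributed on ProductID:   ", moved_when_misaligned)
print("Rows shuffled, distributed on SalesOrderID:", moved_when_aligned)
```

When both sides of a join are hash-distributed on the join column, matching rows already live in the same distribution and nothing has to move. Since a table can only be hashed on one key, a join on any other column pays the movement cost.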
Recently we had a requirement to perform SQL Spatial functions on data that was stored in Azure SQL DW. Seems simple enough as spatial has been in SQL for many years, but unfortunately, SQL Spatial functions are not natively supported in Azure SQL DW (yet)!
If interested – this is the link to the Azure Feedback feature request to make this available in Azure SQL DW – https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/10508991-support-for-spatial-data-type
AND SO — to use spatial data in Azure SQL DW we need to look at alternative methods. Luckily a recent new feature in Azure SQL DB in the form of Elastic Query to Azure SQL DW now gives us the ability to perform these SQL Spatial functions on data within Azure SQL DW via a very simple method!
Check out that Azure Feedback item if you’d like to see native spatial support rather than using elastic query. In the meantime, click through to see Rolf’s workaround.
One difference between regular tables and temporary tables is how they are typically used. Temporary tables are session-scoped, which means that adding them into a process or report will probably cause them to be created multiple times. Temporary tables might be very similar to regular tables, but most regular tables are not rewritten every time they are queried.
The disk writes involved in populating the temporary table might be more expensive than the reads would be if you were to modify your query to fold the logic into one larger query. The frequency of the report or process will be a factor in how much of a performance hit you get by using the temporary tables. If you are using temporary tables to make debugging a procedure easier or to enhance readability, make sure you understand the IO cost of performing writes and then reading that data back into a subsequent query.
Read on for more.
Data virtualization integrates data from disparate sources, locations and formats, without replicating or moving the data, to create a single “virtual” data layer that delivers unified data services to support multiple applications and users.
Data movement is the process of extracting data from source systems and bringing it into the data warehouse and is commonly called ETL, which stands for extraction, transformation, and loading.
If you are building a data warehouse, should you move all the source data into the data warehouse, or should you create a virtualization layer on top of the source data and keep it where it is?
Read on for James’s thoughts.