Cloud – Page 37 – Curated SQL

Read and Write Data with PySpark

Published 2023-06-22 by Kevin Feasel

Dustin Vannoy has two of the three R’s down:

Every Spark pipeline involves reading data from a data source or table. For data engineers, we usually end the pipelines by writing the transformed data. In this tutorial, we walk through some of the most common formats and cloud storage locations for reading and writing with Spark. We’ll save some of the advanced Delta Lake capabilities for another tutorial.

Click through to see how to read from and write to CSV, JSON, and Parquet formats. Dustin has examples of working with Azure Blob Storage, S3, and Google Cloud Storage, and even some database examples with JDBC.

Running SqlBulkCopy in Parallel from Powershell

Published 2023-06-22 by Kevin Feasel

Jose Manuel Jurado Diaz has a script for us:

Today, we encountered an interesting service request: reducing the load time for 100,000 records in a table with 97 varchar(320) fields in an Azure SQL Hyperscale database. Below, I would like to share the lessons I learned.

The idea is to split the execution of multiple SqlBulkCopy operations across several concurrent processes. In this case, we are going to split the load into 5 processes running in parallel, each inserting 20,000 rows. Let’s try to find out the total size.

Read on for the script, as well as a rough idea of how long it takes to insert into an Azure SQL DB Hyperscale instance.
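
The split itself is simple arithmetic: 100,000 rows across 5 workers is 20,000 rows apiece. Here is a minimal Python sketch of that fan-out pattern, not Jose’s PowerShell script. It assumes a thread pool in place of separate processes, and `bulk_insert` is a hypothetical stand-in for one SqlBulkCopy call that only counts the rows it would write.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_rows: int, workers: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) row ranges, one per worker.

    Any remainder rows go to the first workers, so every row is covered once.
    """
    base, extra = divmod(total_rows, workers)
    ranges = []
    start = 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def bulk_insert(start: int, end: int) -> int:
    """Hypothetical stand-in for one SqlBulkCopy batch; returns rows 'written'."""
    # A real worker would open its own connection and copy rows[start:end].
    return end - start


def parallel_load(total_rows: int, workers: int) -> int:
    """Run one bulk_insert per range concurrently and return the total row count."""
    ranges = split_ranges(total_rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: bulk_insert(*r), ranges))


if __name__ == "__main__":
    print(split_ranges(100_000, 5))
    print(parallel_load(100_000, 5))
```

In a real loader, each worker would typically open its own database connection rather than share one, since the batches run at the same time.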

Tools for Optimizing Azure SQL MI Performance

Published 2023-06-21 by Kevin Feasel

Rie Merritt breaks out the toolbox:

Azure SQL Managed Instance provides options within and outside the Azure portal for troubleshooting and optimizing performance. Within the portal, you can leverage automatic tuning and Intelligent Insights. Outside of the Azure portal, you can take advantage of capabilities already built into the database engine, such as Query Store and dynamic management views (DMVs). In addition, Microsoft offers several monitoring options that are in preview: Azure SQL Insights inside Azure Monitor (which requires an agent on a VM you own), Azure SQL Analytics, and Azure diagnostic telemetry.

Automatic tuning in SQL Managed Instance supports FORCE_LAST_GOOD_PLAN, which identifies queries whose execution plan is slower than the previous good plan and forces those queries to use the last known good execution plan. Because the system automatically monitors workload performance, it adjusts dynamically as workloads change, forcing the best-performing query execution plan.

Many of the things Rie describes are also available on-premises, though Azure SQL Analytics is only available in Azure SQL DB and Azure SQL MI, as of the time of this post.
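
To make the last-good-plan idea concrete, here is a toy Python model of plan-regression correction. It is not how the engine works internally: `PlanStats`, `pick_plan`, `keep_forcing`, and the `margin_ms` knob are hypothetical, average CPU time is an assumed stand-in metric, and the real feature applies its own checks before forcing or unforcing a plan.

```python
from dataclasses import dataclass


@dataclass
class PlanStats:
    plan_id: str
    avg_cpu_ms: float  # hypothetical metric: average CPU time per execution


def pick_plan(last_good: PlanStats, current: PlanStats, margin_ms: float = 0.0) -> str:
    """Toy regression check.

    If the current plan is slower than the last known good plan by more than
    margin_ms, return the last good plan's id (i.e., force it); otherwise keep
    the current plan.
    """
    if current.avg_cpu_ms > last_good.avg_cpu_ms + margin_ms:
        return last_good.plan_id
    return current.plan_id


def keep_forcing(forced: PlanStats, regressed: PlanStats) -> bool:
    """Keep the forced plan only while it actually outperforms the regressed one."""
    return forced.avg_cpu_ms < regressed.avg_cpu_ms


good = PlanStats("plan_A", 12.0)
regressed = PlanStats("plan_B", 40.0)
print(pick_plan(good, regressed))     # plan_A is forced
print(keep_forcing(good, regressed))  # True: forcing still pays off
```

The second function reflects the "dynamically adjusts" part of the description: forcing is re-evaluated rather than applied once and forgotten.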

Creating an Azure DevOps YAML Pipeline for SQL Server Deploys

Published 2023-06-20 by Kevin Feasel

Olivier Van Steenlandt updates to the new Azure DevOps model:

In one of my previous blog posts, I used the SQL Server database deploy task to deploy my DACPAC to SQL Server. Unfortunately, this task has been deprecated in Release Pipelines. In this blog post, I would like to share the alternative.

Additionally, we will be moving from a Classic Release pipeline to a YAML pipeline. The YAML pipeline will be responsible for building and deploying our Database Projects.

Click through for the walkthrough.

Auto-Pausing Synapse Dedicated SQL Pools

Published 2023-06-19 by Kevin Feasel

Mark Broadbent saves some money via pool auto-pausing:

This capability is neither earth-shatteringly new nor unexpected, and it is something that Databricks has provided for some time. Of the two Data Exploration & Data Warehousing Pool types, Synapse Serverless Pool (otherwise known as the built-in Pool), by its very definition, does not incur compute charges when it is not running.

This leaves us with only the dedicated SQL pool to worry about, and this is where our problems begin.

Click through for the scripts to pause and resume a dedicated SQL pool, and Mark promises a part 2 in which we see the automation.

Computed Columns in Snowflake

Published 2023-06-14 by Kevin Feasel

Kevin Wilkie does the math:

Sometimes to make our lives easier, we, as database engineers, can create a table that automatically tells us the answer as we need it – or at least how we tell it we want it. In SQL Server, we create what is called “Computed Columns.”

Read on to see how to create one of these in Snowflake.

A Primer on SQL Audit

Published 2023-06-14 by Kevin Feasel

Deepthi Goguri gives us an overview of Azure SQL Database’s built-in auditing functionality:

As you all know, auditing activity on the server is crucial for both prod and non-prod environments. Turning on auditing in Azure SQL is pretty simple, and the results we see in the audit log are similar to the logs we see on-prem. The difference is where we save the audit data in Azure.

You can enable auditing at the server level and at the database level, just as we can audit SQL Server on-prem. If you enable auditing at the server level in Azure, it automatically audits all the databases on that server. If we enable auditing at both the server level (the logical server for Azure SQL Database) and the database level, we may collect double the audit data, because the same events are recorded twice. Always choose the Storage account if you want to audit at the server level. If you only want to collect audit data for one or a few databases, you can disable server-level auditing on the logical server and enable auditing at the database level.

Read on for more information and to see a bit of it in action.
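
A small Python model shows where the doubling comes from: each event in a database is written once per active audit policy that covers it. The functions and policy labels below are hypothetical and only model the behavior Deepthi describes; they are not an Azure API.

```python
from __future__ import annotations


def audit_records(server_audit_on: bool, db_audits_on: set[str], database: str) -> list[str]:
    """Return the audit policies that would record one event in `database` (toy model)."""
    records = []
    if server_audit_on:
        # A server-level policy covers every database on the logical server.
        records.append("server")
    if database in db_audits_on:
        # A database-level policy records the same event a second time.
        records.append(f"database:{database}")
    return records


def total_records(events_per_db: dict[str, int], server_audit_on: bool, db_audits_on: set[str]) -> int:
    """Total audit rows collected across all databases for the given event counts."""
    return sum(
        count * len(audit_records(server_audit_on, db_audits_on, db))
        for db, count in events_per_db.items()
    )


events = {"sales": 100, "hr": 50}
print(total_records(events, server_audit_on=True, db_audits_on=set()))      # 150
print(total_records(events, server_audit_on=True, db_audits_on={"sales"}))  # 250: sales counted twice
print(total_records(events, server_audit_on=False, db_audits_on={"sales"})) # 100: only sales audited
```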

Table Clustering and Search Optimization in Snowflake

Published 2023-06-13 by Kevin Feasel

Kedhar Natekar continues a series on Snowflake performance:

Clustering does not guarantee improved performance on non-clustered columns.

If you have frequent queries on non-clustered columns and performance is the key irrespective of cost, then opt for the search optimization service, either over the entire table or on specific columns.

It’s similar to indexing specific columns in an RDBMS like Oracle.

Read on to see how these work and get a few tips along the way.
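
The point about non-clustered columns follows from how pruning works. Below is a toy Python model, assuming min/max metadata per micro-partition: clustering on `a` keeps each partition's range of `a` narrow, so a filter on `a` touches one partition, while the scattered values of `b` mean no partition can be skipped. `build_search_path` is a rough, hypothetical analogue of a value-to-partition lookup like the one the search optimization service maintains, not its actual implementation.

```python
from __future__ import annotations

from collections import defaultdict


def cluster(rows: list[dict], key: str, size: int) -> list[list[dict]]:
    """Sort rows on `key` (a stand-in for clustering) and chunk them into micro-partitions."""
    ordered = sorted(rows, key=lambda r: r[key])
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def pruned_scan(partitions: list[list[dict]], column: str, value: int) -> int:
    """Count partitions whose min/max range for `column` cannot rule out `value`."""
    scanned = 0
    for part in partitions:
        values = [r[column] for r in part]
        if min(values) <= value <= max(values):
            scanned += 1
    return scanned


def build_search_path(partitions: list[list[dict]], column: str) -> dict[int, set[int]]:
    """Map each value of `column` to the set of partitions that contain it."""
    path = defaultdict(set)
    for idx, part in enumerate(partitions):
        for r in part:
            path[r[column]].add(idx)
    return path


# 1,000 rows: "a" is sequential; "b" is a scrambled permutation of 0..999.
rows = [{"a": i, "b": (i * 919) % 1000} for i in range(1000)]
parts = cluster(rows, key="a", size=100)

print(pruned_scan(parts, "a", 500))             # 1: clustered column prunes well
print(pruned_scan(parts, "b", 500))             # 10: every partition must be scanned
print(len(build_search_path(parts, "b")[500]))  # 1: the lookup pinpoints one partition
```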

Computed Properties in Cosmos DB

Published 2023-06-12 by Kevin Feasel

Hasan Savran shows off a new feature in Cosmos DB:

A computed property is a virtual property that is not physically stored in a document. You can use data from other properties of a document to calculate a value for the computed property. This helps you save CPU power, since the database engine does not need to calculate the computed property value for each query request. In the cloud, CPU power means money! Computed properties are like coupons you can use to save money.

Read on to learn more about how computed properties work, their limitations, and some examples of how to create and use them.
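
As a language-neutral analogy for "virtual, not physically stored," here is a small Python sketch in which derived values are attached to a read view while the stored document stays untouched. In Cosmos DB the real feature is configured on the container rather than in application code; the `COMPUTED` definitions, property names, and `read_with_computed` helper below are all hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical computed-property definitions: name -> function of the stored document.
COMPUTED: dict[str, Callable[[dict[str, Any]], Any]] = {
    "cp_lowerName": lambda doc: doc["name"].lower(),
    "cp_total": lambda doc: doc["price"] * doc["quantity"],
}


def read_with_computed(doc: dict[str, Any], names: list[str]) -> dict[str, Any]:
    """Return a copy of `doc` with the requested computed properties added.

    The stored document itself is never modified; values are derived on read.
    """
    view = dict(doc)
    for name in names:
        view[name] = COMPUTED[name](doc)
    return view


stored = {"id": "1", "name": "Widget", "price": 2.5, "quantity": 4}
result = read_with_computed(stored, ["cp_lowerName", "cp_total"])
print(result["cp_lowerName"], result["cp_total"])  # widget 10.0
print("cp_total" in stored)                        # False: nothing was persisted
```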

The Basics of Azure Chaos Studio

Published 2023-06-09 by Kevin Feasel

Arun Sirpal gives us an overview:

Chaos engineering is fun, but it is especially important when building solutions in the cloud. Leveraging the cloud to build something is great, whether that’s a globally distributed website with lots of traffic or an internal three-tier application for a business. The question is: what happens if there is an unexpected fault or disruption? Can your system or app withstand the issue?

Click through for the overview, as well as some additional resources you can use to try it out.
