Synapse Analytics – Page 19

So you’ve started using Synapse SQL Pools, you’ve scaled up to improve performance, but your queries aren’t going as fast as you’d hoped. You take a look at resource utilisation and you see that as you scale up, your total resource utilisation per query goes down. But didn’t we scale up to allow our query to use more resource and run quicker? What’s going on here? A little thing called Workload Management.
What is Workload Management? At a very high level, Workload Management allows you to control how your workload uses system resources. The main problem here is defaults!

Click through for the full story. Or at least the full part 1 of the even fuller story.

Comments closed

Creating a dacpac for a Dedicated SQL Pool

Published 2021-06-10 by Kevin Feasel

Kevin Chant shows how to use Azure DevOps to create a dacpac for an Azure Synapse Analytics dedicated SQL pool:

By the end of this post, you will know how to create a dacpac for a dedicated SQL Pool within Azure Pipelines for your CI/CD deployments. Plus, how you can synchronize a database project created in Azure Data Studio with a Git repository in Azure DevOps.
In a previous post I covered how you can create a dacpac for an Azure Synapse Analytics dedicated SQL Pool using Azure Data Studio. In that post I stated that you could create a dacpac for the database project using Azure DevOps.
With this in mind, I will use the same database project that I created in that post.

Click through for the process.

Comments closed

Improving Parquet External Table Performance in Dedicated SQL Pools

Published 2021-06-04 by Kevin Feasel

Jovan Popovic shows us two ways of accessing data in Parquet files in Azure Synapse Analytics dedicated SQL pools:

Azure Synapse Analytics enables you to read Parquet files stored in the Azure Data Lake storage using the T-SQL language and high-performance Parquet readers. The key characteristic of these high-performance Parquet readers is that they are using the native (C++) code for reading Parquet files, unlike the existing Polybase Parquet reader technology that uses the Java code. These native readers are introduced in the serverless SQL pools in Azure Synapse Analytics workspaces.
In many experiments, this native technology that is used in the serverless SQL pools demonstrated better performance compared to the existing Polybase external table in the dedicated SQL pools.
This native technology for reading Parquet files is now also available in the dedicated SQL pools. In the dedicated Pools in Azure Synapse Analytics, you can create external tables that use native code to read Parquet files and improve performance of your queries that access external Parquet files.

Click through for the process, as well as what kind of performance differences you can see. Some of the queries ended up being worse for native tables versus PolyBase tables, but the majority were a good bit better.

Comments closed

Accessing ADLS Gen2 Data in Serverless SQL Pools with SAS Tokens

Published 2021-06-03 by Kevin Feasel

Neel Ball shows how you can use various techniques, including SAS tokens, to access data stored in Azure Data Lake Storage Gen2 from Azure Synapse Analytics serverless SQL pools:

You have a data lake that contains employee and social feed data. You have data residing in an employee folder that is used by HR team members and twitter for live social feeds that is usually used by marketing folks. If you use SAS token or RBAC, you cannot control to the folder level.
How do you allow users to perform data exploration using synapse serverless with fine grain control on underlying storage.

Read on for one solution.

Comments closed

Creating a dacpac for a Dedicated SQL Pool

Published 2021-06-03 by Kevin Feasel

Kevin Chant shows us how to build out a dacpac file for an Azure Synapse Analytics dedicated SQL Pool:

In reality, you can create a dacpac for a database that’s inside an Azure Synapse Analytics dedicated SQL Pool using a lot of the methods that you use to create them for SQL Server databases.
Azure Data Studio can be an appealing alternative SQL Server Data Tools (SSDT) for tasks like this. Due to various reasons. For instance, it’s a multi-platform solution that is easy to install.
With this in mind, I decided in this post to cover how to create a dacpac for an Azure Synapse Analytics dedicated SQL Pool using Azure Data Studio.

Click through to see how.

Comments closed

Moving Synapse Databases Across Subscriptions

Published 2021-06-02 by Kevin Feasel

Steve Hughes hits on one of the tricky administrative bits of Azure Synapse Analytics:

So you can copy Azure SQL Database using the Azure Portal, PowerShell, Azure CLI, and T-SQL. However, this functionality is limited to Azure SQL Database and does not work for Azure Synapse databases (a.k.a. SQL Pools). Early in 2021, the ability to use the copy functionality to copy databases between subscriptions is also supported but requires security work to make sure the permissions in the database servers and networking allow that to happen.

There’s a lot involved in the process, leaving me to provide the sage wisdom that it’s easier not to put it in the wrong subscription to begin with if you can avoid it.

Comments closed

Querying Delta Lake Files with T-SQL in Azure Synapse Analytics

Published 2021-05-28 by Kevin Feasel

Jovan Popovic shows off a new feature in public preview for Azure Synapse Analytics serverless SQL pools:

You can use Azure Synapse and Azure Databricks to prepare and modify your Delta Lake data sets placed in the Azure Data Lake storage. Once your data engineers have prepared the data, your data analysts can create reports using the tools such as Power BI.
Using the serverless query endpoint in Azure Synapse, you can create a relational layer on top of your Delta Lake files that directly references the location where Azure Synapse and Azure Databricks are used to modify data. This way, you can get the real-time analytics on top of the Delta Lake data set without any need to wait for a pipeline to copy and prepare data.

Read on to see how this works.

Comments closed

Azure Synapse Analytics Supports Apache Spark 3.0

Published 2021-05-26 by Kevin Feasel

Euan Garden has some great news for us:

Starting today, the Apache Spark 3.0 runtime is now available in Azure Synapse. This version builds on top of existing open source and Microsoft specific enhancements to include additional unique improvements listed below. The combination of these enhancements results in a significantly faster processing capability than the open-source Spark 3.0.2 and 2.4.
The public preview announced today starts with the foundation based on the open-source Apache Spark 3.0 branch with subsequent updates leading up to a Generally Available version derived from the latest 3.1 branch.

It still won’t be as fast as Databricks, but it should be a good bit faster than the Spark 2 they were running.

Comments closed

A Logical Architecture for Azure Synapse Analytics

Published 2021-05-06 by Kevin Feasel

Paul Andrew lays out some thoughts:

As a community let’s all start thinking about this given the rich unified environment that Synapse Analytics offers. Below is a picture I’ve drawn as a starting point for a complete Synapse logical architecture. What do you think?

Click through for the image, which looks pretty solid to me.

Comments closed

Querying Serverless SQL Pools from Spark Notebooks in Scala

Published 2021-04-19 by Kevin Feasel

Jovan Popovic shows off one integration point between the data services in Azure Synapse Analytics:

Azure Synapse Analytics provides multiple query runtimes that you can use to query in-database or external data. You have the choice to use T-SQL queries using a serverless Synapse SQL pool or notebooks in Apache Spark for Synapse analytics to analyze your data.
You can also connect these runtimes and run the queries from Spark notebooks on a dedicated SQL pool.
In this post, you will see how to create Scala code in a Spark notebook that executes a T-SQL query on a serverless SQL pool.

Read on to see how. You can also query Spark pool and dedicated SQL pool tables from serverless SQL pools.

4 Comments

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Category: Synapse Analytics

Workload Management in Azure Synapse Analytics

Creating a dacpac for a Dedicated SQL Pool

Improving Parquet External Table Performance in Dedicated SQL Pools

Accessing ADLS Gen2 Data in Serverless SQL Pools with SAS Tokens

Creating a dacpac for a Dedicated SQL Pool

Moving Synapse Databases Across Subscriptions

Querying Delta Lake Files with T-SQL in Azure Synapse Analytics

Azure Synapse Analytics Supports Apache Spark 3.0

A Logical Architecture for Azure Synapse Analytics

Querying Serverless SQL Pools from Spark Notebooks in Scala