Synapse Analytics – Page 12

This Month in Synapse

Published 2022-04-28 by Kevin Feasel

Ryan Majidimehr recaps the updates for Azure Synapse Analytics in April of 2022:

Since the GA of Synapse, customers have asked for a fine-grained RBAC (role-based access control) role that allows a user persona to monitor the execution of Synapse Pipelines and Spark applications without having the ability to run or cancel the execution of these applications.
Now, customers can assign the Synapse Monitoring Operator role to such monitoring personas. This allows organizations to stay compliant while having flexibility in the delegation of tasks to individuals or teams.

That is indeed useful.

Comments closed

Synapse Database Templates GA

Published 2022-04-26 by Kevin Feasel

Kevin Schofield makes an announcement:

We’re pleased to announce today that Synapse Database Templates are now Generally Available and that we are also making available three additional Synapse Database Templates for Healthcare Insurance, Healthcare Providers, and R&D and Clinical Trials.
The Healthcare Insurance template is a comprehensive data model that addresses the typical data requirements of organizations providing insurance to cover healthcare needs (sometimes known as Payors).
The Healthcare Providers template is a comprehensive data model that addresses the typical data requirements of organizations providing healthcare services.
The R&D and Clinical Trials template is a comprehensive data model that addresses the typical data requirements of organizations involved in research and development and clinical trials of pharmaceutical products and devices.

Read on to learn more about how these templates work and what you can do with them.

Comments closed

Cross-Subscription Restore for Dedicated SQL Pools

Published 2022-04-20 by Kevin Feasel

Steve Howard announces some good news:

We are excited to announce the release of cross-subscription restore. This has been one of our top requested features from customers as it unlocks multiple scenarios from dev/test to simplified billing at the subscription level for restored data warehouses.

Click through to see how you can do this. There was a workaround in the past but this should be quite a bit faster.

Comments closed

Streaming Data into Synapse Dedicated SQL Pool

Published 2022-04-15 by Kevin Feasel

Lionel Penuchot loads some data:

This article reviews a common pattern of streaming data (i.e. real-time message ingestion) in Synapse dedicated pool. It opens a discussion on the simple standard way to implement this, as well as the challenges and drawbacks. It then presents an alternate solution which enables optimal performance and greatly reduces maintenance tasks when using clustered column store indexes. This is aimed at developers, DBAs, architects, and anyone who works with streams of data that are captured in real-time.

I’d probably avoid the MERGE statement in there because of how many problems there are with it. That said, this is a useful pattern for trickle-loading columnstore tables.

Comments closed

A Workaround to Parameterize Synapse Execute Pipeline Activities

Published 2022-04-14 by Kevin Feasel

Nick Edwards hits us with bad news:

Unfortunately, as of April 2022 there is not an option to parameterise or add dynamic content to an “Execute Pipeline” activity to invoke a pipeline run.
However, with the use of a Microsoft API there is method which we can use to overcome this.

Click through to learn more about this API call and see an example of it in action.

Comments closed

Data Ingestion and Exploration in Azure Synapse Analytics

Published 2022-04-13 by Kevin Feasel

Cathrine Wilhelmsen gets some Lego data:

In the April session of the Azure Synapse Analytics and Microsoft MVP series, I got to show one of my favorite demos: ingesting and exploring LEGO datasets! 🤓 In just a few minutes, I grab a compressed file from the web, decompress it, store it as a Parquet file in my data lake, and run SQL queries on it using Serverless SQL. It really is that simple:

Click through for a video and a step-by-step walkthrough.

Comments closed

From Confluent Cloud into Azure Synapse Analytics

Published 2022-04-12 by Kevin Feasel

Jacob Bogie and Dustin Vannoy show how to integrate Kafka in Confluent Cloud with pools in Azure Synapse Analytics:

Just released this fall, is the fully managed Synapse Connector. Azure Synapse Analytics provides a platform for data analysts and data scientists to analyze and combine data from multiple sources. Within Confluent Cloud, data can be synched to dedicated SQL pools via the fully managed Synapse sink connector and attached to Synapse Analytics workspace. Once added to the Synapse Analytics workspace, analysts have the ability to perform advanced analytics and reporting on data in the Confluent pipeline. The ability to access event-level data enables event-level analytics and data exploration.

Click through for two examples, one of loading data into a dedicated SQL pool and one of streaming data into Spark Streaming running on (naturally) a Spark pool.

Comments closed

Using the Azure Synapse Analyzer Report

Published 2022-04-12 by Kevin Feasel

Sanjay Raut introduces an interesting report:

The Azure Synapse Analyzer Report was created to help you identify common issues that may be present in your database that can lead to performance issues. This report focuses on known best practices that Microsoft has identified with SQL Dedicated Pools. Following these best practices will help to get the best performance out of your solution.

One thing I appreciate about this is that it covers many items which people don’t know to think about when moving over from SQL Server or Azure SQL Database.

Comments closed

Intelligent Cache for Spark in Synapse

Published 2022-03-29 by Kevin Feasel

Avinanda Chattapadday makes an announcement:

Traditionally, when querying a file or table from your data lake, the Apache Spark engine in Synapse makes a call to your remote ADLS Gen2 storage for each read of the data. For workloads with frequent repeat queries, this process can be redundant and add latency to the overall processing time. Although Apache Spark provides a great caching feature, it must be manually set and released to minimize the latency and improve overall performance. It can also result in queries of stale data if the underlying data changes. This is where the intelligent cache in Azure Synapse can simplify the process; by automatically detecting changes to the underlying files and automatically refreshing them in the cache, you ensure you have access to the most recent data. When the cache reaches its size limit, it will automatically release the least-read data to make space for more recent data.

Click through to see how you can enable this, as well as a few more details on the process.

Comments closed

Thoughts on CI/CD in the Serverless SQL Pool

Published 2022-03-28 by Kevin Feasel

Kevin Chant answers a question:

I got asked if I thought we were going backwards with CI/CD for serverless SQL Pools. Mostly due to the fact that we had to include defensive logic in the SQL scripts.
My answer was that we did this purely because of the limitations of the SchemaVersions table. Because the DBOps PowerShell module usually creates this table in the target location if it does not exist. Which it then uses it to log which scripts have already run.
However, you cannot easily create and update this table in a serverless SQL Pool.

Read the whole thing. This sounds like a case in which having a central schema versioning location rather than using the destination system could be advantageous. Of course, now you have a dependency on that central schema versioning location, so life is full of trade-offs.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Synapse Analytics