Press "Enter" to skip to content

Category: Synapse Analytics

Principles of Synapse Security

Liliam Leme provides an overview of security options in Azure Synapse Analytics:

This blog post will provide an overview of the Synapse security environment focused on Dedicated SQL Pool, Serverless SQL Pool, and Spark.

Security has many layers and frequently it will determine how you build your process. I start this post by reviewing several important security considerations which you can later apply to your Synapse environment. 

This is a fairly lengthy post and it still only covers a moderate amount of what you’d want to do for Azure Synapse Analytics. This is the downside to having a complex interplay of several products: there’s a lot to secure and a lot to think about along the way.

Comments closed

Getting Started with Azure Synapse Analytics

Shabnam Watson gets us started with Azure Synapse Analytics:

In this blog post, I show you how easy it is to start an Azure Synapse Analytics workspace (instance) and use its Serverless SQL Pool engine to analyze sample publicly available data. As you will read shortly, Azure Synapse Analytics provides many compute engines for different use cases. The easiest one to get started with is its Serverless SQL Pool since every Azure Synapse Analytics instance comes with one already created and ready to use. It also does not have any cost unless if you use it which makes it very attractive to those who have a limited Azure budget.

Click through to see how to create a workspace, load some data, and query it via the serverless SQL pool.

Comments closed

Recommendations for Dedicated SQL Pool Data Modeling

Bhaskar Sharma has some advice:

In this article, I will discuss how to physically model an Azure Synapse Analytics data warehouse while migrating from an existing on-premises MPP (Massive Parallel Processing) data warehouse solution like Teradata and Netezza. The approach and methodologies discussed in this article are purely based on the knowledge and insight I have gained while migrating these data warehouses to Azure Synapse dedicated SQL pool. 

Dedicated SQL pools are close enough to regular SQL Server that we make a lot of assumptions about it, some of which may be wrong.

Comments closed

Deploying a dacpac to the Serverless SQL Pool

Kevin Chant drops some dacpacs off at the (serverless) pool:

In this post I want to cover deploying a dacpac to a serverless SQL pool using Azure DevOps. Yes, you are reading that right. It is now possible thanks to a sqlpackage update.

To clarify, a dacpac file is a special file that you can use to deploy updates to SQL Server related databases using a state-based deployment. Plus, when I say serverless SQL pool I mean an Azure Synapse Analytics serverless SQL Pool.

Kevin includes examples for Azure DevOps as well as GitHub Actions.

Comments closed

Trying out Azure Synapse Link for SQL Server 2022

Kevin Chant looks at Azure Synapse Link for SQL Server 2022:

My first topic is about a new feature that covers both SQL Server 2022 and Azure. Which is Azure Synapse Link, or to be more precise Azure Synapse Link for SQL Server 2022.

I have been doing various tests with this feature recently. Which has led to some interesting blog posts about Azure Synapse Link for SQL Server 2022.

Read on for a few more thoughts, as well as deployment scripts via Azure DevOps and GitHub Actions.

Comments closed

Automated Delta Lake Maintenance in Synapse

Shalu Ganotra Chadha, et al, explain how to keep your Delta Lake tidy:

The useful features of Delta Lake come at the cost of requiring regular maintenance. Delta Lake requires periodic cleanup as it accumulates files over time with each upsert and retains previous snapshots of the data. They can quickly convert a small dataset (in MBs) to several GBs of storage. This is because deleted data is not really removed but retained as an older snapshot of the Delta Lake dataset.

Click through for two operations you can perform on a Delta Lake, as well as some recommendations on when to do what via the Genie Delta Lake Auto Maintenance scripts they provide.

Comments closed

Disabling Public Network Access in Synapse

Ryan Adams builds a private endpoint:

If you disable public access to your Azure Synapse Workspace you will get the following error message when attempting to open Synapse Studio. 

“Failed to load one or more resources due to forbidden issue, error code 403.” 

Click through for more information about routing for Synapse resources and what you’d need to do in order to disable public network access entirely.

Comments closed

Power BI Incremental Refresh with Non-Standard Dates in Parquet Files

Shabnam Watson hits on a specific but interesting use case:

The most common scenario for setting up the out of the box incremental refresh in Power BI is to base it off of a datetime column; however, there are cases when you may want to set up incremental refresh based off of a column with a data type other than datetime. Examples are when you are working with a smart date ID (01012023 for Jan 1,2023) column or when you are working with a source system that has partitioned data using a column such as Year that has a numeric data type.

A use case for the latter scenario is when you are working with Parquet/Delta files via Azure Synapse Analytics Serverless SQL Pool. When working with larger datasets, it is typical to see the Parquet/Delta files partitioned by date ranges. Depending on how much data there is, the partitioning may be at the Year level instead of Day.

With that scenario in mind, read on to learn how you can minimize your Power BI processing time and costs when doing incremental refresh.

Comments closed

Azure Synapse Analytics January 2023 Updates

Ryan Majidimehr shares an overview of what the Synapse team has been working on:

ADX contains native support for detecting anomalies over multiple time series by using the function series_decompose_anomalies(). This function can analyze thousands of time series in seconds, enabling near real time monitoring solutions and workflows based on ADX. Univariate analysis is simpler, faster, and easily scalable and is applicable to most real-life scenarios. However, there are some cases where it might miss anomalies that can only be detected by analyzing multiple metrics at the same time.

For some scenarios, there might be a need for a true multivariate model, jointly analyzing multiple metrics. This can be achieved now in ADX using the new Python-based multivariate anomaly detection functions.

The themes for this month are Spark, Data Exploration (via Kusto), and data integration.

Comments closed