Press "Enter" to skip to content

Category: Synapse Analytics

Azure Synapse Analytics January 2023 Updates

Ryan Majidimehr shares an overview of what the Synapse team has been working on:

ADX contains native support for detecting anomalies over multiple time series by using the function series_decompose_anomalies(). This function can analyze thousands of time series in seconds, enabling near real time monitoring solutions and workflows based on ADX. Univariate analysis is simpler, faster, and easily scalable and is applicable to most real-life scenarios. However, there are some cases where it might miss anomalies that can only be detected by analyzing multiple metrics at the same time.

For some scenarios, there might be a need for a true multivariate model, jointly analyzing multiple metrics. This can be achieved now in ADX using the new Python-based multivariate anomaly detection functions.

The themes for this month are Spark, Data Exploration (via Kusto), and data integration.

Comments closed

External Tables and the Serverless SQL Pool

Ryan Adams continues a series on querying the serverless SQL pool in Azure Synapse Analytics:

There are two ways to read data inside Data Lake using the Synapse Serverless engine.  In this article, we’ll look at the second method which uses an external table to query a path within the lake.

Synapse is a collection of tools with four different analytical engines (Dedicated PoolSpark PoolServerless PoolData Explorer Pool).  This gives you a lot of options for ingesting, transforming, storing, and querying your data.  Here you will use the Synapse Serverless Pool to query the data in your ADLS account.   

Read on for a demonstration.

Comments closed

CI/CD for the Synapse Serverless SQL Pool

Kevin Chant has some links for us:

Last week the January 2023 video of the Azure Synapse Analytics and Microsoft MVP series was released. Where I covered how to do CI/CD for dedicated SQL Pools in Azure Synapse Analytics.

Since the release of that video the most popular question raised has been how to perform CI/CD for serverless SQL Pools within Azure Synapse Analytics?

Read on for links galore.

Comments closed

Reading the Data Lake with the Serverless Pool via OPENROWSET

Ryan Adams begins a series on reading data from the data lake:

There are two ways to read data inside Data Lake using the Synapse Serverless engine.  In this article, we’ll look at the first method which uses OPENROWSET to query a path within the lake. 

Synapse is a collection of tools with four different analytical engines (Dedicated PoolSpark PoolServerless PoolData Explorer Pool).  This gives you a lot of options for ingesting, transforming, storing, and querying your data.  The article will focus on how you can use the Synapse Serverless Pool to query the data in your ADLS account.   

Click through for a primer on the topic, as well as a demo video.

Comments closed

CETAS in the Serverless Pool and Blob Storage Variants

Dennes Torres gives us a warning:

While making some CETAS tests, I discovered an interesting new behaviour. The following error message was displayed and it was very strange:

Msg 16539, Level 16, State 1, Line 1
Operation failed since the external data source ‘https://euwe01devqigsa01.blob.core.windows.net/dennes/filescsv/’ has underlying storage account that is not in the list of Outbound Firewall Rules on the server. Please add this storage account to the list of Outbound Firewall Rules on your server and retry the operation.

Read on for the cause of this error message.

Comments closed

Spark Structured Streaming with Synapse

Ryan Adams builds a demo:

In this post we are going to look at an example of streaming IoT temperature data in Synapse Spark.  I have an IoT device that will stream temperature data from two sensors to IoT hub. We’ll use Synapse Spark to process the data, and finally write the output to persistent storage. Here is what our architecture will look like: 

Click through for the architectural diagram and step-by-step on how to put the demo together.

Comments closed

Creating a DNS Alias for a Dedicated SQL Pool

Resham Popli creates an alias:

This blog will walk through the details on how to enable custom DNS (Domain Name System) entries on an Azure Synapse dedicated pool inside an Azure Synapse workspace in case of disaster recovery.

The DNS alias provides a translation layer that can redirect your client programs to different servers. This layer spares you the difficulties of having to find and edit all the clients and their connection strings (in disaster recovery implementation). This is not supported out-of-box, so we need to take extra steps to enable this feature. There are some limitations, so please read these steps carefully.

Read on to learn how and what those limitations are.

Comments closed

Executing Multiple Notebooks in one Spark Pool with Genie

Shalu Ganotra Chadra, et al, explain what Synapse Genie is:

The Genie framework is a metadata driven utility written in Python. It is implemented using threading (ThreadPoolExecutor module) and directed acyclic graph (Networkx library). It consists of a wrapper notebook, that reads metadata of notebooks and executes them within a single Spark session. Each notebook is invoked on a thread with MSSparkutils.run() command based on the available resources in the Spark pool. The dependencies between notebooks are understood and tracked through a directed acyclic graph.

Read on for more information about how you can use it and what the setup process looks like.

Comments closed

Cosmos DB to Data Explorer Synapse Link

Vicent-Philippe Lauzon makes an announcement:

We recently made our new Kusto data connection available in public preview:  Cosmos DB to Azure Data Explorer Synapse Link.

This does look like a marketing-heavy announcement but the short version is that you can ingest data from Cosmos DB into Data Explorer pools via Synapse Link rather than creating your own ETL process. The previous Cosmos DB connector for Synapse Link tied to a dedicated SQL pool.

Comments closed