Category: Synapse Analytics

Executing SQL Statements in Azure Data Factory

Abhishek Narain announces a pretty nice improvement to Azure Data Factory and Synapse Pipelines:

We are introducing a Script activity in pipelines that provides the ability to execute single or multiple SQL statements.

Using the script activity, you can execute common operations with Data Manipulation Language (DML) and Data Definition Language (DDL). DML statements like SELECT, INSERT, UPDATE, and DELETE let users retrieve, insert, modify, and delete data in the database. DDL statements like CREATE, ALTER, and DROP allow a database manager to create, modify, and remove database objects such as tables, indexes, and users.
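
To give a flavor of what such a script might contain, here is a minimal sketch mixing DDL and DML in a single batch; the table and column names are made up:

    -- DDL: create a staging table if it does not already exist
    IF OBJECT_ID('dbo.StagingOrders') IS NULL
        CREATE TABLE dbo.StagingOrders (OrderID INT, Amount DECIMAL(10, 2));

    -- DML: insert, update, and read back the staged rows
    INSERT INTO dbo.StagingOrders (OrderID, Amount) VALUES (1, 19.99);
    UPDATE dbo.StagingOrders SET Amount = 24.99 WHERE OrderID = 1;
    SELECT COUNT(*) AS StagedRows FROM dbo.StagingOrders;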

Be sure to read the limitations at the bottom, however.

Delta Lake Operability in Azure Synapse Analytics

James Serra lets us know when and where we can use Delta Lake within Azure Synapse Analytics:

Many companies are seeing the value in collecting data to help them make better business decisions. When building a solution in Azure to collect the data, nearly everyone is using a data lake. A majority of those are also using delta lake, which is basically a software layer over a data lake that gives additional features. I have yet to see anyone using competing technologies to delta lake in Azure, such as Apache Hudi or Apache Iceberg (see A Thorough Comparison of Delta Lake, Iceberg and Hudi and Open Source Data Lake Table Formats: Evaluating Current Interest and Rate of Adoption).
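
As a quick illustration of that operability, a Synapse serverless SQL pool can query a Delta table in the lake directly with OPENROWSET. Here is a minimal sketch, assuming a made-up storage path:

    -- Query a Delta Lake folder from a serverless SQL pool
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/sales/',
        FORMAT = 'DELTA'
    ) AS sales;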

Read on for more information.

Scoring Azure ML Models in Azure Synapse Analytics

Alex Aleksandrov shows off the PREDICT operator:

We can use Synapse for many activities. We can use it not only for ingesting, querying, storing and visualising data, but for developing machine learning models as well. Of course, one can say that doing data science is another functionality of this platform and this is definitely true. However, in this article, I would like to show you that instead of using Python, one can use T-SQL for doing predictions.
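
The general shape of such a query in a dedicated SQL pool looks something like the sketch below, which assumes an ONNX model stored in a table; the table, model, and output column names are all hypothetical:

    -- Score rows against an ONNX model stored in a table
    SELECT d.CustomerID, p.Score
    FROM PREDICT (
        MODEL = (SELECT Model FROM dbo.Models WHERE ModelName = 'churn'),
        DATA = dbo.Customers AS d,
        RUNTIME = ONNX
    ) WITH (Score FLOAT) AS p;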

Click through to see how.

Creating an SSIS Integration Runtime in Synapse

Andy Leonard shows one way to create an Azure-SSIS (SQL Server Integration Services) integration runtime for Azure Synapse Analytics:

On 17 Feb 2022, I first saw the Microsoft announcement of the public preview of Azure-SSIS integration runtimes in Azure Synapse Analytics. I blogged about the announcement in a post titled Azure-SSIS Integration Runtime now available in Azure Synapse Analytics.

I am excited to share one way for you to provision an Azure-SSIS IR in Synapse Analytics by following these steps. To start provisioning a shiny new Azure Synapse Analytics Azure-SSIS integration runtime, open Synapse Studio.

Read on for the step-by-step guide.

Deploying an Azure Synapse Analytics Workspace

Rajendra Gupta builds out an Azure Synapse Analytics workspace:

In the article, An Overview of the Azure Synapse Analytics, we explored the Azure Synapse workspace and its features as an analytics service combining Big data analytics and enterprise data warehousing.

This article is a practical demonstration of deploying Azure Synapse Analytics workspace using the Azure portal.

Click through for step-by-step instructions on how to do it.

SSIS Integration Runtimes in Synapse

Andy Leonard heard it on the grapevine:

My first response was – and I quote – “WOO HOO!” It’s good to see SSIS getting some love.

A couple years ago, someone claimed SSIS was dying. I first checked it out. Then I blogged about it in a post titled SSIS is Not Dead (Or Dying). It’s been a couple years and SSIS is not dead. One could say SSIS functionality being added to Azure Synapse, arguably Azure’s flagship offering, appears to be the opposite of dying.

I’m not sure I’m as sanguine as Andy is about the future of SSIS but I will say at the very least I agree that it’s not going anywhere anytime soon.

Multivariate Anomaly Detection in SynapseML

Louise Han has an announcement:

Today, we are excited to announce a wonderful collaboration between Multivariate Anomaly Detector and SynapseML, which together provide a solution for developers and customers to do multivariate anomaly detection in Synapse. This new capability allows you to detect anomalies quickly and easily in very large datasets and databases, perfectly lighting up scenarios like equipment predictive maintenance. For those who are not familiar with predictive maintenance, it is a technique that uses data analysis tools and techniques to detect anomalies in operations and possible defects in equipment and processes so customers can fix them before they result in failure. Therefore, this new capability will benefit customers who have huge amounts of sensor data from hundreds of pieces of equipment, helping them do equipment monitoring, anomaly detection, and even root cause analysis.

Click through for more details and a demonstration on how to use it.

From Cosmos DB to Dedicated SQL Pools via Synapse Link

Jovan Popovic shows off Azure Synapse Link:

At the time of writing this article, the dedicated SQL pool doesn’t have the ability to read data from CosmosDB/Dataverse using the Synapse link. There are scenarios where you would need to use CosmosDB data in your dedicated SQL pool, so you would need to find a way to load data. In theory, you could create an ADF pipeline that reads data from CosmosDB or Dataverse and stores data in the dedicated SQL pool as a target. This might be a problem if your pipeline is reading data directly from the CosmosDB account, because it might impact both operational workload performance and cost. The analytical store is the recommended location that you should use to fetch all data from CosmosDB/Dataverse.

In this post, I will describe a two-step approach where you export your data using the serverless SQL pool via Synapse Link into Azure Data Lake Storage, and then load the data into the dedicated SQL pool table.
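
The two steps might look roughly like the following sketch. The first statement runs on the serverless SQL pool and uses CETAS to export the Cosmos DB analytical store to Parquet files; the second runs on the dedicated SQL pool and loads those files with COPY INTO. The account, storage paths, and object names are all made up, and the external data source and file format are assumed to exist already:

    -- Step 1 (serverless SQL pool): export Cosmos DB analytical store data to the lake
    CREATE EXTERNAL TABLE exported.Orders
        WITH (
            LOCATION = 'orders/',
            DATA_SOURCE = MyDataLake,      -- pre-created external data source
            FILE_FORMAT = ParquetFormat    -- pre-created Parquet file format
        )
    AS
    SELECT *
    FROM OPENROWSET(
        'CosmosDB',
        'Account=myaccount;Database=mydb;Key=<account-key>',
        Orders
    ) AS source;

    -- Step 2 (dedicated SQL pool): load the exported Parquet files
    COPY INTO dbo.Orders
    FROM 'https://mystorageaccount.dfs.core.windows.net/lake/orders/*.parquet'
    WITH (
        FILE_TYPE = 'PARQUET',
        CREDENTIAL = (IDENTITY = 'Managed Identity')
    );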

A couple of weeks back, I wrote about another method of doing this through the Spark pool. Now you have two options.
