Press "Enter" to skip to content

Category: Synapse Analytics

Query Performance Insights on the Serverless SQL Pool

Jovan Popovic shows how you can use the QPI library on an Azure Synapse Analytics serverless SQL pool:

You can find more of the best practices here. These best practices are very important because some issues might cause performance degradation. You might be surprised how applying some of these best practices might improve the performance of your workload.

The last item that is related to schema optimization is sometimes hard to check. You would need to look at your schema, inspect all columns and find what to optimize. If you have a large schema, this might not be an easy task. But you can make your life easier if you use the QPI helper library that can detect schema issues for you.

Read on to see what it can find.
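
To give a sense of the sort of thing the library automates, here is a hand-rolled metadata query. This is not QPI's own API, just a plain INFORMATION_SCHEMA check that flags one common serverless schema smell: string columns declared as (MAX) or at the full 8000-character width.

-- A minimal sketch, not part of the QPI library: flag string columns
-- declared as (MAX) or at the maximum 8000-character width, a common
-- serverless SQL pool schema anti-pattern.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('varchar', 'nvarchar')
  AND CHARACTER_MAXIMUM_LENGTH IN (-1, 8000);  -- -1 means (MAX)

QPI packages checks along these lines so you don't have to write and maintain them yourself.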

Comments closed

Building posexplode() in the Serverless SQL Pool

Jovan Popovic rides to the rescue with JSON:

The array cells are pivoted and returned as simple scalar columns. Now you can simply use WHERE or GROUP BY clauses to filter or summarize information by array element values. Another very useful piece of information might be the index of every element (generated as pos column).

Spark enables you to use the posexplode() function on every array cell. The posexplode() function will transform a single array element into a set of rows where each row represents one value in the array and the index of that array element. As a result, one row with the array containing three elements will be transformed into three rows containing scalar cells. This flattened/normalized representation is much easier for the analysis.

Once the array is flattened and normalized, you can easily analyze the data and find how many people know SQL or Java.

Read on to see how you can implement the equivalent of POSEXPLODE() using OPENJSON() in the Azure Synapse Analytics serverless SQL pool.
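
For context, the trick hinges on a documented behavior of OPENJSON(): run it over a JSON array without a WITH clause and the [key] column holds the zero-based index of each element, which maps directly to posexplode()'s pos output. A minimal sketch with made-up inline data (the real post works against files in the data lake):

-- Hypothetical sample data; id and skills are illustrative column names.
WITH people AS (
    SELECT * FROM (VALUES
        (1, N'["SQL","Java","Python"]'),
        (2, N'["SQL"]')
    ) AS v(id, skills)
)
SELECT p.id,
       s.[key]   AS pos,    -- zero-based position of the element in the array
       s.[value] AS skill   -- the element value itself
FROM people AS p
CROSS APPLY OPENJSON(p.skills) AS s;

From there, a GROUP BY on skill answers the "how many people know SQL or Java" question from the excerpt.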

Comments closed

Executing SQL Statements in Azure Data Factory

Abhishek Narain announces a pretty nice improvement to Azure Data Factory and Synapse Pipelines:

We are introducing a Script activity in pipelines that provides the ability to execute single or multiple SQL statements.

Using the script activity, you can execute common operations with Data Manipulation Language (DML), and Data Definition Language (DDL). DML statements like SELECT, UPDATE, and INSERT let users retrieve, store, modify, delete, insert and update data in the database. DDL statements like CREATE, ALTER, and DROP allow a database manager to create, modify, and remove database objects such as tables, indexes, and users.

Be sure to read the limitations at the bottom, however.
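
As an illustration of the DDL-plus-DML mix the announcement describes, here is the kind of batch you might paste into a Script activity pointed at a SQL database; the table, column, and pipeline names are made up:

-- Hypothetical audit table and insert, run as a single Script activity batch.
IF OBJECT_ID('dbo.LoadAudit') IS NULL
    CREATE TABLE dbo.LoadAudit (
        LoadId       INT IDENTITY(1,1) PRIMARY KEY,
        PipelineName VARCHAR(200) NOT NULL,
        LoadedAtUtc  DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME()
    );

INSERT INTO dbo.LoadAudit (PipelineName)
VALUES ('CopySalesToStaging');

SELECT COUNT(*) AS TotalLoads FROM dbo.LoadAudit;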

Comments closed

Delta Lake Operability in Azure Synapse Analytics

James Serra lets us know when and where we can use Delta Lake within Azure Synapse Analytics:

Many companies are seeing the value in collecting data to help them make better business decisions. When building a solution in Azure to collect the data, nearly everyone is using a data lake. A majority of those are also using delta lake, which is basically a software layer over a data lake that gives additional features. I have yet to see anyone using competing technologies to delta lake in Azure, such as Apache Hudi or Apache Iceberg (see A Thorough Comparison of Delta Lake, Iceberg and Hudi and Open Source Data Lake Table Formats: Evaluating Current Interest and Rate of Adoption).

Read on for more information.
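
As one concrete example of the operability James covers, a serverless SQL pool can read a Delta table directly via OPENROWSET with FORMAT = 'DELTA'. The storage account, container, and folder below are placeholders:

-- A minimal sketch: query a Delta Lake folder from a serverless SQL pool.
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/sales/',
        FORMAT = 'DELTA'
     ) AS rows;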

Comments closed

Scoring Azure ML Models in Azure Synapse Analytics

Alex Aleksandrov shows off the PREDICT operator:

We can use Synapse for many activities. We can use it not only for ingesting, querying, storing and visualising data, but for developing machine learning models as well. Of course, one can say that doing data science is another functionality of this platform and this is definitely true. However, in this article, I would like to show you that instead of using Python, one can use T-SQL for doing predictions.

Click through to see how.
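
For a rough idea of the shape of it, the T-SQL PREDICT syntax looks like the following; the model table, input table, and output column here are hypothetical, and the specifics depend on how your ONNX model was trained and stored:

-- A sketch of scoring rows with PREDICT against a model stored in a table.
SELECT d.*, p.Score
FROM PREDICT(
        MODEL   = (SELECT Model FROM dbo.Models WHERE ModelName = 'ChurnModel'),
        DATA    = dbo.CustomerFeatures AS d,
        RUNTIME = ONNX
     ) WITH (Score FLOAT) AS p;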

Comments closed

Creating an SSIS Integration Runtime in Synapse

Andy Leonard shows one way to create an Azure-SSIS (SQL Server Integration Services) integration runtime for Azure Synapse Analytics:

On 17 Feb 2022, I first saw the Microsoft announcement of the public preview of Azure-SSIS integration runtimes in Azure Synapse Analytics. I blogged about the announcement in a post titled Azure-SSIS Integration Runtime now available in Azure Synapse Analytics.

I am excited to share one way for you to provision an Azure-SSIS IR in Synapse Analytics, following these steps. To start provisioning a shiny new Azure Synapse Analytics Azure-SSIS integration runtime, open Synapse Studio:

Read on for the step-by-step guide.

Comments closed

Deploying an Azure Synapse Analytics Workspace

Rajendra Gupta builds out an Azure Synapse Analytics workspace:

In the article, An Overview of the Azure Synapse Analytics, we explored the Azure Synapse workspace and its features as an analytics service combining Big data analytics and enterprise data warehousing.

This article is a practical demonstration of deploying Azure Synapse Analytics workspace using the Azure portal.

Click through for step-by-step instructions on how to do it.

Comments closed

SSIS Integration Runtimes in Synapse

Andy Leonard heard it on the grapevine:

My first response was – and I quote – “WOO HOO!” It’s good to see SSIS getting some love.

A couple of years ago, someone claimed SSIS was dying. I first checked it out, then blogged about it in a post titled SSIS is Not Dead (Or Dying). It's been a couple of years and SSIS is not dead. One could say that SSIS functionality being added to Azure Synapse, arguably Azure's flagship offering, is the opposite of dying.

I’m not sure I’m as sanguine as Andy is about the future of SSIS but I will say at the very least I agree that it’s not going anywhere anytime soon.

Comments closed

Multivariate Anomaly Detection in SynapseML

Louise Han has an announcement:

Today, we are excited to announce a wonderful collaboration between Multivariate Anomaly Detector and SynapseML, which together provide a solution for developers and customers to do multivariate anomaly detection in Synapse. This new capability allows you to detect anomalies quickly and easily in very large datasets and databases, perfectly lighting up scenarios like equipment predictive maintenance. For those who are not familiar with predictive maintenance, it is a technique that uses data analysis tools and techniques to detect anomalies in the operation and possible defects in equipment and processes so customers can fix them before they result in failure. Therefore, this new capability will benefit customers who have a huge amount of sensor data from hundreds of pieces of equipment, helping them with equipment monitoring, anomaly detection, and even root cause analysis.

Click through for more details and a demonstration on how to use it.

Comments closed