Press "Enter" to skip to content

Category: Synapse Analytics

REST APIs for Synapse Spark Pools

Abid Nazir Guroo looks at some endpoints:

Azure Synapse Analytics Representational State Transfer (REST) APIs are secure HTTP service endpoints that support creating and managing Azure Synapse resources using Azure Resource Manager and Azure Synapse web endpoints. This article provides instructions on how to set up and use Synapse REST endpoints and describes the Apache Spark pool operations supported by REST APIs.

Read on to see some of the Spark pool management options available to you via the REST API.
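
To give a sense of the shape of these endpoints, here is a minimal sketch of listing a workspace's Apache Spark pools through the Azure Resource Manager REST endpoint, in Python with the requests and azure-identity packages. The subscription, resource group, and workspace names are placeholders.

```python
# Minimal sketch: list the Apache Spark pools in a Synapse workspace via
# the Azure Resource Manager REST endpoint. Requires the requests and
# azure-identity packages; the subscription, resource group, and
# workspace names are placeholders.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
WORKSPACE = "<workspace-name>"

# Acquire a bearer token for the ARM audience.
credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.Synapse/workspaces/{WORKSPACE}"
    f"/bigDataPools?api-version=2021-06-01"
)

resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

for pool in resp.json().get("value", []):
    props = pool["properties"]
    print(pool["name"], props.get("sparkVersion"), props.get("nodeSize"))
```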

Time Travel with Delta Tables in Synapse

Liliam Leme reverses the clock:

Scenario

A customer I was working with had a requirement to restore modified files to a specific point in time. They had built their architecture on top of a data lake.

Looking for options

While working on this scenario, we explored some storage options available without any additional customization, for example Soft delete for blobs – Azure Storage | Microsoft Docs.

Read on to see what they landed on.
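
For context, Delta Lake's time travel is what makes this kind of point-in-time restore straightforward once the data lands in Delta tables. A minimal sketch from a Synapse Spark notebook, assuming a placeholder Data Lake path:

```python
# Sketch of Delta Lake time travel from a Synapse Spark notebook; the
# Data Lake path is a placeholder and "spark" is the session the
# notebook provides. RESTORE needs a reasonably recent Delta runtime.
from delta.tables import DeltaTable

path = "abfss://container@account.dfs.core.windows.net/delta/mytable"

# Read the table as it existed at an earlier version...
df_v1 = spark.read.format("delta").option("versionAsOf", 1).load(path)

# ...or as of a point in time.
df_old = (spark.read.format("delta")
          .option("timestampAsOf", "2023-01-15 00:00:00")
          .load(path))

# Inspect the commit history to pick the version or timestamp to restore.
DeltaTable.forPath(spark, path).history().show(truncate=False)

# Roll the live table back to that version.
spark.sql(f"RESTORE TABLE delta.`{path}` TO VERSION AS OF 1")
```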

Incorporating Power BI with Azure Synapse Analytics

Ginger Grant counts the ways:

The first is to connect Power BI to Azure Synapse to explore and visualize data. You can examine the datasets that you have loaded into your data lake with Power BI to help with the analysis of the data, either for a data science solution or to determine how you are going to transform the data. For more information on how to do this, check out my previous blog.

Click through for three additional methods.

GitHub CI/CD for Synapse Link for SQL Server 2022

Kevin Chant does a bit of CI/CD:

In this post I want to show how a GitHub CI/CD experience for Azure Synapse Link for SQL Server 2022 can look using GitHub Actions, including how to automatically stop and start the link in the pipeline.

In my last post I showed a complete CI/CD experience for Azure Synapse Link for SQL Server 2022 using Azure DevOps.

With this in mind, in this post I show an alternative GitHub CI/CD experience for Azure Synapse Link for SQL Server 2022 which uses GitHub Actions. It includes automatically stopping the link before the database update and starting it again after the update has completed.

Read on to learn how.
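
As a rough illustration of the stop/update/start pattern (not necessarily Kevin's exact implementation), a pipeline step could pause the link with the Azure CLI's az synapse link-connection commands, run the database update, and resume it. A hedged Python sketch with placeholder names:

```python
# Rough sketch of the stop/update/start pattern from a pipeline step,
# shelling out to the Azure CLI's "az synapse link-connection" commands.
# Workspace and link names are placeholders, and deploy_database_update
# is a hypothetical stand-in for your dacpac/migration step.
import subprocess

WORKSPACE = "<workspace-name>"
LINK = "<link-connection-name>"

def az(*args: str) -> None:
    # Fail the pipeline step if the CLI call fails.
    subprocess.run(["az", *args], check=True)

def deploy_database_update() -> None:
    # Hypothetical stand-in for the actual database deployment step.
    ...

# Pause the link so the schema change can deploy cleanly.
az("synapse", "link-connection", "stop",
   "--workspace-name", WORKSPACE, "--name", LINK)

deploy_database_update()

# Resume replication once the update has completed.
az("synapse", "link-connection", "start",
   "--workspace-name", WORKSPACE, "--name", LINK)
```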

Installing Third-Party WHL Packages in Synapse with DEP

Sabyasachi Samaddar walks through what I consider a real difficulty:

It is really challenging when you need to install third-party .whl packages into a DEP-enabled Azure Synapse Spark instance.

According to the documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-librar…), installing packages from PyPI is not supported within DEP-enabled workspaces. Hence we cannot just upload the .whl packages into the workspace; we need to upload all the dependencies along with the .whl package, and it will be an offline installation. Synapse Spark clusters come with built-in packages, and hence we may find some conflicts when we try to install some third-party packages.

Read on to see what you need to do.
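
The offline part typically means collecting the wheel and its full dependency tree up front. A minimal sketch using pip download, with a placeholder package and tags you would match to the Spark pool's runtime:

```python
# Sketch: collect a third-party wheel plus its full dependency tree with
# "pip download" so the set can be uploaded as workspace packages for an
# offline install. The package name and tags are placeholders; match the
# Python version and platform to the Spark pool's runtime.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "pip", "download",
        "some-package==1.2.3",         # placeholder package
        "--dest", "./offline-wheels",  # all .whl files land here
        "--only-binary", ":all:",      # wheels only, no source builds
        "--python-version", "3.10",    # match the pool's Python version
        "--platform", "manylinux2014_x86_64",
    ],
    check=True,
)
```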

Azure Synapse Analytics R Language Support

Ryan Majidimehr has a short list of updates for Azure Synapse Analytics but it includes a big one:

Azure Synapse Analytics provides built-in R support for Apache Spark. As part of this, data scientists can leverage Azure Synapse Analytics notebooks to write and run their R code. This also includes support for SparkR and SparklyR, which allows users to interact with Spark using familiar Spark or R interfaces. To learn more, read the official how-to Use R for Apache Spark with Azure Synapse Analytics (Preview).

That it took this long for R support was a bit weird, but I’m glad it’s there now.

Choosing between Synapse Spark Notebooks or Job Definitions

Arun Sethia and Arshad Ali explain when you might use a Spark notebook versus a job definition:

Synapse Spark Notebook is a web-based (HTTP/HTTPS) interactive interface to create files that contain live code, narrative text, and visualized output, with rich libraries for Spark-based applications. Data engineers can collaborate, schedule, run, and test their Spark application code using notebooks. Notebooks are a good place to validate ideas and do quick experiments to get insight into the data. You can integrate a Synapse notebook into a Synapse pipeline.

The notebook allows you to combine programming code with markdown text and perform simple visualizations (using Synapse notebook chart options and open-source libraries). In addition, running code supplies immediate feedback, output, and progress tracking within the notebook.

Click through for the comparison.
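
One practical difference worth keeping in mind: a notebook hands you a live spark session, while a Spark job definition points at a batch entry script that builds its own. A minimal sketch of such a script, with placeholder paths:

```python
# Sketch of a batch entry script for a Synapse Spark job definition.
# Unlike a notebook, which hands you a live "spark" session, a job
# definition's script builds its own. Paths are placeholders.
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("nightly-aggregation").getOrCreate()

    # Read raw data from the lake...
    orders = spark.read.parquet(
        "abfss://data@account.dfs.core.windows.net/raw/orders"
    )

    # ...aggregate it...
    daily = orders.groupBy("order_date").count()

    # ...and write the curated output back.
    daily.write.mode("overwrite").parquet(
        "abfss://data@account.dfs.core.windows.net/curated/orders_by_day"
    )

    spark.stop()
```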

Monitoring the Serverless SQL Pool via Log Analytics

Sidney Cirqueira shows how to monitor SQL requests in Azure Synapse Analytics:

Today I would like to share a scenario that I worked on in one of my serverless SQL pool support cases. The customer asked for advice on how to monitor serverless SQL requests by using Log Analytics.

The intention of this guide is to help you choose the configuration required to easily set up Synapse Analytics workspace monitoring, along with other considerations about how to monitor serverless SQL requests with Azure Monitor. Spoiler: at the end of this article, I will share the latest version of the serverless workbook posted on the Azure_Synapse_Tool_Box. This includes a really cool way to see query execution information.

Read on for that and definitely check out the Azure Synapse Toolbox if you’re a Synapse user.
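
Once the diagnostic settings feed a Log Analytics workspace, the same data is also reachable programmatically. A hedged sketch using the azure-monitor-query package; the workspace ID is a placeholder, and the table name reflects the built-in serverless SQL diagnostic category, so verify it in your own workspace:

```python
# Hedged sketch: query the Log Analytics workspace that the Synapse
# diagnostic settings feed, using the azure-monitor-query package. The
# workspace ID is a placeholder, and the table name reflects the
# built-in serverless SQL diagnostic category; verify it in your
# workspace before relying on it.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
SynapseBuiltinSqlPoolRequestsEnded
| where TimeGenerated > ago(24h)
| take 50
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```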

Transferring Data between Dedicated SQL and Spark Pools in Synapse

Sidney Cirqueira shows off a connector available to us in Azure Synapse Analytics:

Usually, customers do this kind of operation using Synapse Apache Spark to load data to a dedicated pool within an Azure Synapse workspace, but today I would like to reproduce a different scenario that I worked on in one of my support cases. Consider a scenario where you are trying to load data from Synapse Spark to a dedicated pool (formerly SQL DW) using Synapse Pipelines, and additionally you are using a Synapse workspace deployed with a managed virtual network.

The intention of this guide is to help you with the configuration required if you need to load data from Azure Synapse Apache Spark to a dedicated SQL pool (formerly SQL DW). If you would prefer to take advantage of the new feature-rich capabilities now available via the Synapse workspace and Studio, and to load data directly from Apache Spark to a dedicated pool in an Azure Synapse workspace, it is recommended that you enable Synapse workspace features on your existing dedicated SQL pool (formerly SQL DW).

Read on for a few tips and a step-by-step walkthrough of the process.
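
For reference, the connector surfaces as a synapsesql method on Spark readers and writers in Synapse Spark 3 runtimes. A hedged sketch from a notebook, with placeholder database, schema, and table names; check the connector docs for options such as internal versus external table types:

```python
# Hedged sketch of the dedicated SQL pool connector from a Synapse
# Spark 3 notebook, where the runtime exposes a synapsesql method on
# readers and writers. Database/schema/table names are placeholders and
# "spark" is the notebook session; see the connector docs for options
# such as internal vs. external table types.

# Read a dedicated SQL pool table into a Spark DataFrame.
df = spark.read.synapsesql("my_dw.dbo.fact_sales")

# Transform with Spark as usual.
summary = df.groupBy("sale_date").sum("amount")

# Write the result back to the dedicated pool.
summary.write.mode("overwrite").synapsesql("my_dw.dbo.sales_by_day")
```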

Replicated Tables in Dedicated SQL Pools

Pedro Martinez explains the idea behind replicated tables in Azure Synapse Analytics dedicated SQL pools:

If you have ever used an Azure Synapse Analytics dedicated SQL pool, you know there are multiple table types to choose from for your workload. You might ask yourself, “when can I use the replicated table type, and how can I use it efficiently?”

This blog is going to talk in detail about the replicated table type: when to use it and what the best practices are for its usage. But before that, let’s start by understanding the different table types:

I’ve seen replicated tables get overused, so check out Pedro’s advice on how not to get burned with them.
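
For reference, replication is declared in the table DDL via DISTRIBUTION = REPLICATE. A minimal sketch run through pyodbc against a dedicated pool, with placeholder connection details:

```python
# Sketch: create a replicated table in a dedicated SQL pool via pyodbc.
# DISTRIBUTION = REPLICATE keeps a full copy of the table on each
# compute node, which suits small, frequently joined dimension tables.
# Connection details are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;"
    "DATABASE=<dedicated-pool>;"
    "UID=<user>;PWD=<password>"
)

conn.execute("""
CREATE TABLE dbo.DimRegion
(
    RegionKey  INT          NOT NULL,
    RegionName NVARCHAR(50) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);
""")
conn.commit()
```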
