Synapse Analytics – Page 7

Data Exfiltration Protection and Synapse Pipelines

Published 2022-12-02 by Kevin Feasel

Before we discuss how DEP applies to Synapse Pipelines, it is important to level-set on some Synapse Pipelines-specific concepts. If you are familiar with Synapse Pipelines or Azure Data Factory, you can skip this section and jump to Synapse Pipeline connectivity without DEP enabled.

For a more generalized introduction to Synapse Pipelines check out this doc article.

Synapse Pipelines enables users to connect to a range of data services through what is called a Linked Service.

The big trick, using self-hosted integration runtimes, is something Luke spends a fair amount of time on.


Azure Synapse Analytics Updates for November 2022

Published 2022-12-01 by Kevin Feasel

Ryan Majidimehr has a bundle of updates for us:

We are always working to improve Azure Analytics Spark performance. We are making significant changes that will increase Spark performance by up to 77%. 

Based on our testing using the 1TB TPC-H industry benchmark, you’re likely to see up to 77% increased performance. While your workload may perform differently than the TPC-H benchmark, everyone is expected to see improved performance. These Spark performance improvements come from moving to the latest Azure v5 VMs which have improved CPU performance, increased temporary SSD throughput, and lastly higher remote storage IOPS. 

Click through for a whole bunch of updates.


Reading Serverless SQL Pool Data with Data Factory

Published 2022-11-28 by Kevin Feasel

Koen Verbeeck wants to read from the serverless SQL pool in Azure Synapse Analytics:

We have some data we can query using the serverless SQL pools in Azure Synapse Analytics. For this blog post, I’m querying data that is stored in Azure Cosmos DB. Read the blog post How to Store Normalized SQL Server Data into Azure Cosmos DB to learn more about how that data got there.

Suppose I now want to read the data using Azure Data Factory. You can read data from Cosmos DB directly, but let’s pretend I want to do some transformations first using my favorite language: SQL. How can we do this?

Read on to learn how.


REST APIs for Synapse Spark Pools

Published 2022-11-23 by Kevin Feasel

Abid Nazir Guroo looks at some endpoints:

Azure Synapse Analytics Representational State Transfer (REST) APIs are secure HTTP service endpoints that support creating and managing Azure Synapse resources using Azure Resource Manager and Azure Synapse web endpoints. This article provides instructions on how to set up and use Synapse REST endpoints and describes the Apache Spark Pool operations supported by REST APIs.

Read on to see some of the Spark pool management options available to you via the REST API.
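As a rough illustration of what addressing a Spark pool through Azure Resource Manager can look like, here is a minimal Python sketch that only builds the request URL. It assumes the `Microsoft.Synapse` resource provider path with `bigDataPools` as the Spark pool resource type, and it assumes `2021-06-01` as the API version; check the current REST reference before relying on either. The function name and the sample identifiers are made up for this example and do not come from the linked article.

```python
from urllib.parse import quote, urlencode

# Azure Resource Manager endpoint for the public cloud.
ARM_ENDPOINT = "https://management.azure.com"

# Assumption: an API version that may need updating; verify against the docs.
API_VERSION = "2021-06-01"


def spark_pool_url(subscription_id, resource_group, workspace, pool_name,
                   api_version=API_VERSION):
    """Build the ARM URL that identifies one Synapse Spark (big data) pool.

    Hypothetical helper for illustration; it performs no network calls.
    """
    path = "/".join([
        "subscriptions", quote(subscription_id, safe=""),
        "resourceGroups", quote(resource_group, safe=""),
        "providers", "Microsoft.Synapse",
        "workspaces", quote(workspace, safe=""),
        "bigDataPools", quote(pool_name, safe=""),
    ])
    query = urlencode({"api-version": api_version})
    return f"{ARM_ENDPOINT}/{path}?{query}"


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    print(spark_pool_url("0000", "rg-demo", "ws-demo", "sparkpool1"))
```

A request to a URL like this would also need an `Authorization: Bearer <token>` header carrying an Azure AD token for the management endpoint. The HTTP verb then selects the operation, such as reading the pool definition with GET.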


Time Travel with Delta Tables in Synapse

Published 2022-11-11 by Kevin Feasel

Liliam Leme reverses the clock:

Scenario

A customer we were working with had a requirement to restore modified files to a specific point in time. They had built their architecture on top of a data lake.

Looking for options

While working on this scenario, we explored some storage options available without any side customization, for example, Soft delete for blobs – Azure Storage | Microsoft Docs.

Read on to see what they landed on.


Incorporating Power BI with Azure Synapse Analytics

Published 2022-11-04 by Kevin Feasel

Ginger Grant counts the ways:

The first is to connect Power BI to Azure Synapse to explore and visualize data. You can examine the datasets that you have loaded in your data lake with Power BI to help with the analysis of the data, either for a data science solution or to determine how you are going to transform the data. For more information on how to do this, check out my previous blog.

Click through for three additional methods.


GitHub CI/CD for Synapse Link for SQL Server 2022

Published 2022-11-03 by Kevin Feasel

Kevin Chant does a bit of CI/CD:

In this post I want to show how a GitHub CI/CD experience for Azure Synapse Link for SQL Server 2022 can look using GitHub Actions, including how to automatically stop and start the link in the pipeline.
In my last post I showed a complete CI/CD experience for Azure Synapse Link for SQL Server 2022 using Azure DevOps.
With this in mind, in this post I show an alternative GitHub CI/CD experience for Azure Synapse Link for SQL Server 2022 which uses GitHub Actions. This includes automatically stopping the link before the database update and starting it again after the update has completed.

Read on to learn how.


Installing Third-Party WHL Packages in Synapse with DEP

Published 2022-11-02 by Kevin Feasel

Sabyasachi Samaddar walks through what I consider a real difficulty:

It is really challenging when you need to install third-party .whl packages into a DEP-enabled Azure Synapse Spark Instance.
According to the documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-librar…), installing packages from PyPI is not supported within DEP-enabled workspaces. Hence we cannot just upload the .whl packages into the workspace: we need to upload all the dependencies along with the .whl package, and it will be an offline installation. Synapse Spark clusters also come with built-in packages, so we may find some conflicts when we try to install some third-party packages.

Read on to see what you need to do.
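To make the conflict problem a little more concrete, here is a minimal Python sketch of one way to spot version clashes before uploading anything. It is not the approach from the linked article. It reads the package name and version out of each wheel filename (standard wheel filenames start with `name-version-`) and compares them with a list of built-in packages. The function names are hypothetical, and the built-in package versions in the usage below are invented placeholders, not the actual contents of any Synapse runtime.

```python
import re


def wheel_name_and_version(filename):
    """Extract (normalized name, version) from a wheel filename.

    Wheel filenames look like name-version[-build]-python-abi-platform.whl.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return _normalize(parts[0]), parts[1]


def _normalize(name):
    # Treat runs of '-', '_' and '.' as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_conflicts(wheel_files, builtin_packages):
    """Return (name, builtin_version, wheel_version) for each version mismatch.

    builtin_packages maps package name to the version already on the cluster;
    the caller supplies it (it is an assumption of this sketch).
    """
    builtin = {_normalize(n): v for n, v in builtin_packages.items()}
    conflicts = []
    for filename in wheel_files:
        name, version = wheel_name_and_version(filename)
        existing = builtin.get(name)
        if existing is not None and existing != version:
            conflicts.append((name, existing, version))
    return conflicts


if __name__ == "__main__":
    # Placeholder versions for illustration only.
    builtin = {"numpy": "1.21.0", "pandas": "1.3.5"}
    wheels = [
        "numpy-1.24.0-cp310-cp310-manylinux_2_17_x86_64.whl",
        "my_lib-1.2.0-py3-none-any.whl",
    ]
    for name, have, want in find_conflicts(wheels, builtin):
        print(f"{name}: cluster has {have}, wheel provides {want}")
```

A check like this only flags exact version differences. It does not resolve dependency ranges, so treat any hit as a prompt to look closer, not as proof of a broken install.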


Azure Synapse Analytics R Language Support

Published 2022-11-01 by Kevin Feasel

Ryan Majidimehr has a short list of updates for Azure Synapse Analytics but it includes a big one:

Azure Synapse Analytics provides built-in R support for Apache Spark. As part of this, data scientists can leverage Azure Synapse Analytics notebooks to write and run their R code. This also includes support for SparkR and SparklyR, which allows users to interact with Spark using familiar Spark or R interfaces. To learn more read the official how-to Use R for Apache Spark with Azure Synapse Analytics (Preview).

That it took this long for R support was a bit weird, but I’m glad it’s there now.


Choosing between Synapse Spark Notebooks or Job Definitions

Published 2022-10-28 by Kevin Feasel

Arun Sethia and Arshad Ali explain when you might use a Spark notebook versus a job definition:

A Synapse Spark Notebook is a web-based (HTTP/HTTPS) interactive interface to create files that contain live code, narrative text, and visualized output with rich libraries for Spark-based applications. Data engineers can collaborate, schedule, run, and test their Spark application code using Notebooks. Notebooks are a good place to validate ideas and do quick experiments to get insight into the data. You can integrate the Synapse Notebook into a Synapse pipeline.
The Notebook allows you to combine programming code with markdown text and perform simple visualizations (using Synapse Notebook chart options and open-source libraries). In addition, running code will supply immediate feedback, output, and progress tracking within Notebook.

Click through for the comparison.


