Press "Enter" to skip to content

Category: Synapse Analytics

Feeding Synapse Spark Info to On-Prem Kafka Clusters

Bhadreshkumar Shiyal finds a solution:

Microsoft’s official documentation for Azure Data Factory contains a tutorial which explains how to access an On-Premises SQL Server from Azure Data Factory which is inside a Managed Vnet. You can go through that article here: Access on-premises SQL Server from Data Factory Managed Vnet using Private Endpoint – Azure Data Fac….

Although based upon the article’s solution, to meet our requirements we needed to substitute On-Prem Apache Kafka for On-Prem SQL Server and instead of an Azure Data Factory inside a Managed Vnet, we used a Synapse Workspace inside a Managed Vnet. The “Forwarding Vnet” concept explained in the above tutorial remains as-is in our approach.

As soon as you turn on Data Exfiltration Protection (DEP), the lockdown is real. Click through to see what the process of exfiltrating data through an approved mechanism looks like.

Comments closed

Azure Synapse Analytics July 2022 Updates

Ryan Majidimehr notes that the Azure Synapse Analytics team has been busy:

Azure Synapse Link for SQL is an automated system for replicating data from your transactional databases into a dedicated SQL pool in Azure Synapse Analytics. Starting this month, you can make trade-offs between cost and latency in Synapse Link for SQL by selecting the continuous or batch mode to replicate your data.  

By selecting “continuous mode”, the runtime will be running continuously so that any changes applied to the SQL database or SQL Server will be replicated to Synapse with low latency. Alternatively, when you select “batch mode” with a specified interval, the changes applied to the SQL database or SQL Server will be accumulated and replicated to Synapse in batch mode with the specified interval. This can save cost as you are only charged for the time the runtime is required to replicate data. After each batch of data is replicated, the runtime will be shut down automatically. 

Click through for the complete list.

Comments closed

Updating Synapse Linked SQL Servers with Azure DevOps

Kevin Chant makes a change:

This post covers how to update both ends of Azure Synapse Link for SQL Server 2022 using Azure DevOps. As shown at the Data Toboggan conference.

By the end of this post you will know how to deploy database updates to both the SQL Server database and the Azure Synapse dedicated SQL Pool that are used as part of Azure Synapse Link for SQL Server 2022, using a pipeline in Azure DevOps. To keep them consistent.

Click through for the process.

Comments closed

Power BI and Synapse Book Roundup

Chris Webb checks out some books:

I like free stuff and I like books, so of course I like free books – and it seems that the more I provide free publicity for relevant books here the more free books I get sent. I’ve now got enough to merit writing another post covering those I’ve received recently from various publishers and authors. As always these are not reviews, just short summaries of books you might want to check out.

Read on for the not-reviews.

Comments closed

Pre-Processing Data Explorer Data with Spark

Hauke Mallow does some data engineering:

We often see customer scenarios where historical data has to be migrated to Azure Data Explorer (ADX). Although ADX has very powerful data-transformation capabilities via update policies, sometimes more or less complex data engineering tasks must be done upfront. This happens if the original data structure is too complex or just single data elements being too big, hitting data explorer limits of dynamic columns of 1 MB or maximum ingest file-size of 1 GB for uncompressed data (see also Comparing ingestion methods and tools) .

Let’s think about an Industrial Internet-of-Things (IIoT) use-case where you get data from several production lines. In the production line several devices read humidity, pressure, etc. The following example shows a scenario where a one-to-many relationship is implemented within an array. With this you might get very large columns (with millions of device readings per production line) that might exceed the limit of 1 MB in Azure Data Explorer for dynamic columns. In this case you need to do some pre-processing.

Click through to see how you can do this with an Azure Synapse Analytics Spark pool prior to ingesting it with a Data Explorer pool.

Comments closed

CETAS and the Serverless SQL Pool

Liliam Leme takes us through the Create External Table as Select operation in the Azure Synapse Analytics serverless SQL pool:

Serverless SQL pool  has a very cool way to handle CREATE EXTERNAL TABLE AS SELECT (CETAS).  I mean, you can use a serverless SQL pool to create a CETAS which will materialize the query results. It means a heavy query in which, for example, the results would be part of future large join with other queries or aggregations that can be consolidated for reporting purposes. Those scenarios are examples that could be used as a CETAS. CETAS will be stored as a parquet file accessed by an external table in your storage and the performance is awesome. Later I am suggesting how to recreate the CETAS using pipeline against serverless SQL Pool.

Click through to see it in action.

Comments closed

Mounting Data Lake Storage from a Spark Pool

Kamil Nowinski runs into some trouble:

Last weekend, I played a bit with Azure Synapse from a way of mounting Azure Data Lake Storage (ADLS) Gen2 in Synapse notebook within API in the Microsoft Spark Utilities (MSSparkUtils) package. I wanted to just do a simple test, hence I followed the documentation from Microsoft: How to use file mount/unmount API in Synapse.
Having an ADLS Account already created in a subscription – should be easy peasy, right?

Read on to understand when things might be a little more complicated than they seem. And more frustrating, once you see the cause of the problem.

Comments closed

Working with Synapse Link for SQL

Steve Howard gives us an overview of a preview:

Azure Synapse Link for SQL greatly simplifies analytics pipelines as Microsoft manages the orchestration process for you. Since being announced at Microsoft Build, many of you have had the opportunity to try it out in a POC so now seems like a good time to take a deep dive on some implementation aspects that may save you time later.

This blog post will assume you have experience with Synapse dedicated SQL pools and that you have some basic working knowledge of Synapse Link for SQL from following the quick start or from doing an initial POC.

Looking at this, I am a bit concerned about what it means to sync actively changing tables, especially ones large enough to benefit from being in a dedicated SQL pool. “Just reload all the data” may be the right answer but it doesn’t sound like a convenient one.

Comments closed

Searching Industry Templates for Lake Databases in Synapse

Lakshmi Murthy is just browsing:

With Azure Synapse Database Templates generally available, our customers are constantly wanting to see and learn more about how to use these templates. Through these blogs we want to share tips and tricks our customers can use to help them utilize these templates in an efficient way. We’ve recently received several questions around the different ways a user can navigate these templates to create their lake databases. In this blog, I’d like to walk through a few options that may come handy as you give database templates a try.

Azure Synapse Analytics offers a no-code database designer which allows you to browse these database templates, select and customize the tables you want to use, to model your enterprise data. There are several ways to browse the tables provided by the comprehensive industry templates within the designer’s exploration experience. Though the user experience is super intuitive, there are a few tips and tricks that can make this process even easier. Let’s do a quick tour to learn about the different ways to browse these templates.

Click through for a few different ways to look at standard tables for different industries.

Comments closed

Azure Synapse Analytics June 2022 Updates

Ryan Majidimehr has some updates for us:

Fuzzy matching with a sliding similarity score option has been added to the Join transformation in Mapping Data Flows. You can create inner and outer joins on data values that are similar rather than exact matches! Previously, you would have had to use an exact match. The sliding scale value goes from 60% to 100%, making it easy to adjust the similarity threshold of the match. 

Read on for the full list of updates.

Comments closed