Press "Enter" to skip to content

Category: Synapse Analytics

Azure Synapse Database Templates

Aaron Merrill announces database templates for Azure Synapse Analytics:

The Synapse database template for Agriculture is a comprehensive data model that addresses the typical data requirements of organizations engaged in growing crops, raising livestock, and producing dairy products, including field and pasture management and satellite and drone data.

The Synapse database template for Energy & Commodity Trading is a comprehensive data model that addresses the typical data requirements of organizations engaged in trading energy, commodities, and/or carbon credits, whether as a primary trading business or in support of their supply chains, operating businesses, and hedging activities.

You may remember Microsoft buying ADRM Software a while back. This is why.

Comments closed

Serverless SQL Pool CI/CD via GitHub Actions

Kevin Chant reminds me I need to spend more time with GitHub Actions:

I want to cover one way you can do CI/CD for Azure Synapse Analytics serverless SQL pools using GitHub Actions in this post. For various reasons.

For a start, in a previous post I wrote about how you can CI/CD for serverless SQL pools using Azure DevOps. So, I thought I would balance things out and show how you can do the same thing within GitHub.

In addition to this, there have been a few discussions about using GitHub Actions instead of Azure Pipelines within the Microsoft Data Platform community recently. For example, the topic came up during the DataWeekender conference.

With this in mind, I want to show how easy it can be to migrate an Azure DevOps pipeline to GitHub Actions.

Click through for the example.

Comments closed

Updates to Azure Synapse Link

Aria Jelinek outlines the value of Azure Synapse Link:

New as of Ignite 2021, customers can optimize queries by setting custom partitions for their Azure Cosmos DB analytical store using keys that are commonly used as query filters. This compacts and optimizes the analytical data written to the partitioned store, resulting in better query performance even when working with a high volume of update or delete operations.

Azure Synapse Link is also now available for Azure Cosmos DB serverless accounts, expanding the integration to cover data from workloads with bursts of traffic or uncertain traffic patterns.

This post mostly covers the Dataverse and Cosmos DB integrations rather than the integration with SQL Server 2022.

One the whole, I like Azure Synapse Link for Cosmos DB and will probably like it for SQL Server 2022—maybe even a bit more. It does simplify the ELT process by taking care of the E and handling the first half of the L (landing into a staging table). Though if data’s going into a dedicated SQL pool, I do hope the people doing this understand that dedicated SQL pools are intended for Kimball-style data warehousing scenarios and there can be a considerable performance (and therefore price) hit if you simply replicate a bunch of stuff without subsequent transformation.

Comments closed

Querying Delta Lake via Azure Synapse Analytics Serverless SQL Pool

Tony Truong uses T -SQL to query Delta Lake files:


How to query Delta Lake with SQL on Azure Synapse  

As mentioned earlier, Azure Synapse has several compute pools for the evolving analytical workload. There is the Apache Spark pool for data engineers and serverless SQL pool for analysts. Let us break down how the two personas work together to query a shared Delta Lake.  

Read on for the setup and the payoff.

Comments closed

ML Updates in Azure Synapse Analytics

Aria Jelinek and Nellie Gustafsson have some announcements for us:

Announced last week at Ignite 2021, data teams now have a handful of new opportunities to drive value with machine learning built directly into their Apache Spark pools in Azure Synapse Analytics.

With the general availability of our machine learning library for Apache Spark on Azure Synapse, data teams now have expanded access to both code-first and code-free ML tools for forecasting, model training, and pre-built AI. This library provides both familiar open-source tools such as LightGBM as well as proprietary solutions to provide a comprehensive, streamlined approach to ML workloads. Updates include PREDICT, a new keyword that supports scoring AzureML and MLFlow models directly in Azure Synapse, and integration with Azure Cognitive Services, now generally available.

Click through for all of the announcements.

1 Comment

Lessons Learned from the Serverless SQL Pool

Teo Lachev has some thoughts for us:

I’ve architected and currently implementing a solution that uses Synapse (my last newsletter has the details, plus the architecture diagram). Synapse Serverless is the Microsoft answer to Amazon Athena but instead of using open-source tools like Presto, it’s built on SQL Server. In this project we extract many tables from 1,500 on-prem SQL Server databases and stage them in ADLS.

Read on for Teo’s notes on the topic.

Comments closed

Azure Synapse Analytics Database Templates

Santosh Balasubramanian shows off database templates in Azure Synapse Analytics:

Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources—at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs.

One of the challenges that users in key industry areas face is how to describe and shape the mass of data that they are gathering. Most of this data is currently stored in data lakes or in application-specific data silos. The challenge is to bring all this data together in a standardized format enabling it to be more easily analyzed and understood and for ML and AI to be applied to it.

Azure Synapse solves this problem by introducing industry-specific templates for your data, providing a standardized way to store and shape data. These templates provide schemas for predefined business areas, enabling data to be loaded into a database in a structured way.

Read on to see what they can do, and try them out in a Synapse workspace.

Comments closed

Azure Synapse Analytics Shared Security

Hiram Fleitas explains the value of workspace and storage account segregation in Azure Synapse Analytics:

Well, why?… perhaps you prefer not spinning more resources to segmentate the environment or decouple the workloads, but you still need to enforce data security across domains.

Lets look at how to secure an HR container in a shared Azure Synapse Analytics workspace that serves mixed workloads by using only RBAC permissions at the storage, and at container level.

It’s recommended to use a separate storage account. I will explain and demo why.

Click through for the demo and explanation.

Comments closed

Azure Synapse Data Explorer Pools

Manoj Raheja tries announces another pool type:

At Ignite, we announced the public preview of Azure Synapse data explorer that makes it possible to query huge amounts of structured, semi-structured, and free-text telemetry and time-series data. The following are some of the key capabilities that make this possible:

– Powerful distributed query engine that indexes all data including free text and semi-structured data. The data is automatically compressed, indexed, auto-optimized, and cached on local SSDs and persisted on storage. Compute and storage are decoupled that gives you full elasticity to auto scale in/out without a downtime.

– Intuitive Kusto Query Language (KQL) that is highly optimized for exploring raw telemetry and time series data using Synapse data explore’s best-in-class text indexing for efficient free-text search, regex, and parsing on traces\text data.

– Comprehensive JSON parsing capabilities for querying semi-structured data including arrays and nested structure.

– Native, advanced time series support for creation, manipulation, and analysis of multiple time series with in-engine Python and R execution support for model scoring.

Click through for a demonstration, showing that this is for more than just logs.

Comments closed

Starting a Synapse Proof of Concept

Hope Foley shares a secret with us:

I love my job!  One of the things I do for a living is to help customers get started with new services in Azure to finagle their data.  Many times we’ll start with a small POC to just start to understand the parts and pieces, and I teach them along the way.  I work with a lot of customers so being quick and nimble helps.  Lately I’ve been using PowerShell to setup the pieces needed for a full Synapse Analytics environment, including an example set of 4 pipelines (2 to extract to ADLS, 2 to upload to dedicated SQL pool).  Pulling data out of large relational databases into the data lake became a request I heard over and over so I automated it.  I’ve added and tweaked this over the years into a project I called “Synapse Load” and put a version out in my github. 

Click through to see what this includes and how you can use it.

Comments closed