Press "Enter" to skip to content

Category: Synapse Analytics

GUID Conversion and the Serverless SQL Pool

Reitse Eskens hits a weird error:

One of the transformations is to change one primary key column from integer to GUID. This is something you can do with some trickery you’ll see in the code. But what I found was that, even though the primary key is unique, the GUID’s weren’t. And then the fun starts digging into the why…

Read on for the research Reitse performed. I don’t even have a good guess for this, it’s so weird. It feels like a bug but it’s weird regardless.

Comments closed

The Death (and Life?) of Azure Synapse Analytics

Paul Andrew plays coroner:

I think it’s fair to say that Azure Synapse Analytics has had a hard life. It was announced in public preview as a surprise to most of the community, including Microsoft cloud solution architects. Ultimately meaning that very little private preview testing and feedback on the product was done before showing it to the world. This resulted in a lot of frustration in the subsequent year before it could be classified as generally available and more frustration after that while we battled with the missing production features. Even now, the product is lacking in a lot of functionality. Anyway, this is all in the past. Microsoft Fabric is the new kid on the block, and we need to address the unpopular question about the future of Synapse. And considering I’ve been very unpopular with the product teams before; I’ll take this one for the team. Sorry, but it needs to be addressed.

Read on for Paul’s thoughts. I tend to agree in general with his take, but do read Bogdan Crivat’s response. Bogdan is on the Synapse product team and shares some thoughts as well.


Network Troubleshooting for Azure Synapse Analytics

Sergio Fonseca continues a series on Azure Synapse Analytics connectivity problems:

In this post I will speak about how to capture a network trace and how to do some basic troubleshooting using Wireshark to investigate connection and disconnection issues, not limited to samples error messages below:

  • An existing connection was forcibly closed by the remote host, The specified network name is no longer available, The semaphore timeout period has expired.
  • Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time. The duration spent while attempting to connect to this server was – [Pre-Login] initialization=5895; handshake=29;
  • A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 – The semaphore timeout period has expired.)
  • A connection was successfully established with the server, but then an error occurred during the login process
  • Failed to copy to SQL Data Warehouse from blob storage. A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 – An existing connection was forcibly closed by the remote host.) An existing connection was forcibly closed by the remote host
Comments closed

Power BI Authentication to Synapse via Sharable Cloud Connection

Dan English continues a series:

This is a bit overdue and a follow up to a few other posts I have regarding using Service Principal authentication with Power BI reports Power BI using Service Principal with Synapse SQL Pool and Power BI using Service Principal with Synapse Data Explorer (Kusto) Pool.

With the other two posts I did last year I had to use the SQL Server ODBC driver to get that to work and the big downside to that is that you need to use a gateway with that. Well in this case we are going to take a look at the new Shareable Cloud Connections that were announced earlier this year Streamlining cloud connection management for datasets, paginated reports, and other artifacts | Microsoft Power BI Blog | Microsoft Power BI

Click through to see what you need to get it working.

Comments closed

Exporting Dynamics 365 Data into Delta Lake via Synapse Link

Jose Mendes performs a data migration:

It’s fair to say there have been some considerable changes in the Azure landscape over recent years.

This blog will show you how to configure Synapse Link to export D365 data in the Delta Lake format – an open-source data and transaction storage file format used in Lakehouse implementations.

Before you start considering using this approach, you will need to ensure you meet the following prerequisites (Microsoft documentation).

Read on for those prerequisites as well as a step-by-step guide on how to do it.

Comments closed

CI/CD for Synapse Serverless SQL Pool with SqlPackage and Azure DevOps

Rui Cunha has a tutorial for us:

Azure Synapse Analytics Serverless SQL is a query service mostly used over the data in your data lake, for data discovery, transformation, and exploration purposes. It is, therefore, normal to find in a Synapse Serverless SQL pool many objects referencing external locations,  using disparate external data sources, authentication mechanisms, file formats, etc. In the context of CICD,  where automated processes are responsible for propagating the database code across environments, one can take advantage of database oriented tools like SSDT and SqlPackage CLI , ensuring that this code is conformed with the targeted resources.

In this article I will demonstrate how you can take advantage of thee tools when implementing the CICD for the Azure Synapse Serverless SQL engine. We will leverage SQL projects in SSDT to define our objects and implement deploy-time variables (SQLCMD variables).  Through CICD pipelines, we will build the SQL project to a dacpac artifact, which enables us to deploy the database objects one or many times with automation.

Click through for the demonstration.

Comments closed

Contrasting Azure Synapse Analytics and Microsoft Fabric

Warner Chaves explains the difference:

In the modern era of data-driven decision-making, businesses rely heavily on robust and efficient data platforms to process, analyze, and derive insights from their vast amounts of data. Since 2019, Azure Synapse Analytics has been Microsoft’s main contender in this space, offering powerful capabilities to handle complex data workloads.

Now, Microsoft has announced a new data platform called Microsoft Fabric, an evolution of the data platform built with a modified philosophy. It is a similar product but with enough differences to make them not interchangeable and so it’s very important to understand how they both compare and contrast if you’re planning a new data platform deployment. Microsoft wanted a product that was even simpler to deploy and operate and could function well outside of an Azure cloud environment as a full standalone Software As a Service offering.

In this blog post, we’ll compare Synapse Analytics and Fabric, highlighting their features, strengths, and considerations to help you make an informed decision for your organization’s data needs.

Warner has seven main areas of comparison, so click through to see how the two products stack up.

Comments closed

Against Waiting for Microsoft Fabric

Paul Andrew follows Betteridge’s Law of Headlines:

But, lets prepare for it in terms of the technical capabilities we line up in our existing data architecture.

The hype curve around Microsoft Fabric since its announcement earlier in the year has been huge. The problem is, we now face some difficult questions in terms of our technology estate. Especially if we have designs and a project already in flight using other Azure Resources.

Read on for Paul’s thoughts on the matter and why you shouldn’t wait until Microsoft Fabric is officially out—use what is available in the meantime and then decide whether you want to make a transition. Paul leaves one thing in the margins that I would want to make clear: if this is your plan, avoid the dedicated SQL pool unless you absolutely need it or plan to stay on Synapse once Fabric is GA.

Comments closed

A First-Pass Approach to Migrating Dedicated SQL Pool Schemas to Fabric

Kevin Chant gets a jump on a big problem:

To manage expectations, this post only covers database schema objects. Plus, I need to highlight the fact that this solution has some interesting quirks. Some of which I highlight in this post.

Even though there are some quirks, I still want to show this solution. So that others can see it working and I can highlight a few important facts. Plus, share a template you can use to test this yourself.

The current lack of a good migration strategy is a real challenge for anyone thinking of moving from Azure Synapse Analytics to Microsoft Fabric. Serverless SQL pools and Spark pools are an easy transition, but dedicated SQL pools are a tough nut to crack.

Comments closed

Using the Azure Data Factory Self-Hosted Integration Runtime

Chen Hirsh hosts a runtime:

In Azure data factory (ADF), An integration runtime is a compute resource to run your pipelines on. When you run an application on your computer, it uses the computer resources, such as CPU and memory, to run its tasks. When you run activities in a pipeline in ADF, they also need resources to do their job, like copying data or writing a file, and these are provided by the integration runtime.

When you create an instance of ADF, you get a default integration runtime, hosted in the same region that you created ADF in. If you need, you can add your own integration runtimes, either on Azure, or you can download and install a self-hosted integration runtime (SHIR) on your own server.

Read on to understand when you would want to use a self-hosted integration runtime and the process to do so. This SHIR also applies to Synapse pipelines and is one of the few ways to move data out of a Synapse workspace with data exfiltration protection enabled.

Comments closed