Press "Enter" to skip to content

Category: Cloud

Feeding Synapse Spark Info to On-Prem Kafka Clusters

Bhadreshkumar Shiyal finds a solution:

Microsoft’s official documentation for Azure Data Factory contains a tutorial that explains how to access an on-premises SQL Server from an Azure Data Factory instance inside a Managed VNet. You can go through that article here: Access on-premises SQL Server from Data Factory Managed Vnet using Private Endpoint – Azure Data Fac….

Our approach is based upon the article’s solution, but to meet our requirements we substituted an on-prem Apache Kafka cluster for the on-prem SQL Server, and instead of an Azure Data Factory inside a Managed VNet we used a Synapse Workspace inside a Managed VNet. The “Forwarding Vnet” concept explained in the above tutorial remains as-is in our approach.

As soon as you turn on Data Exfiltration Protection (DEP), the lockdown is real. Click through to see what the process of exfiltrating data through an approved mechanism looks like.
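The linked post is about the networking path (the forwarding VNet and private endpoints); once connectivity is in place, the Spark-side write is just the standard Kafka sink. A minimal PySpark sketch, where the broker, topic, and source path are all placeholders:

```python
# Minimal sketch of the Spark-side write once connectivity to the on-prem
# brokers exists. Broker address, topic, and source path are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_json, struct

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("abfss://data@mylake.dfs.core.windows.net/events/")  # hypothetical source

(df.select(to_json(struct(*df.columns)).alias("value"))   # Kafka sink expects a "value" column
   .write
   .format("kafka")
   .option("kafka.bootstrap.servers", "onprem-broker-01:9092")  # placeholder broker
   .option("topic", "synapse-events")                           # placeholder topic
   .save())
```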


From Azure Data Explorer to Excel

Dany Hoter views data in Excel:

In a previous article, Direct Query from Excel to Azure Data Explorer (microsoft.com), I described a way to mimic Direct Query access à la Power BI in Excel.

The method used in that article allows the user to filter the imported data using values entered into cells in the grid.

In this article I would like to describe a way to really query Kusto data in real time without importing any data and without any volume limitations.

Read on to see how, though there’s a pretty big intermediate step.
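The post works through the Excel side; as a point of comparison, querying Kusto data directly from Python with the azure-kusto-data client (shown here against Microsoft's public help cluster and Samples database) looks something like this:

```python
# Sketch of querying Kusto directly, with no import step, using the public help cluster.
# pip install azure-kusto-data
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication("https://help.kusto.windows.net")
client = KustoClient(kcsb)

response = client.execute("Samples", "StormEvents | summarize count() by State | top 5 by count_")
for row in response.primary_results[0]:
    print(row["State"], row["count_"])
```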


Organizing Data Domains in a Data Mesh

Paul Andrew continues a series on data mesh architecture:

Defining an organisation hierarchy is always hard, even more so for large enterprises with massive amounts of interlock between business functions. In the context of data analytics, we attempt to tackle the problem by creating an organisation dimension as part of our star schema data model. This could include things like region, operating company, branch, department, team etc.

So, my friends, how do we go about handling this when considering a data mesh architecture and the de-centralised domains that support the natural scalability we crave? For me, it feels like we are just frontloading the dimensional modelling problem, tackling it from the beginning in the very foundations of our data platform. But with a twist.

Read on for that twist and for some solid guidance on data domains in practice compared to the theory.


Azure Synapse Analytics July 2022 Updates

Ryan Majidimehr notes that the Azure Synapse Analytics team has been busy:

Azure Synapse Link for SQL is an automated system for replicating data from your transactional databases into a dedicated SQL pool in Azure Synapse Analytics. Starting this month, you can make trade-offs between cost and latency in Synapse Link for SQL by selecting the continuous or batch mode to replicate your data.  

If you select “continuous mode”, the runtime runs continuously, so any changes applied to the SQL database or SQL Server are replicated to Synapse with low latency. Alternatively, if you select “batch mode” with a specified interval, the changes applied to the SQL database or SQL Server are accumulated and replicated to Synapse in batches at that interval. This can save cost, as you are only charged for the time the runtime is required to replicate data. After each batch of data is replicated, the runtime is shut down automatically.

Click through for the complete list.
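To put a rough number on the continuous-versus-batch trade-off described above, here is some illustrative arithmetic; the per-batch runtime minutes and hourly rate are made-up assumptions, not Synapse Link pricing:

```python
# Illustrative arithmetic only: the hourly rate and runtime minutes per batch
# are assumptions, not actual Synapse Link for SQL pricing.
hourly_rate = 1.00          # assumed cost per runtime-hour
hours_per_day = 24

# Continuous mode: the runtime stays up all day.
continuous_cost = hourly_rate * hours_per_day

# Batch mode: e.g. one batch per hour, with the runtime alive ~10 minutes per batch.
batches_per_day = 24
minutes_per_batch = 10
batch_cost = hourly_rate * (batches_per_day * minutes_per_batch) / 60

print(f"continuous: ${continuous_cost:.2f}/day, hourly batches: ${batch_cost:.2f}/day")
# The flip side is latency: continuous mode replicates changes almost immediately,
# while batch mode waits for the next interval.
```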


Bidirectional Transactional Replication and Managed Instances

Holger Linke builds a transactional replication topology with a couple of twists:

Bidirectional transactional replication is a specific Transactional Replication topology that allows two SQL Server instances or databases to replicate changes to each other. Each of the two databases publishes data and then subscribes to a publication with the same data from the other database. The “@loopback_detection” feature ensures that changes are sent only to the Subscriber and are not sent back to the Publisher.

The databases providing the publication/subscription pairs can be hosted either on the same SQL instance or on two different SQL instances. The instances can be SQL Server on-premises, SQL Server hosted in a virtual machine, SQL Managed Instance on Azure, or a combination of these. You just have to make sure the instances can connect to each other. If you add a subscription by using the fully-qualified domain name (FQDN), verify that the server name (@@SERVERNAME) of the Subscriber returns the FQDN. If the Subscriber server name does not return the FQDN, changes that originate from that Subscriber may cause primary key violations.

Read on for the scripts.
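One caveat worth checking up front is the FQDN requirement called out above. A small sketch that compares @@SERVERNAME on the Subscriber against the name you intend to use for the subscription; the server name and driver are placeholders:

```python
# Pre-flight check: does @@SERVERNAME on the Subscriber return the FQDN
# you plan to use when adding the subscription? Server name is a placeholder.
import pyodbc

subscriber_fqdn = "sqlsub01.contoso.com"   # hypothetical subscriber FQDN
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    f"SERVER={subscriber_fqdn};DATABASE=master;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;"
)
server_name = conn.cursor().execute("SELECT @@SERVERNAME").fetchone()[0]

if server_name.lower() != subscriber_fqdn.lower():
    print(f"Warning: @@SERVERNAME is '{server_name}', not the FQDN '{subscriber_fqdn}'.")
```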


“Warming Up” Databricks Clusters

Ust Oldfield needs that cluster to be up:

Interactive and SQL Warehouse (formerly known as SQL Endpoint) clusters take time to become active. This can range from around 5 minutes through to almost 10 minutes. For some workloads and users, this waiting time can be frustrating, if not unacceptable.
For this use case, we had streaming clusters that needed to be available for when streams started at 07:00 and to be turned off when streams stopped being sent at 21:00. Similarly, there was also a need from business users for their SQL Warehouse clusters to be available for when business started trading, so that their BI reports didn’t time out waiting for the clusters to start.

Read on to see one way to solve this problem without having a cluster run 24/7.
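For flavor, one common pattern (not necessarily the one in the post) is to schedule a small script that calls the Databricks REST API shortly before the clusters are needed. A sketch, with the workspace URL, token, and cluster ID as placeholders:

```python
# Sketch: pre-start an interactive cluster via the Databricks REST API.
# Workspace URL and cluster_id are placeholders; run this on a schedule
# (e.g. 06:45) so the cluster is warm by 07:00.
import os
import requests

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
token = os.environ["DATABRICKS_TOKEN"]
cluster_id = "0123-456789-abcde123"                                   # placeholder

resp = requests.post(
    f"{workspace_url}/api/2.0/clusters/start",
    headers={"Authorization": f"Bearer {token}"},
    json={"cluster_id": cluster_id},
    timeout=30,
)
resp.raise_for_status()
# SQL Warehouses have an analogous start endpoint:
# POST {workspace_url}/api/2.0/sql/warehouses/{warehouse_id}/start
```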


Breaking Changes in Azure Data Explorer

Gabi Lehner announces a change:

The current_principal_is_member_of() function checks whether the principal who runs the query is a member of any of the users, apps, or groups provided as arguments.

Up until now, you could specify the AAD group details in multiple forms, including the display name of the AAD group without specifying the tenant ID or name, for example current_principal_is_member_of(“mygroup”).

I have to say, that’s a pretty big security flaw.


Azure Active Directory Authentication in SQL Server 2022

Mirek Sztajno has an interesting announcement:

Enabling Azure AD authentication opens access to the Azure cloud identity system. Azure AD is used by many cloud services and unifies the local authentication mechanisms used by Microsoft products, providing one central identity repository and authentication management system available to different platforms, including Azure SQL and SQL Server on-premises. The variety of available authentication methods, including single sign-on (SSO) and multifactor authentication (MFA), provides strong security support for the different services used internally by Microsoft and by external customers. Azure AD authentication is the recommended authentication method for Azure SQL and SQL Server.

Looks like it does require Azure Arc, which has a fairly small per-instance monthly charge. Click through for the details. That said, you will be able to use this feature on-premises and in other clouds, not just in Azure VMs.
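Assuming the instance is Arc-enabled and an Azure AD admin is configured as the post describes, the client side can look roughly like this with pyodbc and a recent ODBC driver; the server, database, and user principal are placeholders:

```python
# Sketch of connecting with Azure AD authentication via pyodbc.
# Server, database, and user principal are placeholders; assumes the SQL Server 2022
# instance is Arc-enabled with an Azure AD admin configured as described in the post.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sql2022.contoso.com;"
    "DATABASE=AdventureWorks;"
    "Authentication=ActiveDirectoryInteractive;"   # other AAD methods also exist
    "UID=someone@contoso.com;"
    "Encrypt=yes;TrustServerCertificate=yes;"
)
print(conn.cursor().execute("SELECT SUSER_SNAME();").fetchone()[0])
```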


Creating a Snowflake Instance

Arun Sirpal sets up Snowflake:

Now let’s start the process of creating a Snowflake account in the Azure cloud. You can sign up for a free trial here: https://signup.snowflake.com/. I am going to bypass this and go straight to the setup screens. (This is slightly different because, as an org admin, I have the power to create accounts.)

Select the cloud provider and edition you require; we have already discussed these options before. You know me, it’s going to be Azure, but feel free to dive into AWS or GCP.

Read on for some step-by-step setup instructions.
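Once the account exists, connecting to it from code is straightforward. A minimal sketch with the Snowflake Python connector, where the account identifier, user, warehouse, and database are placeholders (the .azure suffix reflects an Azure-hosted account):

```python
# Sketch of connecting to a newly created Azure-hosted Snowflake account.
# Account identifier, user, warehouse, and database are placeholders.
# pip install snowflake-connector-python
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.east-us-2.azure",        # placeholder account locator + region
    user="MY_USER",
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="COMPUTE_WH",
    database="DEMO_DB",
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_ACCOUNT(), CURRENT_REGION(), CURRENT_VERSION()")
print(cur.fetchone())
```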


Cross-Subscription Key Vault Access

Andrew Coughlin sets up secure Key Vault access:

Let’s first discuss the setup we will be using in this blog post. I will have two subscriptions assigned to the same Azure AD tenant. Within each Azure subscription I will have a resource group. I will create the Azure Key Vault in one subscription/resource group and a virtual machine in the other subscription/resource group. This is just for example purposes; I could utilize other Azure services that can use managed identities. I could also create a service principal for my application to use to get keys or secrets.

In this example we will be using private endpoints. Are you looking for how to do this with public endpoints? Check out my recent post on how to do that here.

When in doubt, private endpoints are the right choice. They’re probably the right choice when not in doubt as well.
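The consuming side looks the same regardless of which subscription the vault lives in, since authorization happens against the Azure AD tenant rather than the subscription boundary. A minimal sketch from the VM using its managed identity; the vault URL and secret name are placeholders:

```python
# Sketch: read a secret from a Key Vault in another subscription, from a VM
# with a system-assigned managed identity. Vault URL and secret name are placeholders.
# pip install azure-identity azure-keyvault-secrets
from azure.identity import ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient

credential = ManagedIdentityCredential()
client = SecretClient(
    vault_url="https://kv-other-subscription.vault.azure.net",
    credential=credential,
)
secret = client.get_secret("my-app-secret")
print(secret.name)
# With private endpoints, the vault URL must resolve to the private IP from the
# VM's VNet (e.g. via the privatelink.vaultcore.azure.net private DNS zone).
```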
