Category: Cloud

“Warming Up” Databricks Clusters

Ust Oldfield needs that cluster to be up:

Interactive and SQL Warehouse (formerly known as SQL Endpoint) clusters take time to become active. This can range from around 5 minutes to almost 10 minutes. For some workloads and users, this waiting time can be frustrating, if not unacceptable.
For this use case, we had streaming clusters that needed to be available when streams started at 07:00 and turned off when streams stopped being sent at 21:00. Similarly, business users needed their SQL Warehouse clusters to be available when the business started trading, so that their BI reports didn’t time out waiting for the clusters to start.

Read on to see one way to solve this problem without having a cluster run 24/7.
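
As a rough illustration of the scheduling idea in the excerpt (not the approach from the linked post), the decision "should the cluster be up right now?" can be sketched in a few lines of Python. The 07:00 and 21:00 boundaries come from the excerpt; the 10-minute lead time and the function name are assumptions made for this sketch, and the code that would actually start or stop a cluster is deliberately left out.

```python
from datetime import datetime, time

# Window taken from the excerpt: streams start at 07:00 and stop at 21:00.
STREAM_START = time(7, 0)
STREAM_STOP = time(21, 0)
# Assumed lead time: the excerpt cites start-up times of up to about 10 minutes.
WARM_UP_MINUTES = 10


def cluster_should_be_running(now: datetime) -> bool:
    """Return True if the cluster should be up (or warming up) at `now`."""
    minutes_now = now.hour * 60 + now.minute
    start = STREAM_START.hour * 60 + STREAM_START.minute - WARM_UP_MINUTES
    stop = STREAM_STOP.hour * 60 + STREAM_STOP.minute
    return start <= minutes_now < stop
```

A scheduled job could call a check like this every few minutes and start or stop the cluster when the answer changes.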

Azure Active Directory Authentication in SQL Server 2022

Mirek Sztajno has an interesting announcement:

Enabling Azure AD authentication opens access to the Azure cloud identity system. Azure AD is used by many cloud services and unifies all local authentication mechanisms used by Microsoft products providing one central identity repository and authentication management system available to different platforms, including Azure SQL and SQL Server on-premises. The variety of available authentication methods including single sign-on (SSO) and multifactor authentication (MFA), provides strong security support in the authentication area for different services used internally by Microsoft and by external customers. Azure AD authentication is the recommended authentication method for Azure SQL and SQL Server.

Looks like it does require Azure Arc, which has a fairly small per-instance monthly charge. Click through for the details. That said, you will be able to use this feature on-premises and in other clouds, not just in Azure VMs.

Breaking Changes in Azure Data Explorer

Gabi Lehner announces a change:

The current_principal_is_member_of() function checks if the principal who runs the query is a member in any of the users, apps or groups provided as arguments.

Up until now, it was allowed to specify the AAD group details in multiple forms, including the display name of the AAD group without the tenant id or name, for example current_principal_is_member_of("mygroup").

I have to say, that’s a pretty big security flaw.
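
To make the distinction concrete, here is a small Python sketch that contrasts a bare display name with a tenant-qualified principal string. It assumes the `aadgroup=<object id>;<tenant id>` form used for Kusto principals; the helper names are hypothetical, and the check is only a rough heuristic, so the linked announcement remains the authority on which forms are accepted.

```python
def group_principal(group_object_id: str, tenant_id: str) -> str:
    """Build a tenant-qualified group principal string (assumed format)."""
    return f"aadgroup={group_object_id};{tenant_id}"


def has_principal_prefix(principal: str) -> bool:
    """Rough heuristic: a bare display name such as "mygroup" has no 'kind=' prefix."""
    return "=" in principal
```

With this heuristic, `has_principal_prefix("mygroup")` is false, while any string built by `group_principal` passes.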

Creating a Snowflake Instance

Arun Sirpal sets up Snowflake:

Now let’s start the process of creating a Snowflake account in the Azure Cloud. You can sign up for a free trial from here – https://signup.snowflake.com/. I am going to bypass this and go straight to the setup screens. (This is slightly different because, as an org-admin, I have the power to create accounts.)

Select the cloud provider and edition you require; we have already discussed these options before. You know me, it’s going to be Azure, but feel free to dive into AWS or GCP.

Read on for some step-by-step installation instructions.

Cross-Subscription Key Vault Access

Andrew Coughlin sets up secure Key Vault access:

Let’s first discuss the setup for this blog post. I will have two subscriptions assigned to the same Azure AD Tenant. Within each Azure subscription I will have a resource group. I will create the Azure Key Vault in one subscription / resource group and then I will create a virtual machine in the other subscription / resource group. This is just for example purposes; I could utilize other Azure services that can use managed identities. I could also create a service principal for my application to use to get keys or secrets.

In this example we will be using private endpoints. Are you looking for how to do this with public endpoints? Check out my recent post on how to do that here.

When in doubt, private endpoints are the right choice. They’re probably the right choice when not in doubt as well.

Starting a Data Mesh Project

Paul Andrew continues a series on data mesh:

A common question I get asked when creating a data mesh architecture is: where to start? The consultant in me defaults the answer to ‘it depends’, of course.

However, in this blog post I want to give a better answer based on my experience of working with various customers so far. As always, the usual caveats apply: I’m happy to go first when trying to define a starting point for our data mesh delivery and fully accept that parts of this are probably wrong. This is also founded in the knowledge that every customer I’ve worked with is different, with different priorities and very subjective views on why they even need a data mesh architecture. Not to mention various levels of data platform maturity.

Paul also includes some nice roadmap and architectural box-drawing diagrams, so check those out.

Azure VM Auto-Shutdown

Dennes Torres saves some cash:

The Auto-Shutdown policy is another important policy to ensure our virtual machines don’t expend more than what we planned for them. If we have a time window to use the virtual machines, the auto-shutdown policy can deactivate them at the right time.

We need to discover the deep internal details about the auto-shutdown configuration before creating the policy. The method we can use is to set this configuration and export the virtual machine as a template. We change the configuration to on and off, export and check the difference.

This can be kind of annoying when you’re working late, though you can delay auto-shutdown pretty easily. If you’re the type of person to forget turning off cloud resources when not in use, this is one way to prevent an unexpectedly large bill.
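
The "export twice and check the difference" technique from the excerpt can be sketched in Python as a recursive comparison of two parsed templates. This is a minimal sketch, not the method from the linked post: the function name is hypothetical, lists are compared as whole values, and the sample dictionaries are made-up stand-ins rather than real exported templates.

```python
def diff_templates(off, on, path=""):
    """Return {path: (off_value, on_value)} for every value that differs."""
    changes = {}
    if isinstance(off, dict) and isinstance(on, dict):
        # Walk the union of keys so additions and removals both show up.
        for key in sorted(set(off) | set(on)):
            child = f"{path}.{key}" if path else key
            changes.update(diff_templates(off.get(key), on.get(key), child))
    elif off != on:
        changes[path] = (off, on)
    return changes


# Made-up sample data standing in for two exports (setting off, then on).
template_off = {"resources": {"vm": {"size": "Standard_B2s"}}}
template_on = {
    "resources": {
        "vm": {"size": "Standard_B2s"},
        "shutdown": {"status": "Enabled", "time": "1900"},
    }
}
changes = diff_templates(template_off, template_on)
```

In practice the two dictionaries would come from `json.load` on the exported template files, and the paths in `changes` point at the settings a policy would need to target.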

Reviewing Oracle Database Service on Azure

Kellyn Pot’vin-Gorman has a tough talk:

If we were to ask any DBA to separate the database in one cloud and the application tier in another without the context of a marketing announcement, they would look at us like we’d grown a third head. I’m incredibly surprised that anyone even considers the OCI Interconnect for this use, let alone the 150 that are currently using it. Oracle applications, like E-Business Suite, PeopleSoft, JD Edwards and Hyperion, are incredibly network latency sensitive, and to recommend separating their tiers in two separate clouds is just alien to me. When we deploy these in Azure, we place all tiers in a proximity placement group to let Azure know that they are connected, and this ensures that when a resource comes online after changes are made, redeployments, etc., the resources stay close to each other.

Definitely worth a read.

Data Sharing and Secure Cleanrooms in Databricks

Craig Porteous reviews a couple of announcements from Data + AI Summit:

Having worked with many organisations across different industries and sectors, the sharing of data with partners and vendors is always a pain point and one that all too often results in both parties not quite getting what they want or need. This isn’t restricted to my experience, however, which is why Databricks announced Delta Sharing back at DATA + AI Summit 2021.

Coming to this year’s conference, Delta Sharing has been established as the foundation for many new features, with the announcement of Databricks Marketplace and Cleanrooms for example, both built upon the Delta Sharing protocol. We’ll explore Cleanrooms below and I’ll look at the Databricks Marketplace in its own post.

Read on for Craig’s thoughts on two of the bigger announcements at this year’s summit.

The Basics of Snowflake Architecture

Arun Sirpal lays out the foundation of Snowflake DB’s architecture:

At the most basic level, Snowflake has 3 important components: the cloud services layer, the centralised storage layer and the compute layer.

Cloud services – they call this the “brains” of Snowflake. This is where infrastructure management takes place, where the (cost-based) optimiser lives, and where metadata management and security (authentication and access control) are handled.

Read on to learn about the other two layers and how they meet.
