Press "Enter" to skip to content

Category: Cloud

Choosing Compute Options for Databricks

Matt Collins shares some recommendations:

You can use Databricks for a vast range of applications these days. From handling streaming datasets, running Deep Learning models to populating data model fact tables with complex transformations, choosing the correct compute option can seem a lot like a stab in the dark followed by (potentially expensive) trial end error.

You can choose from an incredible range of configurations in the Databricks Compute User Interface.

This variety is comprised of the Virtual Machine (VM) category, size, availability and access mode (also referred to as family).

Determining the right compute choice for you could be answered by the classic answer “it depends”, but this guide aim to inform the decision-making process — both in terms of cost and performance.

Read on for more information.

Comments closed

Azure Data Explorer Web UI Updates for July 2023

Michal Bar has an update for us:

Welcome to the July 2024 update. We are excited to announce new features and improvements in ADX web UI.

Continue reading to learn more about:

  1. Copy query with syntax coloring and KQL IntelliSense improvements
  2. Ad-hoc visual data exploration
  3. Dashboards real time refresh rate
  4. Enhanced data interaction for dashboard tiles
  5. Resize and move dashboard tiles using keyboard only

Click through to continue reading.

Comments closed

Transferring CLR Binaries from SQL Server to Azure SQL Managed Instance

Mihailo Joksimovic shows a use case for Azure SQL Managed Instance link:

Azure SQL Managed Instance link is a new feature enabling you to create a distributed availability group between your SQL Server and Azure SQL Managed Instance. It makes it super simple to connect on-prem. and cloud, providing near real-time replication speeds. Benefits are many and if you aren’t familiar, I’d strongly suggest you read our official blog post about it.

One great benefit for our use-case is that MI Link takes care of transferring the CLR assemblies for you! You can import your assemblies on your SQL Server instance, using all too familiar syntax (i.e. CREATE ASSEMBLY FROM ‘C:\path\to\assembly.dll’), and MI link will ensure that those same assemblies get transferred to cloud. Easy peasy and no need to deal with hex literals anymore.

Read on to learn more about this capability.

Comments closed

A Reference Architecture for Microsoft Fabric

James Serra draws boxes:

Microsoft Fabric uses a data lakehouse architecture, which means it does not use a relational data warehouse (with its relational engine and relational storage) and instead uses only a data lake to store data. Data is stored in Delta lake format so that the data lake acquires relational data warehouse-like features (check out my book that goes into much detail on this, or my video). Here is what a typical architecture looks like when using Fabric (click here for the .vsd):

Click through for the image as well as James’s explanation of the components.

Comments closed

Backing SQL Server up to S3 Buckets

David Fowler backs up a database:

Way back in the mists of time I wrote a post on how to backup SQL server to an S3 bucket using TNTDrive, https://sqlundercover.com/2018/06/18/backup-your-on-premise-sql-server-directly-to-an-aws-s3-bucket/.

Back then, if we wanted to backup SQL to S3 we needed to use a third party tool. Since SQL 2022 things have changed and we’ve now got the option to backup directly to S3 in a similar way that we can backup to Azure BLOB store.

And, going one step further, you can also use PolyBase to read data from S3 buckets in SQL Server 2022.

Comments closed

Cosmos DB HTAP into Azure Synapse Analytics and Microsoft Fabric

Paul Hernandez doesn’t want to write ETL jobs:

In the ever-evolving landscape of data management and analytics, choosing the right tools and approaches is crucial for optimizing performance and achieving business goals. Two prominent solutions that have gained traction are Azure Synapse Link for Azure Cosmos DB and Mirroring in Microsoft Fabric. Both offer unique benefits and cater to different needs, making it essential to understand their differences and use cases.

Read on to see how each of these works, as well as a quick demonstration of efficacy.

Comments closed

Blob Storage Account Lifecycle Maintenance

Andy Brownsword deletes some files but wants to keep other files:

A hierarchy of directories which contain files. That’s how we typically think about file storage. That’s not quite the same everywhere. In Blob Storage a file can appear to be in a directory, but when it’s removed so is the directory.

This can occur when using Lifecycle Management to help purge legacy blobs, which can be unexpected. Let’s look at a way we can help remediate this.

One important thing to remember about Azure blob storage accounts and S3 buckets is that there’s really no concept of a directory structure. It’s all keys, where your key might be dir1/dir2/dir3/file.txt. This is a bit different for Azure Data Lake Storage Gen2 and its notion of hierarchical namespaces (i.e., folders). But Andy does walk through some of the consequences of this and how to work with lifecycle maintenance policies to delete only certain sets of files.

Comments closed

Ad Hoc Data Exploration with Azure Data Explorer

Michal Bar introduces a new feature:

We are excited to introduce the new Data Exploration feature, designed to enhance your ability to delve deeper into the data presented on any Dashboard.

If the information you’re seeking isn’t readily available on the dashboard, this feature allows you to extend your exploration beyond the data displayed in the tiles, potentially uncovering new insights.

Directly from a dashboard, you can refine your exploration using a user-friendly, form-like interface. This intuitive and dynamic experience is tailored for insights explorers seeking insights based on high volumes of data in near real time.

Click through to see the new feature in action.

Comments closed

Tracking Task History in Snowflake

Kevin Wilkie is interested in just one thing:

By the table function name, you’re probably wondering “Sherpa, why on Earth do I really need to look at query history based on the session? I have Query_History_By_User which gives me a broader look at my data. Who really cares?”

Great question, my friend! You’re right. This new Query_History_By_Session table function isn’t going to give you a lot of data that you can’t get easier – and more helpfully – with Query_History_By_User. Why did Snowflake provide me this “useless” function?

Read on for the value of this function and an example of querying task history.

Comments closed

Working with Managed Entities in Azure SQL DB

Josephine Bush creates and uses a managed identity:

Benefits of Using Managed Identities and Entra Groups

  • Enhanced Security: Using managed identities eliminates the need to manage credentials, reducing the risk of credential theft.
  • Simplified Management: Entra Groups streamline the management of permissions for multiple users or managed identities, making it easier to apply consistent access policies.
  • Scalability: As your organization grows, you can easily manage access by adding new users or managed identities to Entra Groups without needing to update database permissions individually.

Read on to see how you can create one and what you can do with it.

Comments closed