Press "Enter" to skip to content

Category: Cloud

Git Native Support for Databricks Workflows

Vaibhav Sethi and Roland Faeustlin make an announcement:

We are happy to announce native support for Git in Databricks Workflows, which enables our customers to build reliable production data and ML workflows using modern software engineering best practices. Customers can now use a remote Git reference as the source for tasks that make up a Databricks Workflow, for example, a notebook from the main branch of a repository on GitHub can be used in a notebook task. By using Git as the source of truth, customers eliminate the risk of accidental edits to production code. They also remove the overhead of maintaining a production copy of the code in Databricks and keeping it updated, and improve reproducibility as each job run is tied to a commit hash. Git support for Workflows is available in Public Preview and works with a wide range of Databricks supported Git providers including GitHub, Gitlab, Bitbucket, Azure Devops and AWS CodeCommit.

Read on to see how it works.

Comments closed

Delta Live Tables and Power BI Data Modeling

Tahir Fayyaz goes from Delta Lake to Power BI:

To get the optimal performance from Power BI it is recommended to use a star schema data model and to make use of user-defined aggregated tables. However, as you build out your facts, dimensions, and aggregation tables and views in Delta Lake, ready to be used by the Power BI data model, it can become complicated to manage all the pipelines, dependencies, and data quality as you need to consider the following:

– How to easily develop and manage the data model’s transformation code.

– How to run and scale data pipelines for the model as data volumes grow.

– How to keep all the Delta Lake tables updated as new data arrives.

– How to view the lineage for all tables as the model gets more complex.

– How to actively stop data quality issues that result in incorrect reports.

Read on for recommendations, a couple architectural diagrams, and some sample code.

Comments closed

Refreshing SQL Managed Instances which Use TDE

Bradley Ball keeps the dev environment up to date:

Hello Dear Reader!  I was working with some friends lately and we needed to set up a process to refresh their Development Environment databases from Production.  Additionally, the databases are encrypted using Transparent Data Encryption, The SQL MI instances are in different regions, and the SQL MI Instances are in different subscriptions.  To duplicate the environment, in order to match our friends, we did the following setup.

Click through for a high-level overview, step-by-step guidance, and a whole lot of detail.

Comments closed

Extracting Data from DAX Measures into CSV

Gilbert Quevauvilliers builds a process:

In this blog post I am going to demonstrate how to use the new Power Automate Flow to extract data from a DAX measure into a SharePoint CSV file.

I got this idea after reading the blog post from the Microsoft Power BI Team: Unlocking new self-service BI scenarios with ExecuteQueries support in Power Automate | Microsoft Power BI Blog | Microsoft Power BI

The great news is that this works on Power BI Pro, Premium Per User and Premium.

Read on to see how.

Comments closed

Point-in-Time Recovery with Azure SQL DB and Managed Instances

Ahmed Mahmoud looks at point-in-time recovery and answers some frequently asked questions:

On some occasions, after the failover is initiated, the current Primary DR will start a new backup chain from that point and old backups are available on the current secondary DR. If we want to restore the backups which exists in Secondary it will not allow us to perform, apparently restore cannot be initiated on the Primary as the backup is not available.

Also, sometimes we observe in secondary DR for few databases, PitR restore point is available and for few databases it shows “no restore point available”    

Read on to understand why that happens and what you can do about it.

Comments closed

Guidance on When to Use Azure Data Explorer

Tzvia Gitlin Troyna has a flow chart for us:

Azure Data Explorer is a big data interactive analytics platform that empowers people to make data driven decisions in a highly agile environment. The factors listed below can help assess if Azure Data Explorer is a good fit for the workload at hand. These are the key questions to ask yourself.

The following flowchart table summarize the key questions to ask when you’re considering using Azure Data Explorer.

In addition to the flow chart, there is a table of three common patterns of interaction which ADE can do well.

Comments closed

Working with Azure VM Scale Sets

Arun Sirpal explains the benefit behind scale sets in Azure:

I really like scale sets. It lets you create and manage up to 1000 load balanced VMs per availability zone using windows or Linux images. (We can have flexible or uniforms modes for orchestration which dictates if you go down the homogenous VM route or a mix, where a mix is the flexible option.

There are many other benefits too apart from scaling, such as built-in load balancing options, increased resiliency via 3 Availability Zones and from a cost perspective you can couple scale sets with Azure Hybrid benefit or even use reserved instances – cost is important in the cloud!

Read the whole thing.

Comments closed

Reconciling Tag Names across Azure

Anthony Watherston has an interesting script:

During a recent cost optimization workshop with a customer, they mentioned that although they had some tagging policies in place there was no consistency of tag names across the Azure environment. This post introduces a script to remediate this and remove some confusion from your tagging strategy.

The customer was trying to ensure that all resources were being tagged with a cost centre tag. Some of this was automatic and some was done manually by people. While there was a policy in place to control this in the future, they needed a way to remediate the existing resources.

This is really useful if you have enough information to create a to-and-from mapping. It won’t automatically understand anything, so you’ll need to do the digging but it will do the renaming.

Comments closed

Making the Case for Azure Analysis Services

Teo Lachev makes the case:

Microsoft BI practitioners have three options for hosting semantic models: SSAS (on prem), Azure Analysis Services (cloud), and Power BI (cloud). AAS is somewhat caught between a rock and a hard place. Given that Power BI gets the most attention for cloud deployment, why would you consider AAS at all? There are two main reasons:

Read on for the reasons. Knowing how much it does cost, it almost feels like trying to thread a needle: if you don’t spent enough or have enough data, Power BI is typically much more efficient; if you have sufficient data, I’d want to do a proper cost analysis between on-prem (or IaaS) Analysis Services and Azure Analysis Services.

Comments closed

Cancelling a Cosmos DB Query

Hasan Savran pushes the big red button:

Sometimes you go to a website and the page does not want to open for whatever reason. You might just click Stop and move on to another page. The stop button does not really stop the request, the server still tries to complete the request and send a response to the client even client does not exist anymore. Rather than clicking on Stop maybe you click Refresh which triggers another request when the first one is not completed yet.

      This scenario applies to all database calls. It might take longer to run a query for a reason and simply you need to wait until you get a response from the database server. If it takes too much time, you might want to cancel the request and try to look at your query to make it faster. You can do that programmatically by using CancellationTokens.

Read on for more information about cancellation tokens and how you can use them.

Comments closed