Press "Enter" to skip to content

Category: Cloud

Using Extended Events with AWS RDS

Grant Fritchey tries out extended events in Amazon’s RDS:

AWS has posted the documentation on what you have to do in order to enable the collection of Extended Events within RDS. Normally, I’d follow along with the documentation. However, I’m going to approach this like I knew that Extended Events support was there, but I wasn’t aware of the docs. So, I’m starting in SSMS and I’m just going to try plugging in the Extended Events GUI to see what happens. Further, I’m going to use the simplest method for launching Extended Events, XEvent Profiler.

Read on for Grant’s findings.
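
For context, the XEvent Profiler sessions Grant launches sit on top of ordinary Extended Events DDL, which you can also run yourself against an RDS instance. A minimal sketch, assuming Extended Events has been enabled for the instance; the session name here is illustrative:

-- Create a simple session with an in-memory target (avoids needing
-- file system access on the RDS host), then start it.
CREATE EVENT SESSION [rds_statement_watch] ON SERVER
ADD EVENT sqlserver.sql_statement_completed
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION [rds_statement_watch] ON SERVER STATE = START;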

Alerting on Azure Budget Thresholds

Daron Yondem makes a budget:

You can’t imagine how many of us forget to set up the proper alerting mechanisms for our cloud subscription consumption. Here is how to do it in Azure in under 2 minutes.

Read on for the answer. I do like Azure’s budgeting tools except for one big thing: you can’t set a cap. Alerting is great, but I want a “break glass in case of emergency” capability to stop spend altogether once you hit a certain point. I wouldn’t use it in production, but for personal or development accounts, that’s big. And you can do it, but only on a subscription that uses Azure credits; as soon as dollars are involved, there are no caps.

Securing Azure Storage

Craig Porteous continues a series on Azure Data Platform security:

This is the third in a series where I look at all of the resources common to a Data Lakehouse platform architecture and what you need to think about to get it past your security team.

Building upon Azure Databricks, I’ll move from the compute engine to our blob and data lake storage. Things are a little simpler to secure but the plethora of options available can have significant impacts on usability and cost so it’s important to understand the impact before baking them into your design.

Read on for some good advice around securing Azure storage accounts.

High Availability in SQL Managed Instance General Purpose Tier

Niko Neugebauer clears up what options you have for high availability in SQL MI’s General Purpose tier:

The two main requirements around high availability are commonly known as RTO and RPO.

RTO – stands for Recovery Time Objective and is the maximum allowable downtime when a failure occurs. In other words, how much time it takes for your databases to be up and running.

RPO – stands for Recovery Point Objective and is the maximum allowable data loss when a failure occurs. Of course, the ideal scenario is not to lose any data, but a more realistic (and also ideal) scenario is to not lose any committed data, also known as Zero Committed Data Loss.

With those definitions out of the way, read on to learn more.

S3 and Redshift Data Movement with Role Chaining

Sudipta Mitra, et al, talk AWS security:

This post presents an approach that you can apply at scale to achieve fine-grained access controls to resources in S3 buckets and Amazon Redshift schemas for tenants, including groups of users belonging to the same business unit down to the individual user level. This solution provides tenant isolation and data security. In this approach, we use the bridge model to store data and control access for each tenant at the individual schema level in the same Amazon Redshift database. We utilize ASSUMEROLE and role chaining to provide fine-grained access control when data is being copied and unloaded between Amazon Redshift and Amazon S3, so the data flows within each tenant’s namespace. Role chaining also streamlines the new tenant onboarding process.

Read on for an overview and tutorial.
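
As a sketch of the role-chaining syntax involved (all ARNs, schema, and group names below are placeholders): Redshift accepts a comma-separated chain of role ARNs, and GRANT ASSUMEROLE controls which users or groups may use a given chain.

-- Allow one tenant's group to use a specific role chain for COPY
GRANT ASSUMEROLE
ON 'arn:aws:iam::111111111111:role/redshift-role,arn:aws:iam::111111111111:role/tenant-a-role'
TO GROUP tenant_a
FOR COPY;

-- The same chain is passed to COPY, so the data flow stays within
-- the tenant's namespace
COPY tenant_a.orders
FROM 's3://example-bucket/tenant-a/orders/'
IAM_ROLE 'arn:aws:iam::111111111111:role/redshift-role,arn:aws:iam::111111111111:role/tenant-a-role';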

Persisting Data in Azure Redis Cache

Arun Sirpal feeds the mogwai after midnight:

I mentioned before that you could use the idea of data persistency to rebuild your data from total failure. There are two types: RDB and AOF.

RDB – persists a snapshot of your cache in a binary format. The snapshot is saved in an Azure Storage account.

AOF – saves every write operation to a log. The log is saved at least once per second into an Azure Storage account.

I’m a big proponent of using Redis as a caching service. I’m not a big proponent of using Redis as a persisted database, mostly because I’ve had a lot of bad experiences with persistent Redis…

Change Data Capture in Azure SQL Database

Abhiman Tiwari announces that CDC has gone GA:

CDC is now generally available on Azure SQL databases, enabling customers to track insert / update / delete data changes on their Azure SQL Database tables. On Azure SQL database, CDC offers a similar functionality to SQL Server and Azure SQL Managed Instance, providing a scheduler which automatically runs change capture and cleanup processes on the change tables. These capture and cleanup processes used to be run as SQL Server Agent jobs on SQL Server on premises and on Azure SQL Managed Instance, but now they run automatically through the scheduler in Azure SQL databases. Customers can still run scans and cleanup manually on demand.

Looks like it works pretty much the same as on-premises SQL Server, so it’s got that going for it.
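
The T-SQL surface is the same as on-premises; a minimal sketch, with illustrative schema and table names:

-- Enable CDC for the database, then for a specific table
EXEC sys.sp_cdc_enable_db;
GO

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL; -- NULL means access to change data is not gated by a database role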

Deploying an Azure Function via Azure DevOps

Koen Verbeeck wants to deploy a PowerShell-based Azure Function:

In the blog post Azure Function with PowerShell and the Power BI REST API I explained how you could create an Azure Function using the PowerShell scripting language. This Function connected with the Power BI REST API and retrieved the last refresh status of a dataset. Developing the Function is one thing, deploying it is another. In this blog post I’ll guide you through the setup of a build and release pipeline in Azure DevOps. As a prerequisite, the Azure Function and its dependencies (for example the requirements.psd1 file) are all checked into a Git repo. As a reminder, the folder structure looks like this:

Read on for the walkthrough.

Azure Databricks Security Considerations

Craig Porteous provides some advice on configuring Azure Databricks:

Azure Databricks is an analytics platform and often serves as the central compute component of a data platform, to process ETL/ELT data pipelines and data science workloads. As Databricks is a third-party platform-as-a-service offering, securing it works differently from most other first-party services in Azure; for example, we can’t use private endpoints. (More on these in the Azure Storage post)

The two main approaches to working with Databricks in our secure platform are VNet Peering and VNet Injection.

Click through to learn the difference between these two, as well as a few other factors to keep in mind as you’re deploying Databricks.
