Press "Enter" to skip to content

Category: Cloud

Accessing Data in Azure Data Lake Storage Gen 2

James Serra gives us several methods to access data in Azure Data Lake Storage Gen 2:

With data lakes becoming popular, and Azure Data Lake Store (ADLS) Gen2 being used for many of them, a common question I am asked about is “How can I access data in ADLS Gen2 instead of a copy of the data in another product (i.e. Azure SQL Data Warehouse)?”. The benefits of accessing ADLS Gen2 directly is less ETL, less cost, to see if the data in the data lake has value before making it part of ETL, for a one-time report, for a data scientist who wants to use the data to train a model, or for using a compute solution that points to ADLS Gen2 to clean your data. While these are all valid reasons, you still want to have a relational database (see Is the traditional data warehouse dead?). The trade-off in accessing data directly in ADLS Gen2 is slower performance, limited concurrency, limited data security (no row-level, column-level, dynamic data masking, etc) and the difficulty in accessing it compared to accessing a relational database.

Since ADLS Gen2 is just storage, you need other technologies to copy data to it or to read data in it.

Read on for the solution.

Comments closed

Disambiguating Azure SQL Database Classes

Arun Sirpal explains the different types of Azure SQL Database available to us:

I want to do a quick summary post of the many different types of Azure SQL Database available and I am not talking about elastic pools, VMs etc, more so the singleton type.

Azure SQL Database (I call normal mode) – A choice between the DTU model (Basic, Standard and Premium) and vCore (General Purpose and Business Critical). Within this space there are two different architecture types used by Microsoft under the covers.

As the product expands, we get more and more options, and Arun clarifies where each fits.

Comments closed

Upgrading Azure Kubernetes Service

Chris Taylor has a point updates to jump in Azure Kubernetes Service:

As it is late at night my brain wasn’t working as it should be but thought I’d put a quick blog out there to say that if you are on v1.11.5 and want to upgrade to >= v1.13.10 then you have to do this in a 2 stage process by upgrading to v1.12.8 first:

Fortunately, upgrading is pretty easy using the Azure command line or even the Azure portal.

Comments closed

Azure Data Factory and Schema Drift

Mark Kromer walks us through two techniques we can use in Azure Data Factory to deal with schema drift:

Azure Data Factory’s Mapping Data Flows have built-in capabilities to handle complex ETL scenarios that include the ability to handle flexible schemas and changing source data. We call this capability “schema drift“.

When you build transformations that need to handle changing source schemas, your logic becomes tricky. In ADF, you can either build data flows that always look for patterns in the source and utilize generic transformation functions, or you can add a Derived Column that defines your flow’s canonical model.

Click through for the discussion and comparison. Schema drift has been the bane of Integration Services’s existence, so it’s good to see them tackling the idea in Azure Data Factory.

Comments closed

Hot Patching Azure SQL Database

Hans Olav Norheim has an interesting paper on a technique Microsoft uses to release SQL Server patches for Azure SQL Database while minimizing downtime:

The SQL Engine we are running in Azure SQL Database is the very latest version of the same engine customers run on their own servers, except we manage and update it. To update SQL Server or the underlying infrastructure (i.e. Service Fabric or the operating system), we must stop the SQL Server process. If that process hosts the primary database replica, we move the replica to another machine (requiring a failover).
 
During failover, the database may be offline for a second and still meet our 99.995% SLA. However,  failover of the primary replica impacts workload because it aborts in-flight queries and transactions. We built features such as resumable index (re)build and accelerated database recovery to address these situations, but not all running operations are automatically resumable. It may be expensive to restart complex queries or transactions that were aborted due to an upgrade. So even though failovers are quick, we want to avoid them.

Read on to see how they do it. There’s no on-prem analogue yet, though perhaps that will come in time.

Comments closed

Scaling Out Continuous Integration

Chris Adkin shows off parallelism in Azure DevOps continuous integration pipelines:

A SQL Server data tools project is checked out of GitHub, built into a DacPac, four containerized SQL Server instances are spun up using clones of the ‘Seed’ docker volume. The DacPac is applied to a database running inside each container, which a tSQLt test is then executed against, finally, at the end very end the tSQLt results are aggregate and published.

This is an interesting approach to the problem of lengthy tests: run them on several separate machines concurrently.

Comments closed

Troubleshooting AWS Database Migration Service Errors

Samir Behara takes us through troubleshooting AWS Database Migration Service issues:

For troubleshooting any issues with AWS DMS, it is necessary to have logs enabled. The DMS logs would typically give a better picture and helps find errors or warnings that would indicate the root cause of the failure. If the logs are not available there is nothing much you can do from a detailed troubleshooting analysis perspective. So basically next step is to turn on DMS logs and kick the job again and validate if the errors are captured in the logs.

If logs are not enabled, you need to set up a new task with logging enabled so if and when it errors out, you can take a look and troubleshoot the same.

I’ll save my full rant for another day, but I’m not that impressed with DMS. It could be a failing on my part, though.

Comments closed

Azure Backup for SQL Server VM Pains

John Sterrett has run into a few issues with Azure Backup for SQL Server VMs:

Having the ability to backup new databases automatically is taken for granted. So much, that I noticed that Azure Backup for SQL Server VM’s will not automatically backup new databases for you. That’s right. Make sure you remember to go in and detect and select your new database every time you add a database or you will not be able to recover.

Azure Backup for SQL Server VM’s has an interesting feature called Autoprotect. This should automatically backup all your databases for you. Unfortunately, this does not work. Yes, I double-checked by enabling autoprotect for a VM and I added a new database. The database didn’t get backed up so I had to manually add the database.

Some of these seems like it’s easy enough to fix, so hopefully the product gets better over time.

Comments closed