Press "Enter" to skip to content

Category: Cloud

A Primer on Azure Arc-Enabled Data Services

Warwick Rudd has a four-parter on Azure Arc-Enabled Data Services. Part 1 sets the stage:

Utilising Azure Arc-enabled data services gives you the ability to take advantage of Azure data services (SQL Server, Azure SQL Managed Instance, PostgreSQL) in a hybrid environment. This offering reduces the administrative effort of managing and maintaining your data services while giving you the same look and feel as if you were running in the Azure cloud.

Part 2 looks at the Data Controller:

The Azure Arc Data Controller is a Kubernetes operator that performs all of the orchestration to ensure you achieve your desired state. This is the main component in the Azure Arc infrastructure that links the data services on your Arc-enabled hardware (located on-premises, in Azure, or in any other public cloud data center) with your Azure subscription.

The Arc data controller allows you to deploy, manage, secure, and monitor your deployed data services estate using Azure Data Studio or the Azure Portal (the latter only for directly connected mode deployments), while giving you the same experience as if you were managing your data services from inside the Azure Portal.
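Since the data controller is itself a Kubernetes operator, you can peek at its moving parts with ordinary Kubernetes tooling. Here is a minimal sketch using the official Kubernetes Python client; the namespace name is an assumption, so substitute whichever namespace your controller was deployed into:

```python
# Minimal sketch: list the pods behind an Arc data controller.
# The namespace "arc" is an assumption, not a fixed name.
from kubernetes import client, config

config.load_kube_config()   # uses your current kubeconfig context
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="arc").items:
    print(pod.metadata.name, pod.status.phase)
```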

Part 3 deploys a Data Controller:

As previously mentioned, there are two types of deployment available for your Arc Data Controller. In this post, we are going to have a look at deploying the Arc Data Controller using the directly connected mode.

For a directly connected Arc Data Controller, we have direct connectivity to our Azure subscription. With this in mind, there are several options, as we previously discussed, for deploying the data controller. For this post, we are using the portal deployment method.

Finally, Part 4 covers management options:

With ADS open and running you can create connections to Arc Data Controllers the same as you can with instances of SQL Server. In ADS, the connections area includes a section specifically for Arc Data Controllers.

Check out all four posts.


Migrating Databases between SQL Managed Instances

Etienne Lopes performs a migration:

In this post I’m going to show a very simple way to migrate a database between two SQL Server Managed Instances in Azure. I’m not a big fan of bacpac files (although I work with them when necessary), so I’ll use a different approach here. Besides, when creating a bacpac file using SSMS, some schema validations occur at the beginning that will abort the bacpac generation, for example if the database holds three-part names inside stored procedures. While three-part names are not supported in Azure SQL Database, they are supported in SQL Managed Instance (as are cross-database queries), and it can be quite frustrating to hit this show-stopper when using bacpacs to migrate or copy databases between Managed Instances.

Click through for the demo. And yeah, I’ve run into limiting factors with bacpacs, such as having certificates for encrypting data (even if you back those up separately).
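If you want to sketch the backup-and-restore style of alternative yourself, here is a rough Python sketch using pyodbc and a copy-only native backup to a URL. This is not necessarily the approach in the linked post, and the connection strings, database name, and storage URL are all placeholders; both instances also need a credential (for example, a SAS-based one) for the storage container first:

```python
import pyodbc

# Placeholder URL; both instances need a database-scoped credential
# (e.g. SAS-based) for this container before BACKUP/RESTORE TO/FROM URL works.
BACKUP_URL = "https://mystorage.blob.core.windows.net/backups/MyDb.bak"

# Managed Instance requires COPY_ONLY for user-initiated backups,
# since the platform manages the regular backup chain.
src = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=tcp:source-mi.example.net;"
    "Database=master;UID=admin;PWD=<secret>", autocommit=True)
src.execute(f"BACKUP DATABASE MyDb TO URL = '{BACKUP_URL}' WITH COPY_ONLY;")

tgt = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=tcp:target-mi.example.net;"
    "Database=master;UID=admin;PWD=<secret>", autocommit=True)
tgt.execute(f"RESTORE DATABASE MyDb FROM URL = '{BACKUP_URL}';")
```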


Adding an Existing Data Factory to GitHub

Andy Leonard has a three-parter for us. Part 1 shows you how to create a GitHub account and repo:

The unabridged topic of source control with GitHub is beyond the scope of this post. There are a number of ways to accomplish the tasks described in this post and series. I welcome your suggestions in the comments.

This post is written to help Azure Data Factory developers get started using GitHub.
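If you would rather script the repo creation than click through the site, here is a hedged sketch using the GitHub REST API via the requests library; the token and repository name are placeholders:

```python
# Minimal sketch: create a private GitHub repository via the REST API.
import requests

TOKEN = "<personal-access-token>"  # hypothetical PAT with "repo" scope
resp = requests.post(
    "https://api.github.com/user/repos",
    headers={"Authorization": f"token {TOKEN}",
             "Accept": "application/vnd.github+json"},
    json={"name": "adf-demo", "private": True},
)
resp.raise_for_status()
print(resp.json()["clone_url"])
```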

Part 2 connects a Data Factory to the repository:

For the purposes of this demo, accept the defaults for “Publish branch” and “Root folder.” Check the “Import existing resources to repository” checkbox under the “Import existing resource” property, select the main branch in the “Import resource into this branch” property, and then click the “Apply” button:

Part 3 handles changes:

Applying what we’ve configured and learned thus far, let’s put this to work in a code-management workflow.

When it’s time to make a change, first create a new branch. I can hear some of you thinking, “Why, Andy? Why create a new branch?” That’s an excellent question. I am so glad you asked! Think of the new branch as a temporary copy of the current state of my Azure Data Factory. 

This series works from the assumption that you don’t have any real experience with Git (or GitHub) for source control, and maybe not much source control experience at all.
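The branch-per-change workflow Andy describes can also be scripted. Here is a rough sketch that creates a feature branch through the GitHub API; the owner, repository, branch name, and token are all placeholders:

```python
import requests

TOKEN = "<personal-access-token>"          # hypothetical
OWNER, REPO = "your-account", "adf-demo"   # hypothetical
HEADERS = {"Authorization": f"token {TOKEN}",
           "Accept": "application/vnd.github+json"}

# Look up the commit that main currently points at.
main = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/git/ref/heads/main",
    headers=HEADERS)
main.raise_for_status()
sha = main.json()["object"]["sha"]

# Create the feature branch as a new ref pointing at that commit.
resp = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/git/refs",
    headers=HEADERS,
    json={"ref": "refs/heads/feature/my-change", "sha": sha})
resp.raise_for_status()
```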


Building a Data Serving API in Azure

Justice Zishanhi has some recommendations for serving data in Azure:

Data is an important asset to all organizations, big and small. As these organizations mature, building an end-to-end data platform to enable BI and AI at scale has become part of that journey. Some organizations have the requirement to expose modelled data in a data warehouse or data lake (Azure Data Lake Storage Gen2) to downstream consumer applications (mobile or web apps), where access patterns can be unpredictable with respect to the frequency of access and/or the type of data requested.

Data warehouse engines and data lakes are not designed for singleton transactional (request/response) interactions. To serve these requests at scale, and to meet differing SLAs and unpredictable access patterns, data needs to be offloaded to a suitable database engine (i.e., a caching layer) that is built to serve such queries.

The “Design Patterns” section of this article highlights a generalized pattern for implementing a data serving API which meets this requirement – consisting of a Data Platform component and an API component. For implementing the API, two patterns are commonly adopted – a synchronous pattern or an asynchronous pattern. Both are explored in the “API Implementation Patterns” section of this article.

The example focuses on Cosmos DB and provides quite a bit of helpful guidance.
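To make the synchronous pattern concrete, here is a minimal sketch of a serving endpoint that answers point reads from a Cosmos DB container acting as the caching layer. The account, key, database, and container names are placeholders, and the linked article's own implementation may well differ:

```python
from azure.cosmos import CosmosClient
from flask import Flask, jsonify

# Placeholders: account URI, key, database, and container names.
client = CosmosClient("https://myaccount.documents.azure.com:443/",
                      credential="<account-key>")
container = client.get_database_client("serving") \
                  .get_container_client("customers")

app = Flask(__name__)

@app.get("/customers/<customer_id>")
def get_customer(customer_id: str):
    # A point read (id + partition key) is the cheapest request Cosmos DB
    # serves; this assumes the container is partitioned on id.
    item = container.read_item(item=customer_id, partition_key=customer_id)
    return jsonify(item)
```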


An Intro to Key Word Analysis

Lewis Prince continues a series on natural language processing:

Here we are with part 2 of this blog series on web scraping and natural language processing (NLP). In the first part I discussed what web scraping was, why it’s done and how it can be done. In this part I will give you details on what NLP is at a high level, and then go into detail of an application of NLP called key word analysis (KWA).

Read on for a high-level overview of the topic and how to do it in Cognitive Services. But not the topic model—that’d be a different post.
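For a sense of what the Cognitive Services side looks like, here is a small sketch of key phrase extraction with the azure-ai-textanalytics package; the endpoint and key are placeholders:

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<key>"),
)

docs = ["Web scraping pulls raw text from pages; NLP turns it into signal."]
for result in client.extract_key_phrases(docs):
    if not result.is_error:
        print(result.key_phrases)
```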


Resolving tempdb Issues in Azure SQL DB

Holger Linke troubleshoots some problems:

The tempdb system database is a global resource available to users who are connected to Azure SQL Database or any instance of SQL Server. It holds temporary user objects that are explicitly created by a user or application, and internal objects that are created by the SQL Server database engine itself. The most common tempdb issue is running out of space, whether against tempdb's overall size quota or its transaction log.

The available tempdb space in Azure SQL Database depends on two factors: the service tier (pricing tier) that the database is configured with, and the type of workload that is executed against the database. These are also the main factors to control if you are running out of tempdb space.

Click through for several error cases and how we can resolve them.
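As a starting point for diagnosis, here is a quick sketch that pulls tempdb allocation numbers from the documented DMV sys.dm_db_file_space_usage; the connection string is a placeholder:

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net;Database=mydb;"
    "UID=me;PWD=<secret>")

# Page counts are in 8 KB pages; convert to MB.
row = conn.execute("""
    SELECT SUM(allocated_extent_page_count)   * 8 / 1024 AS allocated_mb,
           SUM(unallocated_extent_page_count) * 8 / 1024 AS free_mb
    FROM tempdb.sys.dm_db_file_space_usage;
""").fetchone()
print(f"tempdb allocated: {row.allocated_mb} MB, free: {row.free_mb} MB")
```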


Azure Functions and Azure Database Options

Sarah Dutkiewicz continues a series on learning Azure. First up, Azure Functions:

Azure Functions are not something you’ll see rendered on a front-end somewhere. They’re a serverless solution used for doing things in the back-end and the middle tier. 
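For the Functions side, here is about the smallest HTTP-triggered Python function you can write with the v2 programming model; the route name is arbitrary:

```python
import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

@app.route(route="ping")
def ping(req: func.HttpRequest) -> func.HttpResponse:
    # Back-end/middle-tier piece: request in, response out, no front-end.
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!")
```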

After that, Sarah touches on database options:

There are many databases on Azure – including relational data in Azure SQL, NoSQL with Azure Cosmos DB, and even some popular databases in the open source realm such as MySQL and PostgreSQL. These are just a few of the data stores available. Check this page of Azure Databases for a matrix of the databases available compared by their features.

Click through for quite a few links and information on when to use what.


Serverless Compute for Databricks SQL

Nikhil Jethava and Shankar Sivadasan make an announcement:

We are excited to announce the preview of Serverless compute for Databricks SQL (DBSQL) on Azure Databricks. DBSQL Serverless makes it easy to get started with data warehousing on the lakehouse. Serverless compute for DBSQL helps address challenges customers face with cluster startup time, capacity management, and infrastructure costs.

Click through for more details and a short video. Azure Synapse Analytics and Databricks are definitely going head-to-head in the modern data warehousing space and I’m fine with that—hopefully it makes both products better as a result.
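If you want to poke at a serverless warehouse from code, here is a hedged sketch using the databricks-sql-connector package; the hostname, HTTP path, and token are placeholders you would pull from the warehouse's connection details:

```python
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",
        http_path="/sql/1.0/warehouses/<warehouse-id>",
        access_token="<token>") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1 AS ok")   # trivial round-trip check
        print(cur.fetchall())
```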


Updates to AzureDevOps-AzureSQLDatabase Repo

Kevin Chant updates a repo:

In this post I want to cover some significant updates to an Azure SQL Database repository, one of the public GitHub repositories that I share.

I have updated the AzureDevOps-AzureSQLDatabase repository, which contains an example of a SQL Server database project that you can use to perform CI/CD on an Azure SQL Database using Azure DevOps.

It does this by using the popular state-based method of creating a dacpac file based on the contents of a database project. From there, the dacpac file can be used to update one or more databases.

Click through for those updates.
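For context, the deployment step in this style of pipeline boils down to a SqlPackage publish, which diffs the dacpac against the target database and applies the changes. Here is a rough sketch; paths and names are placeholders, and in the repo this runs as an Azure DevOps pipeline task rather than a local script:

```python
import subprocess

# Publish the dacpac: SqlPackage computes and applies the schema diff.
# Server, database, and credentials are placeholders.
subprocess.run([
    "SqlPackage",
    "/Action:Publish",
    "/SourceFile:bin/Release/MyDatabase.dacpac",
    "/TargetServerName:myserver.database.windows.net",
    "/TargetDatabaseName:MyDatabase",
    "/TargetUser:deploy_user",
    "/TargetPassword:<secret>",
], check=True)
```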


Preventing Data Exfiltration from Managed Instances

Niko Neugebauer wants to hang on to that data:

Data exfiltration, a technique sometimes also described as data theft or data extrusion, is the unauthorized extraction of data from its original source. This unauthorized extraction can be executed either manually or automatically by a malicious attacker.

As part of your Network Infrastructure, you might have tightened your security to make sure you have all the bells and whistles to lock down your Azure SQL Managed Instance to be accessed only by your application and not exposed to the Internet or any other traffic. However, this doesn’t stop a malicious admin from taking a backup or creating a linked server to another resource outside your enterprise subscription for extracting the data. This action would be data exfiltration. In a typical on-premises infrastructure, you can lock down network access completely to make sure that the data never leaves your network. However, in a cloud setup, there is a possibility that someone with elevated privileges can export data or perform some other malicious activity targeting their own resources outside your organization, compromising your enterprise data. Hence, it is very important to understand the different data exfiltration scenarios and make sure that you are taking the right steps to monitor for and prevent such activities.

Click through for a table which shows common exfiltration scenarios and things you can do to reduce the risk of exfiltration. With access, though, there’s always going to be a risk of exfiltration: even in a SCIF, you can get away with shoving records into your pants if you’re famous enough.
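On the monitoring side, here is one small, hedged example in the spirit of Niko's table: flag any linked servers on the instance that are not on your allow-list. The connection string and allow-list are placeholders:

```python
import pyodbc

ALLOWED = {"APPROVED-PARTNER"}  # hypothetical allow-listed linked servers

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=tcp:my-mi.example.net;"
    "Database=master;UID=admin;PWD=<secret>")

# sys.servers lists the local server plus any linked servers.
for name, data_source in conn.execute(
        "SELECT name, data_source FROM sys.servers WHERE is_linked = 1;"):
    if name not in ALLOWED:
        print(f"Unexpected linked server: {name} ({data_source})")
```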
