Press "Enter" to skip to content

Category: Cloud

Priority Queuing In Azure SQL Data Warehouse

Matt How walks us through an improvement to Azure SQL Data Warehouse:

The concept of workload management is a key factor for Azure SQL DW, as there are only a limited number of concurrency slots available and, depending on the resource class, these slots can fill up pretty quickly. Once the concurrency slots are full, queries are queued until a sufficiently sized slot opens up. Let’s recap what Resource Classes are and how they affect workload management.

A Resource Class is a pre-configured database role that determines how much resource is allocated to queries coming from users that belong to that role. For example, an ETL service account may use a “large” resource class and be allocated a generous amount of the server, whereas an analyst may use a “small” resource class and therefore only use up a small amount of the server with their queries. There are actually two types of resource class, Dynamic and Static. The Dynamic resource classes grant a set percentage of memory to a query, and the actual amount of memory that percentage represents will vary as the Warehouse scales up and down. The key factor is that an xLargeRc (extra-large resource class) will always take up 70% of the server and will not allow any other queries to run concurrently. No matter how much you scale up the Warehouse, queries run with an xLargeRc will run one at a time. Conversely, queries run with a smallrc will only be allocated 4% of the server, so as the Warehouse scales up, this 4% becomes a larger amount of resource and can therefore process data more quickly.
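Since resource classes are just database roles, assigning one is a single call to sp_addrolemember. As a rough sketch (not from Matt’s post; the server, database, and account names below are placeholders), doing it from Python with pyodbc might look like this:

# Hypothetical names throughout; autocommit avoids wrapping the role change in a transaction.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydw.database.windows.net;DATABASE=mydw;"
    "UID=dw_admin;PWD=<password>",
    autocommit=True,
)
cursor = conn.cursor()

# Resource classes are database roles, so membership is managed like any other role.
# 'largerc' is one of the dynamic resource classes; an analyst login might get 'smallrc' instead.
cursor.execute("EXEC sp_addrolemember 'largerc', 'etl_service_account';")
conn.close()

Queries from that ETL account will then take the larger memory grant, at the cost of consuming more concurrency slots.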

This looks like a useful addition.  Click through for a few examples of how it will work.


The Value Of Power BI Dataflows

Matt Allington gets to the core benefits of Power BI Dataflows:

Dataflows are:

  1. An online service provided by Microsoft as part of Power BI (software as a service, or SaaS).

  2. In effect, dataflows are an online data collection and storage tool.

    • Collection:  It uses Power Query to connect to the data at the source and transform that data as needed.
      • You will need to be able to access the data either through a cloud service (such as Dynamics 365) or on your PC/network via a gateway.
      • You can also use Power Query to write queries from scratch, such as my Power BI calendar table.
    • Storage:  Dataflows then store that data in a table in the cloud so it can be used directly inside PowerBI.com, but more importantly (from my view), directly from Power BI Desktop.
  3. Dataflows leverage the Power Query skills you have learnt (or are learning) using other tools (like Power BI Desktop, Power Query for Excel), allowing you to reuse those same skills in this online tool.

  4. Tables that are created as a result of the dataflow are stored in an Azure Data Lake.

    • If you don’t know what that is, don’t worry – I don’t understand it either.  The point is it doesn’t matter because it is all done automatically for you by the tool.
  5. Dataflows include the concept of the Common Data Service (CDS), or Common Data Model, directly in the tool, and you don’t have to know what it is, or even care.

    • If you don’t know what that is, don’t worry – it doesn’t matter now/yet.

    • This will become very important in the future, as it will make the process of getting data out of complex databases (such as MS Dynamics 365) much easier.

Click through for more detail as well as some good uses for Dataflows.


Controlling Azure Services In R With AzureR

Hong Ooi announces a new set of packages called AzureR:

As background, some of you may remember the AzureSMR package, which was written a few years back as an R interface to Azure. AzureSMR was very successful and gained a significant number of users, but it was never meant to be maintainable in the long term. As more features were added, it became more unwieldy, until its design limitations became impossible to ignore.

The AzureR family is a refactoring/rewrite of AzureSMR that aims to fix the earlier package’s shortcomings.

The core package of the family is AzureRMR, which provides a lightweight yet powerful interface to Azure Resource Manager. It handles authentication (including automatically renewing when a session token expires), managing resource groups, and working with individual resources and templates. It also calls the Resource Manager REST API directly, so you don’t need to have PowerShell or Python installed; it depends only on commonly used R packages like httr, jsonlite and R6.

This won’t replace the PowerShell libraries, but it looks like it’d be useful for scenarios where you need to set up a VM, train a model, and then shut the VM down.
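AzureRMR itself is an R package, so a Python snippet can’t show its actual API; as a loose analogue of the same Resource Manager workflow (authenticate, then manage resource groups and resources), here is a sketch using the azure-identity and azure-mgmt-resource packages, with the subscription ID and resource group name as placeholders:

# Not the AzureR API – a rough Python equivalent of the resource-group operations described above.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Create (or update) a resource group, then list what lives in it.
client.resource_groups.create_or_update("demo-rg", {"location": "eastus"})
for res in client.resources.list_by_resource_group("demo-rg"):
    print(res.name, res.type)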


Azure Databricks Geospatial Analysis

Jose Mendes gives us an example of using Azure Databricks to perform geospatial analysis:

Magellan is a distributed execution engine for geospatial analytics on big data. It is implemented on top of Apache Spark and deeply leverages modern database techniques like efficient data layout, code generation and query optimization in order to optimize geospatial queries (further details here).

Although the project’s GitHub page mentions that the 1.0.5 Magellan library is available for Apache Spark 2.3+ clusters, I learned through a very difficult process that the only way to make it work in Azure Databricks is to have an Apache Spark 2.2.1 cluster with Scala 2.11. The cluster I used for this experiment consisted of a Standard_DS3_v2 driver type with 14 GB of memory, 4 cores, and autoscaling enabled.

In terms of datasets, I used the NYC Taxicab dataset to create the geometry points and the Magellan NYC Neighbourhoods GeoJSON dataset to extract the polygons. Both datasets were stored in blob storage and added to Azure Databricks as mount points.
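The mount-point step Jose describes is a one-time call from a notebook. A minimal sketch, assuming placeholder storage account, container, and secret names (dbutils and spark are provided by the Databricks notebook environment):

# Mount the blob container holding the source data; the names and the secret scope are placeholders.
dbutils.fs.mount(
    source="wasbs://nyctaxi@mystorageaccount.blob.core.windows.net",
    mount_point="/mnt/nyctaxi",
    extra_configs={
        "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="storage", key="account-key")
    },
)

# Once mounted, the taxi data reads like any other path.
taxi_df = spark.read.option("header", "true").csv("/mnt/nyctaxi/yellow_tripdata.csv")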

It sounds like this is much faster than using U-SQL to perform the same task.


Creating A SQL Server 2019 Big Data Cluster On Azure

Niels Berglund walks us through the setup for SQL Server 2019 Big Data Clusters:

If you, like me, are a SQL Server guy, you are probably quite familiar with installing SQL Server instances by mounting an ISO file and running setup. Well, you can forget all that when you deploy a SQL Server 2019 Big Data Cluster. The setup is all done via Python utilities and various Docker images pulled from a private repository. So, you need Python 3. On my box I have Python 3.5, and – according to Microsoft – version 3.7 also works. Make sure that you have your Python installation on the path.

When you deploy, you use a Python utility: mssqlctl. To download mssqlctl, you need Python’s package management system, pip, installed. During installation you also need a Python HTTP library, Requests. If you do not have it, you need to install it:

python -m pip install requests

This isn’t available to the general public quite yet, but when it is publicly available (or if you are part of the Early Access Program), the instructions are nice and clear.


Premium Blob Storage In Azure

James Serra describes a new tier of Azure Blob Storage:

As a follow-up to my blog Azure Archive Blob Storage, Microsoft has released another storage tier called Azure Premium Blob Storage (announcement).  It is in private preview in the US East 2, US Central, and US West regions.

This is a performance tier in Azure Blob Storage, complementing the existing Hot, Cool, and Archive tiers.  Data in Premium Blob Storage is stored on solid-state drives, which are known for lower latency and higher transactional rates compared to traditional hard drives.

It is ideal for workloads that require very fast access time such as interactive video editing, static web content, and online transactions.  It also works well for workloads that perform many relatively small transactions, such as capturing telemetry data, message passing, and data transformation.
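The performance tier is a property of the storage account rather than of your code, so writing one of those small, frequent objects looks the same as it would against a Hot or Cool account. A quick sketch with the azure-storage-blob package (the connection string, container, and blob names are placeholders):

# Upload a small telemetry payload to a Premium block blob account; all names here are made up.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<premium-account-connection-string>")
blob = service.get_blob_client(container="telemetry", blob="events/2018-10-29.json")
blob.upload_blob(b'{"deviceId": "abc123", "reading": 42}', overwrite=True)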

It’s in private preview for now, but my guess is that it’ll be available to the general public soon enough.


SQL Server IaaS Versus PaaS On AWS

John McCormack identifies some differences between running SQL Server in EC2 versus RDS on Amazon Web Services:

How do I run SQL Server on AWS?

Running SQL Server on AWS can be done in two ways.

  • Relational Database Service (RDS): AWS’s managed solution, where some of the administration (maintenance, backups, and patching) is handled for you.

  • EC2: Your very own virtual machine in the cloud. With EC2, you manage SQL Server, just like you would do on-premises. This gives you full control over your SQL instance.

Click through for the comparison.
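If you go the RDS route, provisioning is an API call rather than an install. A hedged sketch with boto3 (identifiers, instance class, and credentials are placeholders; a real deployment would also specify networking, licensing, and backup settings):

# Create a SQL Server Standard Edition RDS instance; values below are illustrative only.
import boto3

rds = boto3.client("rds", region_name="eu-west-1")
rds.create_db_instance(
    DBInstanceIdentifier="sqlserver-demo",
    Engine="sqlserver-se",            # also: sqlserver-ex, sqlserver-web, sqlserver-ee
    DBInstanceClass="db.m5.large",
    AllocatedStorage=100,             # GiB
    MasterUsername="admin",
    MasterUserPassword="<strong-password>",
    LicenseModel="license-included",
)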


Using Datadog To Monitor Spark Clusters On EMR

Priya Matpadi walks us through one way to monitor Spark clusters on Amazon Elastic MapReduce:

We recently implemented a Spark streaming application, which consumes data from multiple Kafka topics. The data consumed from Kafka comprises different types of telemetry events generated by mobile devices. We decided to host the Spark cluster using the Amazon EMR service, which manages a fleet of EC2 instances to run our data-processing pipelines.

As part of preparing the cluster and application for deployment to production, we needed to implement monitoring so we could track the streaming application and the Spark infrastructure itself. At a high level, we wanted to ensure that we could monitor the different components of the application, understand performance parameters, and get alerted when things go wrong.

In this post, we’ll walk through how we aggregated relevant metrics in Datadog from our Spark streaming application running on a YARN cluster in EMR.
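The post covers wiring the Datadog agent into the EMR cluster; for pushing a custom number from the driver itself, the datadogpy DogStatsD client is enough. A minimal sketch (the metric name, value, tags, and agent address are made up here):

# Send one gauge value to a local Datadog agent over DogStatsD.
from datadog import initialize, statsd

initialize(statsd_host="localhost", statsd_port=8125)

# e.g. called at the end of each micro-batch with its processing time
statsd.gauge(
    "streaming.batch.processing_time_ms",
    4250,
    tags=["app:telemetry-pipeline", "cluster:emr"],
)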

Check it out.  If this is interesting, Priya’s blog has the full series.


Thoughts On Snowflake DB

Koen Verbeeck shares some thoughts after working with Snowflake DB for a few months:

Let’s start with the positive.

  • Snowflake is a really scalable database. Storage is virtually limitless, since the data is stored on blob storage (S3 on AWS and Blob Storage on Azure). The compute layer (called warehouses) is completely separated from the storage layer and you can scale it independently from storage.

  • It is really easy to use. This is one of Snowflake’s core goals: make it easy to use for everyone. Most of the technical aspects (clustering, storage, etc.) are hidden from the user. If you thought SQL Server was easy with its “next-next-finish” installation, you’ll be blown away by Snowflake. I really like this aspect, since you have really powerful data warehousing at your fingertips, and the only thing you have to worry about is how to get your data into it. With Azure SQL DW, for example, you have to think about the distribution of the data, how you are going to set things up, etc. Not here.

It’s not all positive, but Koen seems quite happy to work with the product.
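The storage/compute separation Koen mentions shows up clearly in code: resizing a warehouse is a one-line statement that doesn’t touch the data. A rough sketch with the snowflake-connector-python package (account, credentials, and object names are placeholders):

# Connect, scale the warehouse independently of storage, then run a query.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="ANALYTICS_WH",
    database="DEMO_DB",
    schema="PUBLIC",
)
cur = conn.cursor()

# Scaling compute is a metadata operation; the data sitting in blob storage is untouched.
cur.execute("ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'LARGE'")
cur.execute("SELECT COUNT(*) FROM SALES")
print(cur.fetchone()[0])
conn.close()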


Azure ML Studio Supports R 3.4

David Smith notes that Azure ML Studio now supports R version 3.4:

With the Execute R Script module you can immediately use more than 650 R packages which come preinstalled in the Azure ML Studio environment. You can also use other R packages (including packages not on CRAN) and source in R scripts you develop elsewhere (as shown above), although this does require the time to install them in the Studio environment. You can even create custom ML Studio models encapsulating R code for others to use in the drag-and-drop environment.

If you’re new to Azure ML Studio, check out the Quickstart Tutorial for R to learn how to use the Execute R Script module, and to see what’s new in the latest update, follow the link below.

Click through for more details.
