Cloud – Page 102 – Curated SQL

The Raw Facts on Azure SQL DB Serverless

Published 2020-10-21 by Kevin Feasel

Taiob Ali gives us a briefing summary on Azure SQL Database Serverless:

Occasionally, load balancing automatically occurs if the machine cannot satisfy resource demand within a few minutes. For example, if the resource demand is 4 vCores, but only 2 vCores are available, it may take up to a few minutes to load balance before 4 vCores are provided. The database remains online during load balancing except for a brief period at the end of the operation when connections are dropped.

Click through for more points along these lines.

Comments closed

Azure Site-to-Site VPN Blocking Certain Traffic

Published 2020-10-20 by Kevin Feasel

Denny Cherry diagnoses a network configuration issue:

I ran across an interesting a couple of weeks ago when working with a client. The client has several subsidiaries each with their own vNet. The client had a site to site VPN been the Azure vNets. All traffic was successfully crossing the Azure Site to Site VPN as expected. The sticking point was that a software licensing server running in one of the subsidiaries Azure infrastructure configurations. The software licensing software simply wasn’t working.

Click through to learn why.

Comments closed

Optical Character Recognition with Tesseract and Databricks

Published 2020-10-19 by Kevin Feasel

Alex Aleksandrov takes a look at optical character recognition with the Tesseract library:

The topic of Optical Character Recognition (OCR) is not an unexplored field to the Adatis audience. Some Adati like Kalina Ivanova (link1, link2) and Francesco Sbrescia (link3) have already explored this topic from the perspective of Azure Cognitive Services and Azure Data Lake. In my first blog, I would like to explore this topic from a different perspective: using Tesseract and Databricks.

Click through for instructions.

Comments closed

Indexing S3 Data with NiFi and CDP Data Hubs

Published 2020-10-16 by Kevin Feasel

Eva Nahari, et al, walk us through text indexing of S3 data with Solar, NiFi, and Cloudera Data Platform:

Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as it was in the previous blog but the ingest pipeline differs. Spark as the ingest pipeline tool for Search (i.e. Solr) is most commonly used for batch indexing data residing in cloud storage, or if you want to do heavy transformations of the data as a pre-step before sending it to indexing for easy exploration. NiFi (as depicted in this blog) is used for real time and often voluminous incoming event streams that need to be explorable (e.g. logs, twitter feeds, file appends etc).
Our ambition is not to use any terminal or a single shell command to achieve this. We have a UI tool for every step we need to take.

Click through to see how well they do at that.

Comments closed

Durable Azure Functions and Azure Data Factory

Published 2020-10-16 by Kevin Feasel

Rayis Imayev wants to use Azure Functions with Azure Data Factory:

Ok, here is my problem: I have an Azure Data Factory (ADF) workflow that includes an Azure Function call to perform external operations and returns output result, which in return is used further down my ADF pipeline. My ADF workflow (1) depends on the output result of the Azure Function call; (2) plus a time efficiency of the Azure Function call is another factor to consider, if its time execution hits 230 seconds or more, ADF Azure Function will fail with a time-out error message and my workflow is screwed.

This gave Rayis the impetus to try out durable functions. Read on to see how that worked out.

Comments closed

Querying Multiple Data Sources in Azure Synapse Analytics

Published 2020-10-16 by Kevin Feasel

James Serra walks us through querying Data Lake Storage Gen2, Cosmos DB, and a table created in an Azure Synapse serverless Apache Spark pool:

As I was finishing up a demo script for my presentation at the SQL PASS Virtual Summit on 11/13 (details on my session here), I wanted to blog about part of the demo that shows a feature in the public preview of Synapse that is frankly, very cool. It is the ability to query data as it sits in ADLS Gen2, a Spark table, and Cosmos DB and join the data together with one T-SQL statement using SQL on-demand (also called SQL serverless), hence making it a federated query (also known as data virtualization). The beauty of this is you don’t have to first write ETL to collect all the data into a relational database in order to be able to query it all together, and don’t have to provision a SQL pool, saving costs. Further, you are using T-SQL to query all of those data sources so you are able to use a reporting tool like Power BI to see the results.

Click through to see how.

Comments closed

Spark Infer Schema vs ADF Get Metadata

Published 2020-10-15 by Kevin Feasel

Paul Andrew compares two techniques for retrieving metadata:

For file types that don’t contain there own metadata (CSV, Text etc) we typically have to go and figure out there structure including; attributes and data types before doing any actual transformation work. Often I’ve used the Data Factory Metadata Activity to do this with its structure option. However, while playing around with Azure Synapse Analytics, specifically creating Notebooks in C# to run against the Apache Spark compute pools I’ve discovered in most case the Data Frame infer schema option basically does a better job here.
Now, I’m sure some Spark people will probably read the above and think, well der, obviously Paul! Spark is better than Data Factory. And sure, I accept for this specific situation it certainly is. I’m simply calling that out as it might not be obvious to everyone

Read on for a comparison of the two techniques.

Comments closed

MLOps with Azure Databricks and MLflow

Published 2020-10-14 by Kevin Feasel

Oliver Koernig walks us through some of the basics of MLOps using MLflow and Azure Databricks:

Most organizations today have a defined process to promote code (e.g. Java or Python) from development to QA/Test and production. Many are using Continuous Integration and/or Continuous Delivery (CI/CD) processes and oftentimes are using tools such as Azure DevOps or Jenkins to help with that process. Databricks has provided many resources to detail how the Databricks Unified Analytics Platform can be integrated with these tools (see Azure DevOps Integration, Jenkins Integration). In addition, there is a Databricks Labs project – CI/CD Templates – as well as a related blog post that provides automated templates for GitHub Actions and Azure DevOps, which makes the integration much easier and faster.
When it comes to machine learning, though, most organizations do not have the same kind of disciplined process in place.

Read on for a demonstration of the process.

Comments closed

Metadata-Driven ADF Pipelines for Synapse

Published 2020-10-14 by Kevin Feasel

Hope Foley is back:

So what’s the hard thing I want to help make easier in this post? Metadata driven pipelines in Azure Data Factory! I had the opportunity awhile back to work with a customer who was pulling data out of large SQL Servers to eventually land data into Azure Synapse SQL pools back when they were still Azure SQL DW. We created a couple load pattern pipelines that used metadata in Azure SQL DB to load Synapse sql pool tables from parquet files in Azure Data Lake Storage (ADLS) Gen 2.
Not gonna lie, the pipelines weren’t easy for me to learn to setup initially. Big thanks to Catherine for your blog which was a life preserver in the hardest parts! So I wanted to see if I could automate it in my old friend PowerShell.

It would also be worth looking at some of the work Paul Andrew has done around ADF.procfwk for another approach to the problem.

Comments closed

Copying an Azure SQL Database

Published 2020-10-14 by Kevin Feasel

Garry Bargsley gives us two methods for copying an Azure SQL Database:

Copying an Azure SQL Database is a vital skill when managing cloud databases. Recently, a request was received from the “business”. They wanted to create a copy of an Azure SQL Database that was in a development environment. The database has been certified and early testing was accepted. They now want an exact copy in QA to start integration testing. The process of making an Azure SQL Database copy is straightforward. There are several different ways to perform this action.
Two methods chosen will use the Azure Portal and PowerShell to demonstrate the completion of this request.

Click through for the demos.

Comments closed

Category: Cloud