Cloud – Page 107 – Curated SQL

Passing an Array of Arrays as a Parameter in Azure Data Factory

Published 2020-09-14 by Kevin Feasel

In my previous blog post – Setting default values for Array parameters/variables in Azure Data Factory, I had helped myself to remember that arrays could be passed as parameters to my Azure Data Factory (ADF) pipelines. This time I’m helping myself to remember that an array of other arrays can also exist as ADF pipeline parameters’ values.

Read on for the example.

Comments closed

Indexing S3 Data with CDP Data Hub

Published 2020-09-11 by Kevin Feasel

Eva Nahari, et al, show how to perform indexing and serving of S3 data in Cloudera Data Platform:

This blog post will present a simple “hello world” kind of example on how to get data that is stored in S3 indexed and served by an Apache Solr service hosted in a Data Discovery and Exploration cluster in CDP. For the curious: DDE is a pre-templeted Solr-optimized cluster deployment option in CDP, and recently released in tech preview. We will only cover AWS and S3 environments in this blog. Azure and ADLS deployment options are also available in tech preview, but will be covered in a future blog post.
We will depict the simplest scenario to make it easy to get started. There are of course more advanced data pipeline setups and more rich schemas possible, but this is a good starting point for a beginner.

Read on for the instructions.

Comments closed

Spark SQL in Delta Lake

Published 2020-09-11 by Kevin Feasel

Kundan Kumarr walks us through some of the basic SQL operations you can perform with Delta Lake in Apache Spark:

Nowadays Delta lake is a buzz word in the Big Data world, especially among the spark developers because it relegates lots of issues found in the Big Data domain. Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It is evolving day by day and adds cool features in its every release. On 19th June 2020, Delta lake version 0.7.0 was released and this is the first release on Spark 3.x. This release involves important key features that can make the spark developer’s work easy.
One of the interesting key features in this release is the support for metastore-defined tables and SQL DDLs. So now we can define Delta tables in the Hive metastore and use the table name in all SQL operations. We can perform SQL DDLs to create tables, insert into tables, explicitly alter the schema of the tables, and so on. So in this blog, we will learn how we can perform SQL DDLs/DMLS/DQL in Delta Lake 0.7.0.

Click through for the examples.

Comments closed

Building an Azure Function to Automate CHECKDB

Published 2020-09-11 by Kevin Feasel

Arun Sirpal shows us how to build an Azure Function:

The title is a mouthful and so is this post. In the past I have linked to blog posts from Microsoft that say consistency checks for Azure SQL Database is the responsibility of Microsoft. (https://azure.microsoft.com/en-gb/blog/data-integrity-in-azure-sql-database/)
However, Paul Randal got me thinking about his thoughts on it (via his insider email). Forming the core of this post. If you desire to run DBCC CHECKDB against Azure SQL Database (which I know people do) – how can you do this? There are many ways, but for this blog post – Enter Azure functions. There are many moving parts to this, but once setup and coded it is a very satisfying experience. Let’s dig in. I am NOT going to copy and paste every little element of the high-level guide from Microsoft, there is no point in that but I will show you the links that you need to setup the relevant function app project then the tailored bits around CHECKDB forms the bulk of this post.

This isn’t necessary to do, but if you want to learn how Azure Functions work, it’s a good example of working through the mechanics.

Comments closed

Reading S3 Data into Kafka

Published 2020-09-10 by Kevin Feasel

Ramu Kakarla takes us through using Apache Camel to load Amazon S3 data into a Kafka topic:

Apache Camel is an open-source framework for message-oriented middleware with a rule-based routing and mediation engine that provides a Java object-based implementation of the Enterprise Integration Patterns using an application programming interface to configure routing and mediation rules

Read on to see how this fits together.

Comments closed

ACID Transactions with Hive LLAP in ElasticMapReduce

Published 2020-09-09 by Kevin Feasel

Suthan Phillips and Chao Gao walk us through ACID transactions when using Hive on Amazon’s ElasticMapReduce platform:

ACID (atomicity, consistency, isolation, and durability) properties make sure that the transactions in a database are atomic, consistent, isolated, and reliable.
Amazon EMR 6.1.0 adds support for Hive ACID transactions so it complies with the ACID properties of a database. With this feature, you can run INSERT, UPDATE, DELETE, and MERGE operations in Hive managed tables with data in Amazon Simple Storage Service (Amazon S3). This is a key feature for use cases like streaming ingestion, data restatement, bulk updates using MERGE, and slowly changing dimensions.
This post demonstrates how to enable Hive ACID transactions in Amazon EMR, how to create a Hive transactional table, how it can achieve atomic and isolated operations, and the concepts, best practices, and limitations of using Hive ACID in Amazon EMR.

Click through for a demonstration.

Comments closed

Automating Azure Resource Deletion

Published 2020-09-09 by Kevin Feasel

Drew Skwiers-Koballa has put together a runbook to remove Azure resources tagged in a certain way:

Microsoft puts a lot of effort into making it easy for you to log on to portal.azure.com and spin up a VM, SQL Database, Function, or other instance. The Visual Studio Enterprise subscription with $150 credit per month can go quickly if you’re not careful to stop or remove big ticket items as soon as you’re done with them. Even if you have a virtually unlimited Azure account connected to a credit card, expense account, or a trust fund – you probably don’t want to accidentally leave something running longer than it is needed.
With an Azure Automation account and a Powershell workflow runbook I’m able to use resource tags to set resources for autodeletion by date or immediately that evening.

Click through for that runbook.

Comments closed

Comparing Cassandra and DynamoDB

Published 2020-08-31 by Kevin Feasel

Lewis DiFelice compares and contrasts Cassandra with DynamoDB:

In this post, we’ll look at some of the key differences between Apache Cassandra (hereafter just Cassandra) and DynamoDB.
Both are distributed databases and have similar architecture, and both offer incredible scalability, reliability, and resilience. However, there are also differences, and understanding the differences and cost benefits can help you determine the right solution for your application.

There’s some good info in this comparison.

Comments closed

Changing Azure SQL DB Service-Level Objectives

Published 2020-08-31 by Kevin Feasel

Monica Rathbun notes that SSMS lets you change service-level objectives for Azure SQL Databases:

Sometimes as a DBA, I am lazy and want the ability to execute all of my tasks in one place. Lucky for me I discovered the other day that I can change my Azure SQL Database Service Level Object options within SQL Server Management Studio (SSMS) without ever having to go to the Azure Portal.

Read on to learn how, as well as what you can change.

Comments closed

Working with Read-Only Endpoints in Azure SQL Database

Published 2020-08-27 by Kevin Feasel

Arun Sirpal takes us through one method for improving performance in Azure SQL Database:

One of the main benefits of configuring active geo-replication for Azure SQL Database is leveraging the read-only endpoint, a good technique to split away read only activity from OLTP based workloads. This means that there is no reason why you cannot point users to these databases via tools such as Power BI as highlighted below.

But there are some things to keep in mind, as Arun points out.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Category: Cloud