Press "Enter" to skip to content

Category: Cloud

Sort Keys and Join Types in Amazon Redshift

Derik Hammer takes us through query tuning a nasty job on Amazon Redshift:

My team built a process to load from a couple of base tables, in our Amazon Redshift enterprise data warehouse, into another table which would act as a data mart entity. The data was rolled up and it included some derived fields. The SQL query had some complexity to it.

This process ran daily and was being killed by our operations team after running for 22 hours.

I stepped in to assist with performance tuning and discovered that join choices, such as INNER vs. OUTER joins, have a big impact on whether Redshift can use its sort keys or not.
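As a rough illustration of the kind of difference involved (the tables and columns below are hypothetical, not from Derik's post), a query whose join predicates and filters line up with the sort keys can let Redshift use zone maps to skip blocks, while rewriting the same logic as an outer join can push the optimizer toward scanning the joined table in full:

-- Hypothetical tables, each sorted on the column used for joining and filtering.
CREATE TABLE sales (
    sale_date   DATE,
    customer_id INT,
    amount      DECIMAL(10,2)
)
SORTKEY (sale_date);

CREATE TABLE returns (
    return_date DATE,
    customer_id INT,
    amount      DECIMAL(10,2)
)
SORTKEY (return_date);

-- With an INNER JOIN, the date predicate can restrict both sides of the join,
-- so Redshift can use the sort keys (via zone maps) to skip blocks.
SELECT s.sale_date, SUM(s.amount) AS net_sales
FROM sales AS s
INNER JOIN returns AS r
    ON  r.customer_id = s.customer_id
    AND r.return_date = s.sale_date
WHERE s.sale_date >= '2020-01-01'
GROUP BY s.sale_date;

-- Rewritten as an OUTER JOIN, the optimizer may no longer restrict the
-- outer-joined table with that predicate, which can mean scanning it in
-- full instead of taking advantage of its sort key.
SELECT s.sale_date, SUM(s.amount) AS net_sales
FROM sales AS s
LEFT OUTER JOIN returns AS r
    ON  r.customer_id = s.customer_id
    AND r.return_date = s.sale_date
WHERE s.sale_date >= '2020-01-01'
GROUP BY s.sale_date;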

Click through for more details and what Derik ended up doing.


Loading Data into Delta Lake

Prakash Chockalingam takes us through auto-loading Delta Lake from various sources:

Auto Loader is an optimized file source that overcomes all the above limitations and provides a seamless way for data teams to load the raw data at low cost and latency with minimal DevOps effort. You just need to provide a source directory path and start a streaming job. The new structured streaming source, called “cloudFiles”, will automatically set up file notification services that subscribe to file events from the input directory and process new files as they arrive, with the option of also processing existing files in that directory.

This does look interesting.


A Metadata-Driven Framework for ADF Pipelines

Paul Andrew has started a series on metadata-driven Azure Data Factory pipelines:

The concept of having a processing framework to manage our Data Platform solutions isn’t a new one. However, over time, changes in the technology we use mean the way we now deliver this orchestration has to change as well, especially in Azure. On that basis, and using my favourite Azure orchestration service, Azure Data Factory (ADF), I’ve created an alpha metadata-driven framework that could be used to execute all our platform processes. Furthermore, at various community events I’ve talked about bootstrapping solutions with Azure Data Factory, so now, as a technical exercise, I’ve rolled my own simple processing framework. Mainly, to understand how easily we can make it with the latest cloud tools and to fully exploit just how dynamic you can get a set of generational pipelines.

This first post lays out some of the architectural decisions and prep work needed for the series.


Creating an Elastic Jobs Agent

Kate Smith continues a series on elastic jobs in Azure SQL Database:

Having laid the conceptual groundwork for Elastic Jobs in two previous postings (1, 2), I am now going to create an elastic job and associated credentials using PowerShell. For this scenario, I have one or more databases with a table ‘T’ and statistics ‘tStats’. I want to enforce an update for these statistics every day. To do this, I need to check that my stats have been updated in the past day, and if not, update them. The T-SQL to update statistics on a table “T” with stats named “tStats” is simple:
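The statement itself, reconstructed here from the names in the excerpt and assuming the table lives in the dbo schema (Kate's post has the exact script), is just the one-liner below, optionally wrapped in the daily staleness check she describes:

UPDATE STATISTICS dbo.T (tStats);

-- A sketch of the "only if not updated in the past day" wrapper.
IF EXISTS (
    SELECT 1
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('dbo.T')
      AND s.name = 'tStats'
      AND (STATS_DATE(s.object_id, s.stats_id) < DATEADD(DAY, -1, GETDATE())
           OR STATS_DATE(s.object_id, s.stats_id) IS NULL)
)
    UPDATE STATISTICS dbo.T (tStats);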

Click through for the PowerShell script.


Good Ideas for Designing Data Lakes

Prateek Shrivastava and Rangasayee Chandrasekaran share some advice on designing data lakes in the cloud:

Data generation and data collection across semi-structured and unstructured formats is both bursty and continuous. Inspecting, exploring and analyzing these datasets in their raw form is tedious, because the analytical engines scan the entire data set across multiple files. We recommend five ways to reduce data scanned and reduce query overheads –

Click through for the details.


Managing Performance on Azure SQL Managed Instances

Tim Radney has a few considerations for you if you want to start using Azure SQL Managed Instances:

Storage is a bit more difficult to plan for, because there are multiple factors to consider. For storage, you need to account for both overall storage size and I/O needs. How many GBs or TBs are needed for the SQL Server instance, and how fast does the storage need to be? How many IOPS and how much throughput is the on-premises instance using? For that, you must baseline your current workload, using perfmon to capture average and max MB/s and/or taking snapshots of sys.dm_io_virtual_file_stats to capture throughput utilization. This will give you an idea of what type of I/O and throughput you need in the new environment. Several customers I’ve worked with have missed this vital part of migration planning and have encountered performance issues due to selecting an instance level that didn’t support their workload.
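For reference, here is a minimal sketch of the sys.dm_io_virtual_file_stats snapshot Tim mentions. The DMV is cumulative since instance startup, so capture it twice and divide the byte deltas by the elapsed seconds to estimate MB/s:

-- Per-file cumulative I/O; run twice and diff the results to get throughput.
SELECT DB_NAME(vfs.database_id)              AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.num_of_bytes_read    / 1048576.0  AS mb_read,
       vfs.num_of_bytes_written / 1048576.0  AS mb_written,
       vfs.io_stall_read_ms,
       vfs.io_stall_write_ms,
       SYSDATETIME()                         AS capture_time  -- used to compute the snapshot interval
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON  mf.database_id = vfs.database_id
    AND mf.file_id     = vfs.file_id;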

Tim has a lot of good advice in here, so read the whole thing.


Streaming Pipelines in AWS with Flink and Kinesis Data Analytics

Steffen Hasumann shows us how to put together a streaming ETL pipeline in AWS using Apache Flink and Amazon Kinesis Data Analytics:

The remainder of this post discusses how to implement streaming ETL architectures with Apache Flink and Kinesis Data Analytics. The architecture persists streaming data from one or multiple sources to different destinations and is extensible to your needs. This post does not cover additional filtering, enrichment, and aggregation transformations, although that is a natural extension for practical applications.

This post shows how to build, deploy, and operate the Flink application with Kinesis Data Analytics, without further focusing on these operational aspects. It is only relevant to know that you can create a Kinesis Data Analytics application by uploading the compiled Flink application jar file to Amazon S3 and specifying some additional configuration options with the service. You can then execute the Kinesis Data Analytics application in a fully managed environment. For more information, see Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics for Java Applications and the Amazon Kinesis Data Analytics developer guide.

Click through for the walkthrough.


Receiving Notifications when Azure Function Apps Fail

Gilbert Quevauvilliers shares how to receive notification e-mails when an Azure Function App fails:

Below are the steps to enable error notifications on Azure Function Apps

This follows on from my previous blog post, How you can store All your Power BI Audit Logs easily and indefinitely in Azure, where every day it extracts the Audit logs into Azure Blob storage. One of the key things when working with any job that runs is that I want to know when the job fails. If I do not have this and I assume that the data is always there, I could fall into a situation where there is missing data that I cannot get back.

Below explains how to create an alert with a notification email if an Azure Function App fails.

Read on for the step-by-step instructions.


Understanding Azure SQL Database Elastic Jobs

Kate Smith takes us through some important concepts around Elastic Jobs in Azure SQL Database:

It is very important that the T-SQL scripts being executed by Elastic Jobs be idempotent.  This means that if they are run multiple times (by accident or intentionally) they won’t fail and won’t produce unintended results. If an elastic job has some side effects, and gets run more than once, it could fail or cause other unintended consequences (like consuming double the resources needed for a large statistics update).  One way to ensure idempotence is to make sure that you check if something already exists before trying to create it.
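A generic sketch of that existence-check pattern (the object names here are made up for illustration, not taken from Kate's post):

IF NOT EXISTS (
    SELECT 1
    FROM sys.tables AS t
    WHERE t.name = 'JobAudit'
      AND t.schema_id = SCHEMA_ID('dbo')
)
BEGIN
    -- Safe to re-run: the table is only created the first time through.
    CREATE TABLE dbo.JobAudit (
        RunId   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
        RunDate DATETIME2        NOT NULL DEFAULT SYSUTCDATETIME()
    );
END;

The same guard-first approach applies to index creation, statistics maintenance, and any other DDL or DML the job issues.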

This takes some getting used to, but once you’re in the habit, you are much better off. Read on for more details on other key concepts.


MR3: Hive on Kubernetes

Alex Woodie reports on a DataMonad production:

MR3 is a software product developed by a team led by Sungwoo Park. The software, which is not open source, is sold by a Delaware-based software company called DataMonad. After prototyping a Java-based execution engine called MR2 in the 2013 timeframe, development of Scala-based MR3 began in 2015. The first release of MR3 was delivered in early 2018, and version 1.0 was released yesterday.

According to DataMonad, MR3 is an execution engine for big data processing, and Hive is the first and main application that’s been configured to run on it (Tez is also supported). The company says MR3 offers comparable performance to the latest release of Hive, dubbed LLAP, but without the technical complexity.

The fact that it is closed-source is a bit of a downer, but I like having more competition in the space.
