February 2020 – Page 3

The remainder of this post discusses how to implement streaming ETL architectures with Apache Flink and Kinesis Data Analytics. The architecture persists streaming data from one or multiple sources to different destinations and is extensible to your needs. This post does not cover additional filtering, enrichment, and aggregation transformations, although that is a natural extension for practical applications.
This post shows how to build, deploy, and operate the Flink application with Kinesis Data Analytics, without further focusing on these operational aspects. It is only relevant to know that you can create a Kinesis Data Analytics application by uploading the compiled Flink application jar file to Amazon S3 and specifying some additional configuration options with the service. You can then execute the Kinesis Data Analytics application in a fully managed environment. For more information, see Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics for Java Applications and the Amazon Kinesis Data Analytics developer guide.

Click through for the walkthrough.

Using Sqoop to Move Data into Hive

Published 2020-02-25 by Kevin Feasel

Jon Morisi continues a series on Sqoop:

Sqoop completes the import task by running MapReduce jobs importing the data to HDFS, and then running Hive commands (CREATE TABLE / LOAD DATA INPATH) to move the data to Hive. The default HDFS location is: /user/[login]/[TABLENAME]. If you have any issues during the import you may need to remove the HDFS directory prior to re-running, or else you will get an Error:

Read on for sample calls and additional notes.
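
As a rough illustration of the cleanup-then-retry step described above, here is a minimal Python sketch that only builds the two command lines. The JDBC URL, table name, and login are made-up placeholders, the helper name is hypothetical, and credentials options are left out on purpose; it is not taken from the linked post.

```python
import shlex


def build_reimport_commands(jdbc_url, table, login):
    """Build the commands for a clean re-run of a Sqoop Hive import.

    Hypothetical helper for illustration; it assumes the default HDFS
    location quoted above: /user/[login]/[TABLENAME].
    """
    staging_dir = f"/user/{login}/{table}"
    # Remove the leftover HDFS directory from a failed run (-f: no error if absent).
    cleanup = ["hdfs", "dfs", "-rm", "-r", "-f", staging_dir]
    # Re-run the import; --hive-import tells Sqoop to load the data into Hive.
    sqoop = ["sqoop", "import", "--connect", jdbc_url,
             "--table", table, "--hive-import"]
    return cleanup, sqoop


# Placeholder values, not from the original article.
cleanup, sqoop = build_reimport_commands(
    "jdbc:sqlserver://dbhost:1433;databaseName=Sales", "CUSTOMERS", "etl_user")
print(shlex.join(cleanup))
print(shlex.join(sqoop))
```

Building the commands as argument lists keeps them safe to hand to `subprocess.run` later without shell quoting problems.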

Receiving Notifications when Azure Function Apps Fail

Published 2020-02-25 by Kevin Feasel

Gilbert Quevauvilliers shares how to receive notification e-mails when an Azure Function App fails:

Below are the steps to enable error notifications on Azure Function Apps.
This follows on from my previous blog post, How you can store All your Power BI Audit Logs easily and indefinitely in Azure, where every day it extracts the audit logs into Azure Blob storage. One of the key things when working with any job that runs is that I want to know when the job fails. If I do not have this and I assume that the data is always there, I could fall into a situation where there is missing data that I cannot get back.
The steps below explain how to create an alert with a notification email if an Azure Function App fails.

Read on for the step-by-step instructions.

Power BI and Tabular Model Relationship Types

Published 2020-02-25 by Kevin Feasel

Marco Russo takes us through the different types of relationships we might encounter in Power BI and Analysis Services Tabular models:

A relationship can be strong or weak. In a strong relationship the engine knows that the one-side of the relationship contains unique values. If the engine cannot check that the one-side of the relationship contains unique values for the key, then the relationship is weak. A relationship can be weak either because the engine cannot ensure the uniqueness of the constraint, due to technical reasons we outline later, or because the developer defined it as such.
A weak relationship is not used as part of table expansion. Power BI has allowed composite models since 2018. In a composite model, it is possible to create tables in a model containing data in both Import mode (a copy of data from the data source is preloaded and cached in memory using the VertiPaq engine) and in DirectQuery mode (the data source is only accessed at query time).

There is quite a bit of useful information in here.

Understanding Azure SQL Database Elastic Jobs

Published 2020-02-25 by Kevin Feasel

Kate Smith takes us through some important concepts around Elastic Jobs in Azure SQL Database:

It is very important that the T-SQL scripts being executed by Elastic Jobs be idempotent. This means that if they are run multiple times (by accident or intentionally) they won’t fail and won’t produce unintended results. If an elastic job has some side effects, and gets run more than once, it could fail or cause other unintended consequences (like consuming double the resources needed for a large statistics update). One way to ensure idempotence is to make sure that you check if something already exists before trying to create it.

This takes some getting used to, but once you’re in the habit, you are much better off. Read on for more details on other key concepts.
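
To make the "check before you create" idea concrete, here is a small sketch. It uses Python's built-in `sqlite3` module rather than T-SQL or Elastic Jobs, purely so it can run anywhere; the table and function names are invented for the example.

```python
import sqlite3


def ensure_audit_table(conn):
    """Create the (hypothetical) job_audit table only if it is missing."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        ("job_audit",),
    ).fetchone()
    if exists is None:
        conn.execute(
            "CREATE TABLE job_audit (run_date TEXT PRIMARY KEY, status TEXT)")


def record_run(conn, run_date, status):
    """Write one row per run date; repeating the call overwrites, not duplicates."""
    conn.execute(
        "INSERT OR REPLACE INTO job_audit (run_date, status) VALUES (?, ?)",
        (run_date, status),
    )


conn = sqlite3.connect(":memory:")
# Run the whole "job" twice, as if it had been triggered again by accident.
for _ in range(2):
    ensure_audit_table(conn)
    record_run(conn, "2020-02-25", "succeeded")

row_count = conn.execute("SELECT COUNT(*) FROM job_audit").fetchone()[0]
print(row_count)
```

The second pass neither fails on the existing table nor adds a second row, which is the property the quoted passage asks of an Elastic Jobs script.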

Monitoring Availability Groups

Published 2020-02-25 by Kevin Feasel

Nisarg Upadhyay gives us some of the low-down on monitoring availability groups:

In my previous articles, I have explained the step-by-step process of deploying an AlwaysOn Availability group on SQL Server 2017. In this article, I am going to explain how to monitor AlwaysOn availability groups.
First, let’s review the configuration of the availability group we had deployed previously. To do that, open SQL Server Management Studio → expand the database engine in Object Explorer → expand “AlwaysOn High Availability” → expand “Availability Groups.” You can see the availability group named SQLAAG. Under this availability group (SQLAAG), you can see the list of availability replicas, availability databases, and availability group listeners.

Click through for some tooling built into SQL Server Management Studio, as well as relevant Perfmon counters.

Entering Data into Power Query from Excel

Published 2020-02-25 by Kevin Feasel

Ed Hansberry shows a quick way to hand-enter some data into Power Query from Excel:

One of the cool things about Power BI is you have a nice “Enter Data” button on the home ribbon where you can quickly add data to your model. The table isn’t dynamic, but sometimes you just need a small table to finish your model to add descriptions or some other bit of data without creating another table in a source that needs to be connected to. Enter Data is perfect for this.

It did take a little bit of trickery to accomplish, but it’s pretty easy to do.

Reading CPU Measures from the Ring Buffer

Published 2020-02-25 by Kevin Feasel

Taiob Ali explains what the CPU and memory measures are from the scheduler monitor ring buffer:

Here is a sample output of XML from sys.dm_os_ring_buffers WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'. What do those XML elements mean? In order to monitor CPU usage, you need to understand what each element means so you can use the values. I will explain each one in this blog post.

Read on for the list and what each means.
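
For a sense of how those values might be consumed once extracted, here is a minimal Python sketch that parses one record. The sample XML is hand-written for illustration (trimmed to two elements, with made-up numbers) and is assumed to follow the usual Record / SchedulerMonitorEvent / SystemHealth layout; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative record with invented values; a real record carries more elements.
SAMPLE_RECORD = """
<Record id="1" type="RING_BUFFER_SCHEDULER_MONITOR" time="123456">
  <SchedulerMonitorEvent>
    <SystemHealth>
      <ProcessUtilization>12</ProcessUtilization>
      <SystemIdle>80</SystemIdle>
    </SystemHealth>
  </SchedulerMonitorEvent>
</Record>
"""


def cpu_breakdown(record_xml):
    """Split CPU percentage into SQL Server, idle, and everything else."""
    root = ET.fromstring(record_xml.strip())
    health = root.find("./SchedulerMonitorEvent/SystemHealth")
    sql_cpu = int(health.findtext("ProcessUtilization"))
    idle = int(health.findtext("SystemIdle"))
    # Whatever is neither idle nor SQL Server is attributed to other processes.
    return {"sql_server": sql_cpu, "idle": idle, "other": 100 - sql_cpu - idle}


breakdown = cpu_breakdown(SAMPLE_RECORD)
print(breakdown)
```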

Finding YARN Cluster Idle Time

Published 2020-02-24 by Kevin Feasel

Dmitry Tolpeko has a Python script to track YARN cluster idle time:

In the previous article Calculating Utilization of Cluster using Resource Manager Logs I showed how to estimate per-second utilization for a Hadoop cluster.
This information can be useful for calculating idle time statistics for a cluster, i.e., the time when no containers are running.

Click through for the script.
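
This is not the linked script, just a small sketch of the idea in Python. It assumes you already have one sample per second of the form (epoch second, running containers) with no gaps; the function name and the sample data are invented.

```python
def idle_stats(samples):
    """Summarize idle time from per-second (epoch_second, running_containers) pairs.

    Assumes the samples are contiguous, one per second, in time order.
    """
    total = idle = longest = current = 0
    for _, running in samples:
        total += 1
        if running == 0:
            idle += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # an active second ends the current idle stretch
    fraction = idle / total if total else 0.0
    return {"idle_seconds": idle, "idle_fraction": fraction,
            "longest_idle_stretch": longest}


# Made-up utilization: six seconds, three of them with no containers running.
samples = [(0, 3), (1, 0), (2, 0), (3, 5), (4, 0), (5, 2)]
stats = idle_stats(samples)
print(stats)
```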

20 Years of R

Published 2020-02-24 by Kevin Feasel

Jozef Hajnala has some fun looking at the growth in R over the past 20 years:

It is almost the 29th of February 2020! A day that is very interesting for R, because it marks 20 years from the release of R v1.0.0, the first official public release of the R programming language.

Click through to see how much faster R has become, as well as the ecosystem changes during that time. H/T R-Bloggers
