Cloud – Page 47 – Curated SQL

Connecting Kafka Cross-Network

Published 2022-04-11 by Kevin Feasel

Praful Khandelwal sets up a hybrid Kafka cluster:

In this article, we will be talking about a simple set-up involving a local machine (macOS) and an Azure VM. We’ll discuss the step-by-step procedure to produce events from the local machine to a Kafka broker hosted on the Azure VM and also to consume those events back on the local machine. While this does not cover the exact scenario described above, it gives a fair idea of how Kafka messages can be exchanged across the network.

Kafka is pretty chatty, so I’d hope to have really good network connectivity in place, such as Direct Connect (AWS) or ExpressRoute (Azure).

The Basics of Azure Storage Explorer

Published 2022-04-11 by Kevin Feasel

Manvendra Singh takes us through Azure Storage Explorer:

This article will explain Azure Storage Explorer, its installation, and how to start working with the application to access Azure Storage services. Azure Storage provides a flexible solution to store various types of data at a massive scale in the cloud. If you have many storage accounts in Azure Storage, they become difficult to manage. Microsoft has recognized this problem and developed a desktop application, Azure Storage Explorer, to manage Azure storage accounts easily. It can be installed on Windows, Linux, and macOS operating systems.

This is a rather useful tool.

Performance Optimization for Azure Data Explorer

Published 2022-04-06 by Kevin Feasel

Ashok Anand Kumar has some performance tips:

Azure Data Explorer provides the capability to easily fetch telemetry data from a variety of data sources and run complex analytical queries. Azure Data Explorer supports both batch and streaming ingestion to meet near real-time latency requirements. Batch ingestion will have latencies based on the batching policy and query frequency from applications. Streaming ingestion can be leveraged for low latency requirements. Data is cached and indexed for faster query performance and optionally exported out to Azure Data Lake in Parquet format for batch processing and integration with other Big Data and Machine Learning (ML) services.

Read on for several tips.

Azure Delete Locks

Published 2022-04-05 by Kevin Feasel

Denny Cherry has some advice:

When I’m working in a client’s Azure environment and they don’t have a delete lock on their production environment, I always work on getting them to have one.

This doesn’t always play nicely with everything in Azure, so read on for Denny’s advice when working with Azure Migrate.

An Overview of Azure IoT Central

Published 2022-04-04 by Kevin Feasel

James Serra looks at IoT Central:

This is a short blog to give you a high-level overview of a product called Azure IoT Central. I saw this fairly new Azure product (GA Sept 2018) in use for the first time at a large manufacturing company that was using it at their manufacturing facility (see Grupo Bimbo takes a bite out of production costs with Azure IoT throughout factories). They have thousands of sensors that are collecting data for all the machines used in producing their products. In short, think of it as an “Application Platform as a Service (aPaaS)” for quickly building IoT solutions. It’s boxing up IoT Hub, Device Provisioning Service (DPS), Stream Analytics, Data Explorer, SQL Database, Time Series Insights and Cosmos DB to make it easy to quickly build a solution and get value out of the IoT data. To get an idea of what this solution would look like, check out the IoT Central sample for calculating Overall Equipment Effectiveness (OEE) of industrial equipment.

I haven’t seen much use of this service, as generally any use case I’ve seen around IoT quickly turns into using IoT Hub and IoT Edge to develop custom code.

An Overview of the Microsoft Defender Ecosystem

Published 2022-04-01 by Kevin Feasel

Alan La Pietra looks at all the Defenders you can get your hands on:

Microsoft Defender Antivirus is available in Windows 10 and Windows 11, and in versions of Windows Server
Microsoft Defender Antivirus is a major component of your next-generation protection in Microsoft “Defender for Endpoint”
Microsoft Defender Antivirus is built into Windows, and it works with Microsoft Defender for Endpoint to provide protection on your device and in the cloud

I see the hand of marketing in this. Which means they’ll probably all have different names nine months from now.

Building S3 Data Pipelines — The Tools

Published 2022-04-01 by Kevin Feasel

Chris Adkin continues a series:

In my last post I outlined a number of architectural options for solutions that could be implemented in light of Microsoft retiring SQL Server 2019 Big Data Clusters, one of which was data pipelines that leverage Python and Boto 3. Before diving into these things in greater detail, let’s recap what S3 is.

Click through for a simple data pipeline example.

Conditionally Formatting Multi-Stat Visuals in ADX

Published 2022-03-31 by Kevin Feasel

Hiram Fleitas looks at visual formatting in Azure Data Explorer:

Intro
Start with a free database at aka.ms/adx.free & run this demo query.
let mytable = datatable(key:string, number:int)
[
'one', 1,
'two', 2
];
mytable

Once you have that query, read on to see how you can visualize and format it.

KQL Series

Published 2022-03-31 by Kevin Feasel

Hamish Watson does a document dump:

So what did we do here?
It searched our stored security events in the SecurityEvent table for all Accounts that had a successful login in the last 3 hours, and we chose to display only the Account and the number of log off events per Account, in numerical order with the highest at the top.
So far I’ve introduced some new operators and things – but what is a really quick way to learn KQL?
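The shape of the query described in the excerpt (filter to a recent time window, count rows per account, sort by count descending) can be mimicked in plain Python. This is only a sketch of the logic, not Hamish's KQL: the sample rows are made up, the function name is hypothetical, and it assumes the events have already been narrowed to the event type of interest. `Account` and `TimeGenerated` follow the column names used by the SecurityEvent table.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def events_per_account(events, now, window=timedelta(hours=3)):
    """Count events per account inside the time window, highest count first."""
    cutoff = now - window
    counts = Counter(
        event["Account"] for event in events if event["TimeGenerated"] >= cutoff
    )
    # most_common() returns (account, count) pairs sorted by count, descending.
    return counts.most_common()


# Made-up sample data, standing in for rows of the SecurityEvent table.
now = datetime(2022, 3, 31, 12, 0, tzinfo=timezone.utc)
events = [
    {"Account": "alice", "TimeGenerated": now - timedelta(minutes=10)},
    {"Account": "bob", "TimeGenerated": now - timedelta(hours=1)},
    {"Account": "alice", "TimeGenerated": now - timedelta(hours=2)},
    {"Account": "carol", "TimeGenerated": now - timedelta(hours=5)},  # outside the window
]

print(events_per_account(events, now))  # [('alice', 2), ('bob', 1)]
```

In KQL the same three steps would typically be a `where` on the time column, a `summarize count() by Account`, and a descending sort.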

Start with this post and just keep navigating forward. Hamish has ten posts in total.

Zero-Rename Writes in Elastic MapReduce Hive

Published 2022-03-30 by Kevin Feasel

Suthan Phillips, et al., show off some updates to the way Hive transactions commit in AWS’s Elastic MapReduce:

Our customers use Apache Hive on Amazon EMR for large-scale data analytics and extract, transform, and load (ETL) jobs. Amazon EMR Hive uses Apache Tez as the default job execution engine, which creates Directed Acyclic Graphs (DAGs) to process data. Each DAG can contain multiple vertices from which tasks are created to run the application in parallel. Their final output is written to Amazon Simple Storage Service (Amazon S3).
Hive initially writes data to staging directories and then moves it to the final location after a series of rename operations. This design of Hive renames supports task failure recovery, such as rescheduling the failed task with another attempt, running speculative execution, and recovering from a failed job attempt. These move and rename operations don’t have a significant performance impact in HDFS because they are only metadata operations, whereas in Amazon S3 performance can degrade significantly based on the number of files written.
This post discusses the new optimized committer for Hive in Amazon EMR and also highlights its impressive performance by running a TPCx-BB performance benchmark and comparing it with the Hive default commit logic.
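To make the staging-then-rename pattern concrete, here is a minimal local-filesystem sketch in Python. It is not Hive's or EMR's commit code; the function name, file name, and directory layout are hypothetical. It only illustrates why a rename-based commit is cheap where rename is a metadata update. An object store like S3 has no native rename, so the equivalent step generally becomes a copy followed by a delete for each file.

```python
import os
import tempfile


def write_then_commit(final_dir, filename, data):
    """Write a file into a staging directory, then rename it into final_dir."""
    # Stage inside final_dir so the rename stays on one filesystem.
    staging_dir = tempfile.mkdtemp(prefix=".staging-", dir=final_dir)
    staged_path = os.path.join(staging_dir, filename)
    with open(staged_path, "w", encoding="utf-8") as handle:
        handle.write(data)

    final_path = os.path.join(final_dir, filename)
    # The "commit": on a local filesystem this only updates directory metadata.
    os.replace(staged_path, final_path)
    os.rmdir(staging_dir)
    return final_path


with tempfile.TemporaryDirectory() as table_dir:
    path = write_then_commit(table_dir, "part-00000.txt", "a,1\nb,2\n")
    with open(path, encoding="utf-8") as handle:
        contents = handle.read()
    remaining = sorted(os.listdir(table_dir))

print(remaining)  # ['part-00000.txt']
```

If the task fails before the rename, the final directory never sees a partial file, which is the recovery property the excerpt attributes to this design.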

Read on for a description of how commit operations work in general and how the updated Hive committer can help with certain types of queries.
