
Day: September 18, 2019

Multi-Region Replication with Confluent Platform

David Arthur walks us through multi-region replication of Kafka clusters in the Confluent Platform 5.4 preview:

Running a single Apache Kafka® cluster across multiple datacenters (DCs) is a common, yet somewhat taboo architecture. This architecture, referred to as a stretch cluster, provides several operational benefits and unlocks the door to many use cases. Stretch clusters provide better durability guarantees and make disaster recovery much easier by avoiding the problem of offset translation and restarting clients. However, in order to operate a reliable stretch cluster, datacenters must be relatively close to each other and have a very stable, low-latency, and high-bandwidth connection among the DCs.

This changes with the preview release of Confluent Platform 5.4, which includes multi-region replication built directly into Confluent Server. Now operators can choose to replicate data on a per-region basis, synchronously or asynchronously, per topic. This functionality allows operators to increase data durability and automate client failover in the event of a disaster.

And of course all of those rules about RPO, RTO, etc. apply to this.
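As I understand the Confluent Server feature, per-topic placement is expressed as a JSON document in the `confluent.placement.constraints` topic config: synchronous "replicas" and asynchronous "observers" are each matched to brokers by their `broker.rack` value. Here is a minimal Python sketch that builds such a document, assuming each rack name stands for a region. The helper name and rack names are hypothetical, and the schema should be checked against the Confluent documentation for your version.

```python
import json


def build_placement_constraints(sync_racks, async_racks):
    """Build a placement-constraints JSON string (hypothetical helper).

    sync_racks:  dict of rack name -> number of synchronous replicas
    async_racks: dict of rack name -> number of asynchronous observers
    """
    doc = {
        "version": 1,
        # Synchronous replicas take part in the ISR and acknowledge writes.
        "replicas": [
            {"count": count, "constraints": {"rack": rack}}
            for rack, count in sync_racks.items()
        ],
    }
    if async_racks:
        # Observers replicate asynchronously, e.g. to a remote region.
        doc["observers"] = [
            {"count": count, "constraints": {"rack": rack}}
            for rack, count in async_racks.items()
        ]
    return json.dumps(doc)


# Two synchronous copies in "west", two asynchronous copies in "east".
print(build_placement_constraints({"west": 2}, {"east": 2}))
```

Keeping the sync/async choice in the data rather than in code mirrors the per-topic nature of the feature: each topic can carry its own document.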


Diagnosing TCP SACKs-Related Slowdown in Databricks

Chris Stevens et al. walk us through troubleshooting a slowdown that appeared after moving to Linux images patched for the TCP SACK vulnerabilities:

In order to figure out why the straggler task took 15 minutes, we needed to catch it in the act. We reran the benchmark while monitoring the Spark UI, knowing that all but one of the tasks for the save operation would complete within a few minutes. Sorting the tasks in that stage by the Status column, it did not take long for there to be only one task in the RUNNING state. We had found our skewed task, and the IP address in the Host column pointed us at the executor experiencing the regression.

This is a nice case study of network troubleshooting, so of course there are Wireshark screenshots in it.
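The "catch it in the act" step boils down to filtering a stage's tasks for the ones still running and reading off their hosts. Below is a small Python sketch of that filter over task records shaped like the Spark UI's Status and Host columns. The `TaskRecord` type and the sample data are hypothetical; in practice you would read these from the Spark UI or its REST API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TaskRecord:
    task_id: int
    status: str  # e.g. "SUCCESS" or "RUNNING", as in the Status column
    host: str    # executor host/IP, as in the Host column


def straggler_hosts(tasks):
    """Count still-running tasks per host; a lone RUNNING task marks the straggler."""
    return Counter(t.host for t in tasks if t.status == "RUNNING")


tasks = [
    TaskRecord(0, "SUCCESS", "10.0.0.4"),
    TaskRecord(1, "SUCCESS", "10.0.0.5"),
    TaskRecord(2, "RUNNING", "10.0.0.7"),
]
print(straggler_hosts(tasks))  # the host to investigate next
```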


Accessing Data in Azure Data Lake Storage Gen 2

James Serra gives us several methods to access data in Azure Data Lake Storage Gen 2:

With data lakes becoming popular, and Azure Data Lake Storage (ADLS) Gen2 being used for many of them, a common question I am asked is “How can I access data in ADLS Gen2 instead of a copy of the data in another product (e.g. Azure SQL Data Warehouse)?”. The benefits of accessing ADLS Gen2 directly include less ETL and lower cost, and it fits cases such as checking whether the data in the data lake has value before making it part of ETL, a one-time report, a data scientist who wants to use the data to train a model, or a compute solution that points to ADLS Gen2 to clean your data. While these are all valid reasons, you still want to have a relational database (see Is the traditional data warehouse dead?). The trade-off in accessing data directly in ADLS Gen2 is slower performance, limited concurrency, limited data security (no row-level security, column-level security, dynamic data masking, etc.), and the difficulty of accessing it compared to a relational database.

Since ADLS Gen2 is just storage, you need other technologies to copy data to it or to read data in it.

Read on for the solution.
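Compute engines such as Spark typically reach ADLS Gen2 through the ABFS driver, addressing files with `abfss://<container>@<account>.dfs.core.windows.net/<path>` URIs. Here is a small Python helper that builds such a URI. The account, container, and path names are hypothetical, and authentication (keys, service principals, and so on) is deliberately left out of this sketch.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build a secure ABFS URI for a file or folder in an ADLS Gen2 account."""
    # Strip leading slashes so the path joins cleanly after the host name.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


uri = abfss_uri("mylake", "raw", "/sales/2019/09/orders.parquet")
print(uri)
# In a Spark session, something like spark.read.parquet(uri) would then
# read the file in place, with no copy into another product.
```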


Disambiguating Azure SQL Database Classes

Arun Sirpal explains the different types of Azure SQL Database available to us:

I want to do a quick summary post of the many different types of Azure SQL Database available. I am not talking about elastic pools, VMs, etc., but rather the singleton type.

Azure SQL Database (what I call normal mode) – A choice between the DTU model (Basic, Standard and Premium) and vCore (General Purpose and Business Critical). Within this space there are two different architecture types used by Microsoft under the covers.

As the product expands, we get more and more options, and Arun clarifies where each fits.
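To make the grouping concrete, here is a Python lookup table of the single-database tiers quoted above, keyed by purchasing model and by the two architectures. The architecture labels follow Microsoft's high-availability documentation as I understand it (Basic, Standard, and General Purpose separate compute from remote storage; Premium and Business Critical keep data on local SSD with replicas); treat the labels as a summary, not official terminology.

```python
# tier -> (purchasing model, architecture under the covers)
TIERS = {
    "Basic": ("DTU", "remote storage"),
    "Standard": ("DTU", "remote storage"),
    "Premium": ("DTU", "local SSD with replicas"),
    "General Purpose": ("vCore", "remote storage"),
    "Business Critical": ("vCore", "local SSD with replicas"),
}


def tiers_with(architecture: str) -> list[str]:
    """List the tiers that share a given underlying architecture."""
    return [tier for tier, (_, arch) in TIERS.items() if arch == architecture]


print(tiers_with("local SSD with replicas"))  # Premium and Business Critical
```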


Alerting on Refresh Failure in Power BI

Matt Allington shows how you can set up an alert to contact you when a scheduled Power BI data refresh fails:

In this article I am going to share with you a concept to manage refresh failures in production PowerBI.com reports. In a perfect world, you should configure your queries so that they “prevent” possible refresh failures from occurring and also notify you when something goes wrong, without the report refresh failing in the first place. There are many things that can go wrong with report refreshes, and you probably can’t prevent all of them. In the example I use in this article, I will show you how to prevent a refresh failure caused by duplicates appearing in a lookup table after the report has been built, the model has been loaded to PowerBI.com, and the scheduled refresh has been set up using a gateway. If a duplicate key occurs in any of the lookup tables in the data model during refresh, the refresh fails and the updated data does not go live.

Matt’s specific scenario is around duplicate data, but it can extend to other issues as well.
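Matt's solution lives in Power BI itself; the Python sketch below only illustrates the general idea in a different language: keep the first row for each key so the lookup table stays unique, and collect the offending keys so they can be reported instead of breaking the refresh. The function name and sample rows are hypothetical.

```python
def dedupe_lookup(rows, key):
    """Return (rows with unique keys, keys that appeared more than once).

    The first occurrence of each key is kept; later ones are dropped and
    their keys recorded so someone can be notified about them.
    """
    seen = set()
    kept, duplicate_keys = [], []
    for row in rows:
        k = row[key]
        if k in seen:
            if k not in duplicate_keys:
                duplicate_keys.append(k)
            continue
        seen.add(k)
        kept.append(row)
    return kept, duplicate_keys


products = [
    {"id": 1, "name": "Widget"},
    {"id": 2, "name": "Gadget"},
    {"id": 1, "name": "Widget (new)"},
]
clean, dupes = dedupe_lookup(products, "id")
print(dupes)  # keys to alert on
```

Whether to keep the first row, the last row, or neither is a business decision; the point is that the refresh still succeeds and the problem is surfaced.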


Troubleshooting Power BI Refresh Failures

Annie Xu gives us a few reasons why Power BI refreshes might fail:

License level: Power BI Premium licenses come in different levels, and Power BI Premium capacity is shared within a tenant and can be shared by multiple workspaces. The maximum number of models (datasets) that can be refreshed in parallel is based on the capacity node. For example, P1 = 6 models, P2 = 12 models and P3 = 24 models.

Click through for the set of possibilities.
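The parallel-refresh limits quoted above translate into simple arithmetic. Here is a Python sketch estimating how many refreshes wait in a queue and how many "waves" a batch needs, assuming every dataset is scheduled at the same moment and each refresh takes about the same time. Real refreshes vary in length, so treat the numbers as a rough guide only.

```python
import math

# Maximum models refreshed in parallel per capacity node (figures quoted above).
MAX_PARALLEL_REFRESHES = {"P1": 6, "P2": 12, "P3": 24}


def refresh_plan(sku: str, dataset_count: int) -> tuple[int, int]:
    """Return (refreshes queued at the start, number of waves needed)."""
    limit = MAX_PARALLEL_REFRESHES[sku]
    queued = max(0, dataset_count - limit)
    waves = math.ceil(dataset_count / limit)
    return queued, waves


print(refresh_plan("P1", 14))  # 8 queued, 3 waves on a P1
```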


IIS Log Analysis in Power BI

Joy George Kunjikkur shows how you can build a Power BI dashboard to analyze IIS log files:

As developers, we all might have encountered the situation of analyzing IIS web server logs. During development the file is small and easy to analyze in Notepad or Excel, but when it grows to gigabytes on production servers we use other tools. One popular tool to query IIS logs is LogParser, a free command-line tool from Microsoft. There are graphical applications built around it that can even generate charts; one such free tool is Log Parser Studio, which is also from Microsoft.

Once we move up in our careers and have to show managers, product stakeholders, or even CXOs what the IIS logs say, we need more visuals or a dashboard reflecting the IIS logs. Though we can create visuals using Log Parser Studio, creating reports and charts one by one is tedious.

Click through for a solution.
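Whatever tool draws the dashboard, the first step is parsing the W3C extended format IIS writes: `#`-prefixed directive lines, with a `#Fields:` line naming the space-separated columns that follow. Here is a minimal Python parser plus one example aggregation. It is a sketch, not Joy's Power BI approach; the sample lines are made up, and it assumes no field contains a space (IIS normally encodes spaces in values).

```python
from collections import Counter


def parse_w3c_log(lines):
    """Yield one dict per request, keyed by the most recent #Fields: header."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives: only #Fields: matters for parsing; it may repeat mid-file.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) == len(fields):  # skip malformed or truncated rows
            yield dict(zip(fields, values))


def status_counts(records):
    """Count requests per HTTP status code (the sc-status field)."""
    return Counter(r["sc-status"] for r in records)


sample = [
    "#Software: Microsoft Internet Information Services 10.0",
    "#Fields: date time cs-method cs-uri-stem sc-status time-taken",
    "2019-09-18 10:00:01 GET /index.html 200 15",
    "2019-09-18 10:00:02 GET /missing 404 3",
    "2019-09-18 10:00:03 POST /api/orders 200 120",
]
records = list(parse_w3c_log(sample))
print(status_counts(records))
```

Aggregates like these are what a dashboard would chart, so the parsing only needs to happen once rather than per report.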


Upgrading Azure Kubernetes Service

Chris Taylor points out that some Azure Kubernetes Service upgrades need an intermediate hop:

As it is late at night, my brain wasn’t working as it should, but I thought I’d put a quick blog post out there to say that if you are on v1.11.5 and want to upgrade to >= v1.13.10, then you have to do this in a two-stage process by upgrading to v1.12.8 first:

Fortunately, upgrading is pretty easy using the Azure command line or even the Azure portal.
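The two-stage requirement follows from AKS upgrading one Kubernetes minor version at a time. Here is a Python sketch that plans the hops from a current version to a target, picking the newest available patch of each intermediate minor. The list of available versions is hypothetical (in practice `az aks get-upgrades` reports what your cluster can move to), and the sketch assumes the major version does not change.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn 'v1.12.8' or '1.12.8' into (1, 12, 8)."""
    return tuple(int(part) for part in v.lstrip("v").split("."))


def upgrade_path(current: str, target: str, available: list[str]) -> list[str]:
    """Plan upgrades that move at most one minor version per hop."""
    cur, tgt = parse_version(current), parse_version(target)
    versions = sorted(parse_version(v) for v in available)
    hops = []
    for minor in range(cur[1] + 1, tgt[1]):
        # Newest patch release available for this intermediate minor version.
        candidates = [v for v in versions if v[0] == cur[0] and v[1] == minor]
        if not candidates:
            raise ValueError(f"no available release for {cur[0]}.{minor}")
        hops.append(candidates[-1])
    hops.append(tgt)
    return [".".join(map(str, v)) for v in hops]


print(upgrade_path("1.11.5", "1.13.10", ["1.11.5", "1.12.7", "1.12.8", "1.13.10"]))
```

Each hop can then be applied in turn with `az aks upgrade --resource-group <rg> --name <cluster> --kubernetes-version <version>`.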
