Press "Enter" to skip to content

Author: Kevin Feasel

Learning from a Hadoop Outage

Sandhya Ramu and Vasanth Rajamani have an after-action report:

For companies and organizations, failure tends to be far more illuminating than success and the lingering effects of a failure can be harmful if the team moves too quickly and does not resolve the issue in a thorough and transparent manner. We recently ran into a large incident that involved data loss in our big data ecosystem and by reflecting on our diagnosis and response, we hope that our learnings from an impactful incident in our big data ecosystem will be insightful.

Here’s what happened: roughly 2% of the machines across a handful of racks were inadvertently reimaged. This was caused by procedural gaps in our Hadoop infrastructure’s host life cycle management. Compounding our woes, the incident happened on our business-critical production cluster.

Read on to understand what happened and why. It’s a lesson in the importance of having a disaster recovery plan and testing it

Comments closed

DAX Patterns: Second Edition

Marco Russo announces a second edition of DAX Patterns:

Great news! Just one year after releasing the second edition of The Definitive Guide to DAX, we just published a new website, a new book, and a new collection of videos: the second edition of DAX Patterns!

DAX Patterns is a collection of patterns in DAX for Power BI, Analysis Services Tabular, and Power Pivot for Excel. The first edition of DAX Patterns dates back to the end of 2014, and it was based on Power Pivot for Excel. Since then, DAX has evolved with many useful features. Most importantly, Power BI hit the market, and the number of users adopting DAX grew at an exponential rate. When we published the first edition of this book, Power BI had not even been announced yet. Today, most DAX users create a Power BI solution. The new edition of DAX Patterns is thus based on the tool you love: Power BI.

The book looks to be quite useful, and you can get an idea if this content is right for you from the DAX Patterns website. What’s crazy is that they’re offering everything in the book on the website for free, but I’d suggest that if you pick up enough good info from the site, give back by buying a copy of the book or videos.

Comments closed

Semantic Search with FileTable

Haroon Ashraf continues a series on semantic search with Windows and SQL Server:

The focus of the article is on comparing documents that can be stored on Windows File System in one respect and in the other respect their comparative analysis that can be performed with Semantic Search in SQL Server.

Additionally, the readers will learn how to store unstructured data by exploring File Table and creating MS Word documents on the fly (instantly) to be consumed by Semantic Search.

This part of the article is related to the use of Semantic Search on unstructured data for the extraction of basic level business-crucial information provided standard naming is in place.

Click through for the article.

Comments closed

SQL Serverless in Azure Synapse Analytics

James Serra talks to us about SQL serverless (presently known as SQL on-demand but I’m getting ahead of the marketing curve this time):

Querying data in ADLS Gen2 storage using T-SQL is made easy because of the OPENROWSET function with additional capabilities (check out the T-SQL that is supported). The currently supported file types in ADLS Gen2 that SQL-on-demand can use are Parquet, CSV, and JSON. ParquetDirect and CSV 2.0 add performance improvements (see Benchmarking Azure Synapse Analytics – SQL Serverless, using .NET Interactive). You can also query folders and multiple files and use file metadata in queries.

Read on to learn a lot more about its use cases.

Comments closed

Stopping and Starting Virtual Machines in a Resource Group

Dennes Torres walks us through a script to stop or start all virtual machines in an Azure resource group:

Some tasks on azure are easier if we automate them. The Azure Portal provides us the cloud shell, which we can use for this kind of automation.

I was making some experiences with SQL Server Always On, so I created three VMs inside a resouce group. Every time I want to start some experiment I need to start all three VMs and, in the end, stop all three again.

Read on to see how Dennes is able to accomplish this.

Comments closed

Release Rollback with Helm

Andrew Pruski shows the secret of how Helm lets you roll back releases even when deployments are deleted:

If we rollback with kubectl rollout undo the pods in the newest replicaset are deleted, and pods in an older replicaset are spun back up, rolling back the upgrade.

But there’s a potential problem here. What happens if that old replicaset is deleted?

If that happens, we wouldn’t be able to rollback the upgrade. Well we wouldn’t be able to roll it back with kubectl rollout undo, but what happens if we’re using Helm?

Read on to learn how the whole thing works.

Comments closed

Fixing Bad Data in Power BI

Matt Allington does the thing you shouldn’t (often) do:

Let me make this statement upfront and be clear. The best way to solve problems with source data are to go back to the source and correct the problems there. This is my recommendation on how you should solve such issues. However, sometimes that is not possible for whatever reason. This article will explain how you can use Power Query to override incorrect data during load when you can’t change it at the source, for whatever reason.

This is the classic BI tool quandry: the best solution is, as Matt mentions, to fix the source system. But when that’s not on the table—such as when you’re getting data from a third party—Matt has methods to work through data issues.

Comments closed

EXTPTR_PTR Error with Rcpp

Rick Pack walks us through an error in R:

I experienced a need to update Rcpp when I attempted to install the readxlsb R package, which promised to enable me to read .xlsb files in R.

What happened next has been forgotten: Did the attempted update of Rcpp appear to succeed or fail? I did record that my attempted installation of readxlsb still failed and I now experienced an unfamiliar error when I opened and closed R Studio:

“The procedure entry point EXTPTR_PTR could not be located in the dynamic link library”

Read on to see how Rick solved this problem.

Comments closed

What’s New in Kafka 2.6?

Randall Hauch takes us through the changes in Apache Kafka 2.6:

We’ve made quite a few significant performance improvements in this release, particularly when the broker has larger partition counts. Broker shutdown performance is significantly improved, and performance is dramatically improved when producers use compression. Various aspects of ACL usage are faster and require less memory. And we’ve reduced memory allocations in several other places within the broker.

This release also adds support for Java 14. And over the past few releases, the community has switched to using Scala 2.13 by default and now recommends using Scala 2.13 for production.

Finally, these accomplishments are only one part of a larger active roadmap in the run up to Apache Kafka 3.0, which may be one of the most significant releases in the project’s history. 

Read on to learn more.

Comments closed

The Dunder Mifflin Data Set

Tim Mitchell has a new data set for us:

I’ve been a fan of Dunder Mifflin ever since I first learned about this small midwestern paper company. Over the years I’ve gotten to know their people and processes, following from a distance their successesfailures, and various adventures. Who would have known the paper business would be so interesting?

Based on what I learned about this company, I built this Dunder Mifflin data set based on the old Northwind structure, adapting it to meet the needs of this small paper company. It includes most of the employees, regional locations (both current and now-closed), and has a modestly-sized set of sales data for demos and testing.

Check out Tim’s GitHub repo and give it a try.

Comments closed