Day: September 9, 2020

ACID Transactions with Hive LLAP in Elastic MapReduce

Suthan Phillips and Chao Gao walk us through ACID transactions when using Hive on Amazon’s Elastic MapReduce platform:

ACID (atomicity, consistency, isolation, and durability) properties make sure that the transactions in a database are atomic, consistent, isolated, and durable.

Amazon EMR 6.1.0 adds support for Hive ACID transactions so it complies with the ACID properties of a database. With this feature, you can run INSERT, UPDATE, DELETE, and MERGE operations in Hive managed tables with data in Amazon Simple Storage Service (Amazon S3). This is a key feature for use cases like streaming ingestion, data restatement, bulk updates using MERGE, and slowly changing dimensions.

This post demonstrates how to enable Hive ACID transactions in Amazon EMR, how to create a Hive transactional table, how it can achieve atomic and isolated operations, and the concepts, best practices, and limitations of using Hive ACID in Amazon EMR.

Click through for a demonstration.
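
As a rough illustration of what a MERGE does for use cases like bulk updates and slowly changing dimensions, here is a minimal Python sketch. It is not Hive or EMR code: the `merge` function, the row shape, and the sample data are all hypothetical, and a plain dictionary stands in for a transactional table.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (key, value, is_deleted). Not taken from the post.
SourceRow = Tuple[int, str, bool]


def merge(target: Dict[int, str], source: Iterable[SourceRow]) -> Dict[int, str]:
    """Apply MERGE-like semantics to a copy of `target`.

    Working on a copy and returning it only at the end loosely mimics
    atomicity: a reader of `target` never sees a half-applied merge.
    """
    result = dict(target)
    for key, value, is_deleted in source:
        if key in result and is_deleted:
            del result[key]        # WHEN MATCHED AND deleted THEN DELETE
        elif key in result:
            result[key] = value    # WHEN MATCHED THEN UPDATE
        elif not is_deleted:
            result[key] = value    # WHEN NOT MATCHED THEN INSERT
    return result


customers = {1: "Alice", 2: "Bob"}
changes = [(2, "Robert", False), (3, "Carol", False), (1, "", True)]
merged = merge(customers, changes)
print(merged)
```

The original `customers` dictionary is left untouched, which is the point of the sketch: either the whole set of changes becomes visible or none of it does.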

Cloudera’s Not Dead Yet

Alex Woodie shares some good news about Cloudera:

Things are starting to look up for Cloudera, which beat analyst expectations with its second quarter results announced yesterday. The distributed computing platform maker also gave investors something to cheer about with an optimistic financial forecast for the rest of fiscal 2021.

Cloudera, which trades on the New York Stock Exchange under the symbol CLDR, reported a non-GAAP profit of $0.10 per share for the second quarter of fiscal year 2021 ended July 31, exceeding analyst expectations by three cents. A year ago, it reported a non-GAAP loss of $0.02 per share.

The company isn’t in great shape, but this is a good sign.

Automate sp_whoisactive Runs

John McCormack shows how to save the output from sp_whoisactive for later research:

When I took over a fairly troublesome SQL Server, I was always being asked what was running when the SQL Server was slow. That was fine when it was during the day and I could look. But often I would be asked, why was the server slow at 9pm last Friday? Or some other time. It’s not always the easiest thing to do as SQL Server doesn’t natively keep this information for you.

A quick but effective way to automate this is to download and schedule Adam Machanic’s sp_whoisactive stored procedure. I run this every 10 minutes but I have seen it scheduled as often as every 30-60 seconds. It depends on your system and your needs.

This is in place where I work, and I’ve put it into place when consulting for companies without the budget for fancy tools. I really appreciate that Adam Machanic made it a simple option to insert results into a table. Also, John has a step to delete older data, which is critical.
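
The pattern here is "log a snapshot on a schedule, then purge anything past a retention window." The sketch below shows that shape in Python, with an in-memory SQLite database standing in for SQL Server. The `activity_log` table, its columns, and the seven-day retention period are assumptions for illustration, not John's actual script or the output schema of sp_whoisactive.

```python
import sqlite3
from datetime import datetime, timedelta


def log_snapshot(conn, collected_at, sessions):
    """Store one snapshot: a list of (session_id, sql_text) pairs."""
    stamp = collected_at.isoformat(timespec="seconds")
    conn.executemany(
        "INSERT INTO activity_log (collected_at, session_id, sql_text) "
        "VALUES (?, ?, ?)",
        [(stamp, session_id, sql_text) for session_id, sql_text in sessions],
    )


def purge_old(conn, now, keep_days):
    """Delete snapshots older than `keep_days`; return the number removed."""
    cutoff = (now - timedelta(days=keep_days)).isoformat(timespec="seconds")
    cursor = conn.execute(
        "DELETE FROM activity_log WHERE collected_at < ?", (cutoff,)
    )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_log (collected_at TEXT, session_id INTEGER, sql_text TEXT)"
)

now = datetime(2020, 9, 9, 21, 0, 0)
log_snapshot(conn, now - timedelta(days=10), [(51, "SELECT 1")])
log_snapshot(conn, now, [(52, "SELECT 2")])

removed = purge_old(conn, now, keep_days=7)
remaining = conn.execute("SELECT COUNT(*) FROM activity_log").fetchone()[0]
print(removed, remaining)
```

Without the purge step the log table grows without bound, which is why the cleanup matters as much as the capture.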

Automating dbachecks Runs

Jess Pomfret automates runs of dbachecks via scheduled task:

To automate the running of our daily checks we’ll first create a PowerShell script and then schedule that using Task Scheduler. If you have other enterprise scheduling tools available you could easily use those instead to invoke the PowerShell script.

The script for my example, shown below, is pretty simple. I have a section to define where the data will be stored (the ability to save dbachecks result information straight into a database was introduced with dbachecks 2.0 and so I would highly recommend updating if you’re on an earlier version).

Click through for the full explanation and code.

Conditional Merges in Power Query

Ed Hansberry shows how to perform a conditional join in Power Query:

I am going to use two tables for this example, Fact.Sales and Dim.Customer. I only want my Fact Sales table to have customer info for the Buying Group Tailspin Toys. That info is in the Dim Customer table. I can do this several ways.

1. Do the merge, expand the Buying Group column from the Customer table, then filter to only show those rows.
2. Pre-filter the Customer table for Tailspin Toys in the Buying Group column, then do the merge. It would need to be an Inner Join, otherwise you will get nulls in the Sales table when you expand, and then you have to filter those out. Which you can do, but it is more steps.
3. You can bypass all of that and do it in one step.

I’m not going to bore you with how to do methods 1 and 2. Let’s do method 3, a conditional join!

Read on for the approach, which also can take advantage of query folding when possible.
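
To make the goal concrete, here is a small Python sketch of the result all three methods aim for: sales rows enriched with customer info, but only for one buying group. It is not Ed's Power Query code; the `conditional_join` function, the column names, and the sample rows are hypothetical.

```python
sales = [
    {"sale_id": 1, "customer_key": 10, "amount": 50.0},
    {"sale_id": 2, "customer_key": 20, "amount": 75.0},
    {"sale_id": 3, "customer_key": 10, "amount": 20.0},
]
customers = [
    {"customer_key": 10, "buying_group": "Tailspin Toys"},
    {"customer_key": 20, "buying_group": "Wingtip Toys"},
]


def conditional_join(sales_rows, customer_rows, buying_group):
    """Inner-join sales to customers, keeping only one buying group."""
    # Apply the filter while building the lookup, so the join never sees
    # customers outside the requested buying group.
    lookup = {
        c["customer_key"]: c
        for c in customer_rows
        if c["buying_group"] == buying_group
    }
    return [
        {**s, "buying_group": lookup[s["customer_key"]]["buying_group"]}
        for s in sales_rows
        if s["customer_key"] in lookup
    ]


joined = conditional_join(sales, customers, "Tailspin Toys")
for row in joined:
    print(row)
```

Filtering before joining means unmatched sales rows never produce nulls that need a separate cleanup step, which is the extra work method 2 warns about when the join is not an inner join.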

Automating a Permissions Audit with dbatools

Garry Bargsley walks us through the process of a permissions audit made easier:

My post for T-SQL Tuesday #130 is going to be how I automated my company’s SQL Server Permissions Audit process. Even though this only occurs every quarter it is something that I as the DBA never look forward to and the business partners never really enjoy. In an effort to streamline the process from start to finish I built automation using PowerShell, dbatools and ImportExcel. The process now builds the permissions audit file, extracts individual users from Active Directory and e-mails the information to the appropriate team(s) for review.

Read on for details about the process.

General Principles for Automation

Eitan Blumin thinks through a series of general principles around when to automate things:

What I’m gonna do is try to look at the topic of automation and productivity in more general terms, in the hopes of conveying to you my overall methodology while working on a given task.

I will try to formulate a list of “Laws” or “Maxims“, that together will try to represent my personal approach to handling tasks, with the main goal in mind to maximize productivity.

Click through for the list.

Automating Azure Resource Deletion

Drew Skwiers-Koballa has put together a runbook to remove Azure resources tagged in a certain way:

Microsoft puts a lot of effort into making it easy for you to log on to and spin up a VM, SQL Database, Function, or other instance. The Visual Studio Enterprise subscription with $150 credit per month can go quickly if you’re not careful to stop or remove big ticket items as soon as you’re done with them. Even if you have a virtually unlimited Azure account connected to a credit card, expense account, or a trust fund – you probably don’t want to accidentally leave something running longer than it is needed.

With an Azure Automation account and a PowerShell workflow runbook I’m able to use resource tags to set resources for autodeletion by date or immediately that evening.

Click through for that runbook.
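
The decision logic behind tag-driven cleanup can be sketched in a few lines of Python. This is only the "which resources are due?" step, with no Azure calls. The `autodelete` tag name, the `now` value, and the resource dictionaries are assumptions for illustration; Drew's runbook may use different names and formats.

```python
from datetime import date
from typing import Dict, List

# Hypothetical tag name and "delete tonight" marker; not taken from the runbook.
DELETE_TAG = "autodelete"
DELETE_NOW = "now"


def resources_to_delete(resources: List[Dict], today: date) -> List[str]:
    """Return names of resources whose tag says they are due for deletion."""
    due = []
    for resource in resources:
        value = resource.get("tags", {}).get(DELETE_TAG)
        if value is None:
            continue  # untagged resources are never touched
        if value == DELETE_NOW:
            due.append(resource["name"])
            continue
        try:
            expires = date.fromisoformat(value)
        except ValueError:
            continue  # unparseable tag value: leave the resource alone
        if expires <= today:
            due.append(resource["name"])
    return due


resources = [
    {"name": "vm-demo", "tags": {"autodelete": "now"}},
    {"name": "sql-test", "tags": {"autodelete": "2020-09-01"}},
    {"name": "func-keep", "tags": {"autodelete": "2020-12-31"}},
    {"name": "storage-prod", "tags": {}},
    {"name": "vm-typo", "tags": {"autodelete": "next week"}},
]

print(resources_to_delete(resources, date(2020, 9, 9)))
```

Skipping untagged or badly tagged resources is a deliberately conservative choice: a cleanup job should fail toward leaving things running.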
