New Ambari Version

Kevin Feasel



Paul Codding announces Ambari 2.2.2:

Grafana is deployed, managed and pre-configured to work with the Ambari Metrics service. We are including a curated set of dashboards for core HDP components, giving operators at-a-glance views of the same metrics Hortonworks Support & Engineering review when helping customers troubleshoot complex issues.

Metrics displayed on each dashboard can be filtered by time, component, and contextual information (YARN queues for example) to provide greater flexibility, granularity and context.

Ambari is really shaping up to be a nice framework for managing a Hadoop cluster.  I’m excited to see improved monitoring capabilities.

Quick Counts In PowerShell

Chrissy LeMaire has a quick pair of one-liners for counting occurrences in PowerShell:

I always forget how to do this, and Aleksandar Nikolić posted a really beautiful answer on

For a file:

-split (Get-Content .\test.txt | Out-String) | Where-Object { $_ -eq "test" } | Measure-Object | Select-Object -exp count

That was easy.  Check out the article to see how to do this with a string.
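
For comparison, here is a minimal sketch of the same split-and-count idea in Python rather than PowerShell. The function name count_token and the sample text are hypothetical, not from Chrissy's post. Because PowerShell's -eq compares strings case-insensitively by default, the sketch folds case before comparing.

```python
def count_token(text: str, token: str) -> int:
    """Count whitespace-separated words equal to token, ignoring case."""
    wanted = token.casefold()
    # str.split() with no arguments splits on any run of whitespace,
    # similar in spirit to PowerShell's unary -split.
    return sum(1 for word in text.split() if word.casefold() == wanted)


# Hypothetical sample text; "testing" is a different word and is not counted.
sample = "test one Test\ntwo test testing"
print(count_token(sample, "test"))
```

To count within a file, read its contents into a string first and pass that string to the function.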

Query Store Storage Options

Erin Stellato looks at Query Store storage options, specifically MAX_STORAGE_SIZE:

Now, there are catalog views that allow you to view the Query Store data.  You can copy that data into another database using SELECT INTO, and then do comparisons, but wouldn’t it be nice to have some kind of export option?  There’s a Connect item for that:

Export Query Store tables separately from the database tables:

If you think this is something that would be useful for Query Store, please upvote it!  Again, Query Store is available in ALL editions of SQL Server 2016; this is definitely a feature you CAN use and will want to use!  This potential option won’t make it into RTM, but with the change in how SQL Server is releasing CUs, the more important the SQL Server team sees this option (as a result of votes), the faster it might make it into a release.

Query Store is one of the most exciting features for database administrators to hit in quite a while.  There will be some V1 pains, but this feature is well worth the upgrade to 2016.

Setting Up A Linked Server To Oracle

Jon Morisi steps in to show how to set up a linked server connection to an Oracle database:

In this dialog box, the “TNS Service Name” drop down box should display your entries from the tnsnames.ora file.  Next, enter your Oracle User ID and click “Test Connection”, at which point you’ll be prompted for your password.  Everything should test successfully at this point.

Now would be a good time to restart.  Unfortunately, yes you need to restart…

You can do an additional test via sqlplus.  Open a Windows command prompt and enter the following:

sqlplus user/pass@[addressname]

(Where addressname is one of your connections from tnsnames.ora)

I readily admit that I’m glad I don’t need to work with Oracle.  Nonetheless, if you do need to integrate the two, this step-by-step guide will show you how.

Spark Accumulators

Prithviraj Bose explains accumulators in Spark:

However, the logs can be corrupted. For example, the second line is a blank line, the fourth line reports some network issues and finally the last line shows a sales value of zero (which cannot happen!).

We can use accumulators to analyse the transaction log to find out the number of blank logs (blank lines), number of times the network failed, any product that does not have a category or even number of times zero sales were recorded. The full sample log can be found here.
Accumulators are applicable to any operation that is:
1. Commutative -> f(x, y) = f(y, x), and
2. Associative -> f(f(x, y), z) = f(x, f(y, z))
For example, sum and max functions satisfy the above conditions whereas average does not.
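
To see why those two conditions matter, here is a minimal sketch in plain Python (not the Spark API) that folds the same values in every possible order and checks whether the result changes. The helper names order_independent and pairwise_average, and the sample values, are hypothetical. A Spark accumulator merges partial results from tasks in no guaranteed order, so only operations that pass this kind of check are safe to accumulate.

```python
from functools import reduce
from itertools import permutations


def order_independent(op, values):
    """Return True if folding values with op gives one result for every ordering."""
    results = {reduce(op, ordering) for ordering in permutations(values)}
    return len(results) == 1


def pairwise_average(x, y):
    """Naive two-argument average: commutative, but not associative."""
    return (x + y) / 2


values = [1, 2, 6]  # hypothetical sample values

sum_ok = order_independent(lambda x, y: x + y, values)
max_ok = order_independent(max, values)
avg_ok = order_independent(pairwise_average, values)

print("sum order-independent:", sum_ok)
print("max order-independent:", max_ok)
print("average order-independent:", avg_ok)
```

An average can still be computed safely by accumulating a sum and a count separately and dividing once at the end.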

Accumulators are an important way of measuring just how messy your semi-structured data is.

Database Detachments And File Permissions

Daniel Hutmacher looks at what happens when you detach a database:

On most database servers, the SQL Server service account is granted full control of the directories that host the database files. It goes without saying that the service account that SQL Server runs under should be able to create, read, write and delete database files. Looking at a sample database on my local server, the .mdf and .ldf files don’t actually inherit permissions from their folder, although the permissions are very similar to those of the folder.

This all makes sense once you read the explanation, but it’s not intuitive behavior.  Read Daniel’s gotcha near the end of the post.

Graphing With Microsoft R Open

David Smith points out a free e-book on creating effective graphs with Microsoft R Open:

The examples were done using Microsoft R Open, but since it’s 100% compatible with R the code works with any relatively recent R version.

Naomi and Joyce presented several examples from their e-book in a recent webinar (presented by Microsoft), and fielded lots of interesting questions from the audience. If you’d like to see the recorded webinar and also receive a copy of the slides and the e-book, follow the link below to register to receive the materials via email.

The book is free and the code is available on GitHub.  What more could you ask for?

Power BI Desktop Or Power Pivot

Bill Anton discusses when to use Power BI Desktop and when to use Power Pivot:

In the whitepaper, Strategic Prototyping is defined as the process of leveraging Power BI to explicitly seek out feedback from users during a requirements discovery session. The general idea is to use a prototyping tool to quickly slap together a model and mock up some reports while working closely with one or more business users. This helps ensure all reporting capabilities are fleshed out and accounted for. After several iterations, the Power BI model becomes the blueprint upon which an enterprise solution can be based.

Prior to the emergence of Power BI, the tool of choice for strategic prototyping (at least in Microsoft shops) was Power Pivot. And even though the reporting side of Power Pivot is nowhere near as sexy as Power BI, there is one really awesome feature that does not (yet?) exist with Power BI… and that’s the “Import from PowerPivot” option in Visual Studio…

Bill does a good job of explaining the alternatives and, importantly, explaining that whichever you pick, there will be follow-up work.

Installing Apache Falcon

Awanish at Edureka shows how to install Apache Falcon on your Hadoop cluster:

Apache Falcon is a framework for managing the data life cycle in Hadoop clusters. It establishes relationships between various data and processing elements in a Hadoop environment, and also provides feed management services such as feed retention, replication across clusters, and archival.

Let us first discuss how to set up Apache Falcon. Run the command below to clone the Falcon Git repository:

Command: git clone falcon

Falcon comes as part of the Hortonworks Data Platform; Cloudera has its own alternative.

Restoring Azure SQL Databases With PowerShell

Mike Fal shows us how to restore a database in Azure SQL Database using PowerShell:

The most fundamental form of disaster recovery is database backups and restores. Typically setting up backups is a lot of work. DBAs need to make sure there’s enough storage available for backups, create schedules that accommodate business operations and support RTOs and RPOs, and implement jobs that execute backups according to those schedules. On top of that, there is all the work of handling failed backups and making sure disk capacity is always large enough. There is a huge investment that must be made, but it is a necessary one, as losing a database can spell death for a company.

This is one of the HUGE strengths of Azure SQL Database. Since it is a service offering, Microsoft has already built out the backup infrastructure for you. All that stuff we talked about in the previous paragraph? If you use Azure SQL Database, you do not have to do any of it. At all.

What DBAs still need to manage is being able to restore databases if something happens. This is where PowerShell comes into play. While we can definitely perform these actions using the portal, it involves a lot of clicking and navigation. I would much rather run a single command to run my restore.

The PowerShell cmdlets are easy to use, so spin up an instance and give it a try.


May 2016