Press "Enter" to skip to content

Day: September 20, 2019

Calculating YARN Utilization Metrics

Dmitry Tolpeko shows how you can calculate per-second cluster utilization measures from YARN’s resource manager logs:

But even if you query YARN REST API every second it still can only provide a snapshot of the used YARN resources. It does not show which application allocates or releases containers, their memory and CPU capacity, in which order these events occur, what is their exact timestamp and so on.

For this reason I prefer a different approach that is based on using the YARN Resource Manager logs to calculate the exact per second utilization metrics of a Hadoop cluster.

This is a bit more complicated than hitting the REST API, but Dmitry shows some of the benefits of doing so.

Comments closed

Registering SignalR to the Cosmos DB Change Feed

Hasan Savran shows us how we can hook up SignalR to view the Cosmos DB Change Feed:

SignalR allows server code to send asynchronous notifications to client-side web applications. By using it, Azure Functions can send real-time messages to your web applications. Prices can get change whenever data changes in database. Notices can be sent if user needs to be notified. Numbers in dashboard can change dynamically when data changes in Cosmos DB. You can do all those with Azure Cosmos DB + Azure Functions and SignalR. This combination works like David Copperfield magic.

There’s a bit of work involved but Hasan shows us how to get it done.

Comments closed

Custom Power BI Date and Time Formats

Chris Webb continues a series on Power BI custom formats:

In my last post I showed lots of examples of how Power BI’s new custom format string feature can be used to format numbers. This post, looking at dates and times, will be a bit different for two reasons: there are a lot more useful examples of custom date and time formats built into Power BI Desktop, and some of the format placeholders listed in the VBA documentation aren’t supported in Power BI. As a result I’m going to concentrate on some useful formats that aren’t covered well by the examples and highlight a few things that aren’t possible right now.

Read on for a slew of demos.

Comments closed

COUNT(*) Versus COUNT(1)

Lukas Eder takes on the myth that COUNT(*) differs from COUNT(1):

Now that we know the theory behind these COUNT expressions, what’s the difference between COUNT(*) and COUNT(1). There is none, effectively. The 1 expression in COUNT(1) evaluates a constant expression for each row in the group, and it can be proven that this constant expression will never evaluate to NULL, so effectively, we’re running COUNT(*), counting ALL the rows in the group again.

There should be no difference, and parsers / optimisers should be able to recognise this and not do the extra work of checking every expression evaluation for NULL-ness.

Interestingly, in one of the four major RDBMS platforms (not including DB2), there is a performance difference of about 10%.

Comments closed

Version Store and ONLINE Operations

Josh Darnell takes us through how SQL Server manages ONLINE = ON operations (such as index building and rebuilding) using the version store:

The votes table has about 10 million rows in it, so this takes a bit of time (10-15 seconds if nothing else is happening). If I check sys.dm_tran_version_store_space_usage and sp_WhoIsActive, I can see that:

– the version store is not growing, and
– the ALTER statement is chugging along making progress

There are costs to setting ONLINE = ON. I think they’re almost always worth it, but it’s important to remember that they are there.

Comments closed

The Costs of Bad Statistics

Monica Rathbun explains what happens when statistics go wrong:

Over Estimations of Rows (Actual > Estimated) leads to:

– Selection of parallel plan when a serial plan might be more optimal
– Inappropriate join strategy selections
– Inefficient Index Navigation (scan verses seek)
– Inflated Memory Grants

Read the whole thing. The optimizer doesn’t get to look at actual data when determining plans (save for something like adaptive query join processing, but that’s pretty rare), so statistics are its link to reality.

Comments closed

Strong and Weak Power BI Relationships

Alberto Ferrari takes us through the two different kinds of relationships in Power BI:

A relationship in a Tabular model can be strong or weak. In a strong relationship the engine knows that the one-side of the relationship contains unique values. If the engine cannot ensure that the one-side of the relationship contains unique values for the key, then the relationship is weak. A relationship can be weak because either the engine cannot ensure the uniqueness of the constraint – due to technical reasons we outline later – or the developer defined it as such. A weak relationship is not used as part of table expansion. Let us elaborate on this

Something I’d like to see improved in Power BI is to differentiate strong versus weak relationships in the UI. Having no way to differentiate is okay if you only have a few tables or if you designed everything, but coming in late and reviewing a big model, it’s annoying to double-click each link to see if it’s strong or weak.

Comments closed

Starting SQL Server in Single-User Mode

Ranga Babu has a few methods for starting SQL Server in single-user mode:

It is advisable to use SQLCMD when you want to query SQL Server that is started in single user mode as connecting directly and query using SQL Server Management Studio that uses more than one connection. To query SQL Server single user mode using SQL Server Management Studio, open SQL Server Management Studio, and do not connect to SQL Server directly. Close the connection window and click on New Query as shown in the below image which opens a query editor in SQL Server Management Studio:

I recommend practicing this a few times, as the only time you’d actually start SQL Server in single-user mode is during an emergency and that means people breathing down your neck (figuratively if not literally).

Comments closed