Press "Enter" to skip to content

Author: Kevin Feasel

Monitoring Storage Metrics

Robert Sheldon continues a series on storage concepts for the DBA:

When monitoring storage systems, engineers should track a variety of metrics to ensure the systems continue to meet application requirements. Three of the most important and commonly cited metrics are latency, I/O operations per second (IOPS), and throughput. In addition to these three, queue length and I/O splitting can also provide valuable insights into storage performance.

In this article, I discuss all five of these metrics and demonstrate them in action. Despite my focus on these five, they’re not the only important metrics to monitor. For example, engineers should also track storage capacity, device cache usage, controller operations, and storage networks. Even seemingly unrelated components can be a factor, such as low CPU utilization, which can indicate that the processor is waiting on storage to complete requests from the application.

Robert also shows off the new perfmon, which may or may not be better than Perfmon Classic.
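
As a companion to Robert’s metrics, here’s a quick way to sample latency, I/O counts, and bytes transferred from inside SQL Server itself. This isn’t from Robert’s article; it’s a minimal sketch against the sys.dm_io_virtual_file_stats DMV:

SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads + vfs.num_of_writes AS total_io_operations,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms,
       vfs.num_of_bytes_read + vfs.num_of_bytes_written AS total_bytes_transferred
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    INNER JOIN sys.master_files AS mf
        ON vfs.database_id = mf.database_id
        AND vfs.file_id = mf.file_id
ORDER BY avg_write_latency_ms DESC;

These counters are cumulative since instance startup, so to get IOPS or throughput per second, take two snapshots and divide the difference by the sampling interval.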

Stopping and Starting an Azure Kubernetes Service Cluster

Mohammad Darab wants to save some cash (or at least Azure credits):

I remember when I first started deploying Big Data Clusters, they were on Azure Kubernetes Service, utilizing the $200 credit for first-time sign-ups. By the time I got around to figuring out how to deploy the BDC, not only was my $200 credit gone, but I had started to incur costs out of pocket.

If only there was a feature that would allow me to stop the VMs in AKS whenever I wasn’t using them. Well, I’m excited to share that Microsoft AKS (Azure Kubernetes Service) came out with a neat feature (in preview at the time this post was published) that allows you to stop and start your AKS cluster by running a simple command. Of course I had to try it out on BDCs, and to my surprise it worked. Well, sort of. Let me explain…

Read on for more information, as well as current limitations.
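
If you want to try it yourself, the commands are about as simple as advertised. A minimal sketch with the Azure CLI, assuming a hypothetical cluster named bdc-aks in resource group bdc-rg (while the feature was in preview, it also required the aks-preview extension and a one-time preview feature registration):

az extension add --name aks-preview

# deallocate the node pools so you stop paying for the VMs
az aks stop --name bdc-aks --resource-group bdc-rg

# bring the cluster back when you need it
az aks start --name bdc-aks --resource-group bdc-rg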

Mixed MultiSubnetFailover Support on AGs

Andy Mallon continues a line of thought:

In yesterday’s post, I showed how to configure an availability group (AG) to use RegisterAllProvidersIP=0 when you can’t get clients to connect using the MultiSubnetFailover=true connection string attribute.

I mentioned that you have to make some trade-offs when you set RegisterAllProvidersIP=0, and included this comparison:

But…what if you can eat your cake and have it, too?

In some cases, you’ll have some applications & clients that are not able to use MultiSubnetFailover=true, and other clients that can. Perhaps you’re working on updating a bunch of legacy Java apps to move from old jTDS drivers to the current Microsoft JDBC drivers that properly support MultiSubnetFailover=true. Parts of your codebase have been updated, and you want them to make use of the connection string attribute for fast cross-subnet failover. But other parts of your codebase are still being updated and rely on the RegisterAllProvidersIP cluster parameter to be false. Wouldn’t it be nice to have both?

Read on to learn how.
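
Andy’s actual have-it-both-ways trick is in the post itself, so I won’t spoil it here. For reference, the single-listener setting from his earlier post is a cluster parameter on the listener’s network name resource. A hedged PowerShell sketch, assuming a hypothetical listener resource named ag1_listener (HostRecordTTL is the companion setting that usually travels with it):

Import-Module FailoverClusters

# find your actual resource name with Get-ClusterResource
$listener = Get-ClusterResource -Name 'ag1_listener'
$listener | Set-ClusterParameter -Name RegisterAllProvidersIP -Value 0
$listener | Set-ClusterParameter -Name HostRecordTTL -Value 300

# restart the listener resource so the change takes effect
Stop-ClusterResource -Name 'ag1_listener'
Start-ClusterResource -Name 'ag1_listener'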

Database Mail on Azure SQL Managed Instances

John McCormack shows how you can set up database mail from an Azure SQL Managed Instance:

It’s not too difficult to set up database mail for Azure SQL DB Managed Instance in comparison to SQL Server (on-prem or IaaS); however, there are a few extra things to consider. This post will describe how to set up database mail for Azure SQL DB Managed Instance. I will use SendGrid as the mail provider, but you can follow the same steps for any other mail provider or your company’s SMTP server.

Before I go on, my personal opinion is that including database mail is a massive feature for Managed Instances. The lack of DB Mail on Azure SQL DB Single Database or Amazon RDS is a major blocker to PaaS adoption. Now with Managed Instance, we can have PaaS and database mail.

Read on for the instructions. There’s a little bit more than what you typically would need to do on-premises, but just a little bit.
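
John’s post has the full walkthrough; as a rough outline, the T-SQL follows the familiar msdb procedure chain. A minimal sketch assuming SendGrid’s standard SMTP relay settings (the account details are placeholders, not John’s):

EXEC sp_configure 'Database Mail XPs', 1;
RECONFIGURE;

EXEC msdb.dbo.sysmail_add_account_sp
    @account_name = N'SendGridAccount',
    @email_address = N'dba@example.com',
    @mailserver_name = N'smtp.sendgrid.net',
    @port = 587,
    @username = N'apikey',
    @password = N'<your SendGrid API key>',
    @enable_ssl = 1;

EXEC msdb.dbo.sysmail_add_profile_sp
    @profile_name = N'SendGridProfile';

EXEC msdb.dbo.sysmail_add_profileaccount_sp
    @profile_name = N'SendGridProfile',
    @account_name = N'SendGridAccount',
    @sequence_number = 1;

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'SendGridProfile',
    @recipients = N'dba@example.com',
    @subject = N'Test from Managed Instance',
    @body = N'Database Mail works.';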

Xenographs

Alex Velez talks about xenographs:

I recall the first time I came across a horizon chart. Two thoughts came to mind: 1) this looks cool; and 2) I don’t have the energy to figure this out. Fast forward to now. I’ve learned how to read horizon charts, and I’ve even identified a few good use cases for them. This illustrates both the problem and the potential of xenographs. Let’s explore the potentially problematic side first.

Novel approaches to visualizing data can intimidate audiences. They introduce a learning curve because a never-before-seen graph typically requires time and energy to decipher. This obstacle could be enough to dissuade audiences from consuming the data altogether. Even if your audience does invest their time, the resulting conversation is often about reading the visual instead of the primary takeaway. This seems counterintuitive, especially in the explanatory analytics space, but it doesn’t mean we should denounce everything novel.

My response to this depends heavily on the medium. If you’re giving a presentation, a novel or underused chart can be good if it helps tell the story. You have the advantage of being there to explain the dynamics of the diagram for people who have never seen it before. For an informative article, you have some ability to elaborate, as in this bracket win probabilities diagram, which is exactly the type of thing you’d see in certain newspapers and magazines. But unless your visual is immediately intuitive (and I’d consider things like a Manhattan plot or maybe a Dot-boxplot to be intuitive enough for most audiences), I don’t think I would include many of those on public-facing or corporate dashboards, as they’re liable to confuse people and you might not have the space available to explain how this works.

Persisting an RDD in Spark

Sarfaraz Hussain takes us through caching / persisting RDDs in Apache Spark:

Spark RDD persistence is an optimization technique that saves the result of RDD evaluation in cache memory. Using this, we save the intermediate result so that we can use it again if required, which reduces the computation overhead.

When we persist an RDD, each node stores the partitions of it that it computes in memory and reuses them in other actions on that RDD (or RDDs derived from it). This allows future actions to be much faster (often by more than 10x). Caching is a key tool for iterative algorithms and fast interactive use.

Read on to see how you can do this and some of the options available to you when caching. This is extremely useful when working with external data sources, as then you don’t risk hitting the external source multiple times.
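
As a quick illustration (mine, not from Sarfaraz’s post), here’s roughly what this looks like in PySpark, with a hypothetical input file:

from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName='persist-demo')

# Transformations are lazy; nothing has been computed yet.
errors = sc.textFile('events.log').filter(lambda line: 'ERROR' in line)

# Mark the RDD for persistence. MEMORY_AND_DISK spills partitions to disk
# if they don't fit in memory; rdd.cache() is shorthand for MEMORY_ONLY.
errors.persist(StorageLevel.MEMORY_AND_DISK)

print(errors.count())  # first action computes the RDD and populates the cache
print(errors.take(5))  # subsequent actions reuse the cached partitions

errors.unpersist()     # release the cached partitions when finished
sc.stop()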

Multi-Subnet Availability Groups and MultiSubnetFailover

Andy Mallon takes us through an all-too-common scenario:

With a default configuration, multi-subnet AGs require that the clients connecting to them include MultiSubnetFailover=true as a connection string attribute. This attribute tells the driver to expect DNS to provide multiple IP addresses for the Listener name, and to try all of them to find the correct IP to connect to for that network name. Clients that do not specify this attribute will get multiple IPs and not know how to handle them properly; most drivers will pick one of the returned IPs at random (or maybe just seemingly at random) and try to connect to that. This can result in random (or seemingly random) connection failures when it picks the wrong IP.

However, not every client or application will support this connection string attribute. In my experience, there are two extremely common reasons that you can’t use MultiSubnetFailover=true:

Read on to see what you can do in that case.
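
For clients that do support it, the change is a single attribute in the connection string. A sketch in .NET SqlClient style, with a hypothetical listener name:

Server=tcp:ag1-listener.contoso.com,1433;Database=AppDb;Integrated Security=SSPI;MultiSubnetFailover=True;

With that attribute in place, the driver tries all of the IP addresses DNS returns for the listener in parallel and connects to whichever one responds, instead of gambling on a single address.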

ODBC Scalar Functions

Shane O’Neill discovers ODBC scalar functions:

Can you imagine my shock when I came across a piece of code that not only was not for finding and replacing but, even though I did not think it would compile, actually did!

If you can imagine my shock, then you’re going to need to increase it even more when I tell you that there is a whole family of the same functions!
Here is the code that threw me for a loop the first time I saw it.

SELECT {d '1970-01-01'};

I wasn’t familiar with this syntax either, but if you work heavily with multiple data sources, it can be quite useful. For example, Teradata and Db2 support these functions, as do Shane’s examples of SQL Server and Oracle.
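
The date literal is only one member of the family: ODBC escape sequences also include time and timestamp literals, plus a set of {fn …} scalar functions. A few examples that run as-is on SQL Server:

SELECT {d '1970-01-01'} AS date_literal,
       {t '13:45:00'} AS time_literal,
       {ts '1970-01-01 13:45:00'} AS timestamp_literal;

SELECT {fn CURDATE()} AS today,
       {fn UCASE('shane')} AS shouted,
       {fn CONCAT('fore', 'names')} AS joined;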

Integrating Power BI into Azure Synapse Analytics

Ginger Grant walks us through two methods of integrating Power BI and Azure Synapse Analytics:

From within Synapse, you have the ability to access a Power BI workspace so that you can use Power BI from within Synapse. Your Power BI workspace can be in a different data center than the Azure Synapse workspace, but they both must be in the same Power BI tenant. You can use Power BI to look at any data you wish, as the data you use can be from any location. When this blog was written, it was only possible to connect to one Power BI workspace from within Azure Synapse. In order to run Power BI as shown here, first I needed to create a Linked Service from within Synapse.

Read on for more.

ANSI String Comparison

Greg Dodd takes us through one of the oddities of ANSI string comparison:

So let’s play a game: what will the output be if I pass in the following? I’ve included my guesses.

EXEC AreStringsTheSame N'Greg', N'Greg' --Yes
EXEC AreStringsTheSame N'Greg', N'Dodd' --No
EXEC AreStringsTheSame N'', N'' --Yes
EXEC AreStringsTheSame N' ', N' ' --Yes
EXEC AreStringsTheSame N' ', N'' --No
EXEC AreStringsTheSame N' Greg', 'Greg' --No, leading space
EXEC AreStringsTheSame N'Greg ', 'Greg' --No, trailing space

Read on to see what Greg ended up guessing wrong and why.
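
Mild spoiler if you want to check your own guesses first: the crux is that ANSI comparison semantics pad the shorter string with trailing spaces before comparing, so trailing spaces are ignored while leading spaces still count. You can see it without the stored procedure:

SELECT CASE WHEN N'Greg ' = N'Greg' THEN 'Same' ELSE 'Different' END AS trailing_space, -- Same
       CASE WHEN N' Greg' = N'Greg' THEN 'Same' ELSE 'Different' END AS leading_space,  -- Different
       CASE WHEN N' ' = N'' THEN 'Same' ELSE 'Different' END AS space_vs_empty;         -- Same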
