
Author: Kevin Feasel

Azure SQL Database and Extended Events

Dave Bland shows how to set up and read an extended event file on Azure SQL Database:

The first step when using T-SQL to read Extended Events files that are stored in an Azure Storage Account is to create a database scoped credential. The credential provides the security information needed to connect to the Azure Storage Account. The first piece of information you will need is the URL of a blob storage container in your storage account. If you look below, you can see where you would place your storage account name and the blob storage container name.

Dave gives us the grand tour of the configuration process, including where things differ between on-premises SQL Server and Azure SQL Database (which is quite a bit).
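As a rough sketch of where the storage account and container names end up, the Python helper below only assembles the T-SQL text for a database scoped credential that is named after the container URL and uses a shared access signature (SAS) token as its secret. The helper name and the account, container, and token values are hypothetical placeholders. In Azure SQL Database a database master key must already exist before the credential can be created, and the generated statement still has to be run against the database.

```python
def build_credential_statement(storage_account: str, container: str, sas_token: str) -> str:
    """Build a CREATE DATABASE SCOPED CREDENTIAL statement for an Extended Events container.

    storage_account, container and sas_token are placeholders you supply (hypothetical helper).
    """
    # The credential is named after the container URL it grants access to.
    container_url = f"https://{storage_account}.blob.core.windows.net/{container}"
    # SAS tokens copied from the portal often start with '?', which SECRET must not include;
    # single quotes are doubled so the value is a valid T-SQL string literal.
    secret = sas_token.lstrip("?").replace("'", "''")
    return (
        f"CREATE DATABASE SCOPED CREDENTIAL [{container_url}]\n"
        "WITH IDENTITY = 'SHARED ACCESS SIGNATURE',\n"
        f"     SECRET = '{secret}';"
    )


if __name__ == "__main__":
    # Hypothetical account, container, and token values for illustration only.
    print(build_credential_statement("mystorageaccount", "xevents", "?sv=2022-11-02&sig=REDACTED"))
```

The event file target's file name then points at a blob inside that same container URL, which is why the credential name and the URL must match.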


Overriding Spark Dependencies

Landon Robinson shows how to override a Spark dependency located on the classpath:

This doesn’t draw the line exactly where the method changed from private to public, but generally speaking:
– gson-2.2.4.jar: the method is private, and therefore too old for use here
– gson-2.6.1: the method is public, and works fine.
Somewhere between the two, the method’s status changed.

So, because I had some functionality that required the method be public and accessible, it was important I specify the right version in my dependency manager (SBT). “That’s easy,” I thought. “No problem.”

Spoilers: there was a problem.


Kafka and MirrorMaker

Renu Tewari describes what MirrorMaker does for Kafka today and what is coming with version 2:

Apache Kafka has become an essential component of enterprise data pipelines and is used for tracking clickstream event data, collecting logs, gathering metrics, and serving as the enterprise data bus in microservices-based architectures. Kafka is essentially a highly available and highly scalable distributed log of all the messages flowing in an enterprise data pipeline. Kafka supports internal replication to provide data availability within a cluster. However, enterprises require data availability and durability guarantees that hold even through entire cluster and site failures.

The solution, thus far, in the Apache Kafka community has been to use MirrorMaker, an external utility that helps replicate data between two Kafka clusters within or across data centers. MirrorMaker is essentially a Kafka high-level consumer and producer pair, efficiently moving data from the source cluster to the destination cluster and not offering much else. MirrorMaker was originally designed to move data from clusters to an aggregate cluster within a data center, or to another data center, to feed batch or streaming analytics pipelines. Enterprises have a much broader set of use cases and requirements for replication guarantees.

Read on for the list of benefits and upcoming features.
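To make the "consumer and producer pair" description concrete, here is a toy, in-memory model of the consume, produce, commit loop. It does not use any Kafka client library: `InMemoryLog` and `mirror_once` are hypothetical names, and a real deployment would use MirrorMaker itself (or a Kafka client) against real brokers, partitions, and offset storage.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryLog:
    """Hypothetical stand-in for a single topic partition: an append-only list of records."""
    records: list = field(default_factory=list)

    def read(self, offset: int, max_records: int) -> list:
        # "Consume": return up to max_records starting at the given offset.
        return self.records[offset:offset + max_records]

    def append(self, record) -> None:
        # "Produce": add a record to the end of the log.
        self.records.append(record)


def mirror_once(source: InMemoryLog, target: InMemoryLog, next_offset: int, batch_size: int = 100) -> int:
    """Copy one batch from source to target and return the new committed offset."""
    batch = source.read(next_offset, batch_size)
    for record in batch:
        target.append(record)
    # Commit only after every record in the batch has been produced. A crash
    # between producing and committing re-copies the batch (at-least-once).
    return next_offset + len(batch)
```

Committing after producing trades possible duplicates for never losing a record, which is the kind of replication guarantee that a bare consumer/producer pair leaves to the operator.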


Collecting Hadoop Metrics from Multiple Clusters

Dmitry Tolpeko shows how you can collate Hadoop metrics from several Amazon EMR (Elastic MapReduce) clusters:

The first step is to dynamically get the list of clusters and their IPs. Hadoop clusters are often reprovisioned, added and terminated, so you cannot rely on a static list of clusters and addresses. In the case of Amazon EMR, you can use the following Linux shell command to get the list of active clusters:

aws emr list-clusters --active

From its output you can get the cluster IDs and names. While a cluster's ID and IP can change over time, its name (like DEV or Adhoc-Analytics) usually stays the same, so it is useful for various aggregation reports.
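Assuming the CLI returns its JSON output (a top-level `Clusters` array whose entries carry `Id` and `Name` fields), a small Python helper could turn that output into a map from stable cluster names to their current IDs. The function names are hypothetical, and grouping IDs into a list per name is a choice made here because nothing forces cluster names to be unique.

```python
import json
import subprocess
from collections import defaultdict


def clusters_by_name(list_clusters_json: str) -> dict[str, list[str]]:
    """Group cluster IDs by cluster name from `aws emr list-clusters` JSON output."""
    data = json.loads(list_clusters_json)
    grouped = defaultdict(list)
    for cluster in data.get("Clusters", []):
        # Names such as DEV are stable; IDs change whenever a cluster is reprovisioned.
        grouped[cluster["Name"]].append(cluster["Id"])
    return dict(grouped)


def fetch_active_clusters() -> dict[str, list[str]]:
    """Run the AWS CLI (which must be installed and configured) and group the active clusters."""
    result = subprocess.run(
        ["aws", "emr", "list-clusters", "--active", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return clusters_by_name(result.stdout)
```

Keying reports on the name rather than the ID keeps a metric series continuous when, say, the DEV cluster is torn down and recreated.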

Read on to see what you can do with this list of clusters.


Undercover Inspector 1.4

Adrian Buckman takes us through recent changes in Undercover Inspector:

#119 When the backups check module reports backup issues for a database, but the issue is with a FULL or DIFF backup and the LOG backup is OK, we now show just the primary server in the Preferred replicas column, as FULL and DIFF only apply to the primary. This reduces the number of warnings raised in the report, because it no longer reports against every replica node when the AG backup preference is set to Prefer Secondary or Secondary Only. See the Git issue for more details.

Click through for the full change set.


Distributed Computing Fallacies

Samir Behara takes us through a few fallacies of distributed computing:

The network is reliable
Service calls made over the network might fail. Network congestion or a power failure can impact your systems. The request might reach the destination service, but the destination might fail to send the response back to the calling service. Data might get corrupted or lost during transmission over the wire. When architecting distributed cloud applications, you should assume that these types of network failures will happen and design your applications for resiliency.

To handle this scenario, implement automatic retries in your code for when such a network error occurs. For example, if one of your services cannot establish a connection because of a network issue, retry logic can automatically re-establish the connection.
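A minimal Python sketch of that retry logic, assuming the transient failures surface as the built-in `ConnectionError` or `TimeoutError`, uses exponential backoff with jitter. The function name and defaults are illustrative choices, not taken from the original post. Because a request can succeed while its response is lost, a retried call may run twice, so this is only safe for idempotent operations.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.2,
                      retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient network errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The base delay doubles each time (0.2s, 0.4s, 0.8s, ...), plus random
            # jitter so that many clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Passing `sleep` in as a parameter keeps the function easy to test without real waits; any non-retryable exception propagates immediately.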

There are some very good points in here.


Finding Three-Part and Four-Part Names

Pamela Mooney shows how you can find three-part or four-part naming on a SQL Server instance:

The script below searches the metadata for views, sprocs and functions for occurrences of 3 and 4 part names. Three-part names consist of databasename.schemaname.objectname, and four-part names consist of servername.databasename.schemaname.objectname. Because the code searches metadata, it isn’t always perfect. If your comments mention a servername followed by a period, for example, it will be flagged. Nevertheless, it’s a great place to begin looking, and a real help in getting rid of problems before they really bite you.

Click through for the script.
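Pamela's script does its searching in T-SQL against SQL Server metadata. Purely to illustrate the kind of pattern matching involved, here is a rough Python regular expression applied to a module definition string (for example, text pulled from `sys.sql_modules.definition`). The names `IDENT`, `MULTIPART_NAME`, and `find_multipart_names` are hypothetical, and the pattern ignores edge cases such as double-quoted identifiers and the `db..object` shorthand.

```python
import re

# One identifier: a [bracketed name] or a regular (unquoted) identifier.
IDENT = r"(?:\[[^\]]+\]|[A-Za-z_][A-Za-z0-9_@$#]*)"
# Three or four identifiers joined by periods, not embedded in a longer dotted name.
MULTIPART_NAME = re.compile(rf"(?<![\w\].]){IDENT}(?:\.{IDENT}){{2,3}}(?![\w\[.])")


def find_multipart_names(definition: str) -> list[tuple[str, int]]:
    """Return (name, part_count) for each three- or four-part name in a module definition."""
    matches = []
    for m in MULTIPART_NAME.finditer(definition):
        name = m.group(0)
        matches.append((name, len(re.findall(IDENT, name))))
    return matches
```

Like the original script, this produces false positives: a schema.table.column reference such as dbo.Orders.Id, a dotted string literal, or a server name mentioned in a comment all look like multi-part names to a regex.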


Modifying XML in T-SQL

Max Vernon takes us through the .modify function:

Determining the proper syntax when modifying XML values in SQL Server can be time-consuming if you don’t work with XML regularly. SQL Server includes a very flexible XML subsystem, called XML DML, or XML Data Modification Language. XML DML can be used to easily and effectively update XML values in an xml-typed column or variable. This question on dba.stackexchange.com asked about using the .modify function to change the value of an element, which in turn prompted this post.

Read on for a number of examples.


Azure SQL Database Serverless

Arun Sirpal takes us through Azure SQL Database Serverless:

This is best used for single databases with unpredictable, ever-changing usage patterns. Being billed per second (based on the vCores used) rather than per hour makes pricing more granular, especially now that auto-pause is possible. The auto-pause delay defines how long the database must be inactive before it is automatically paused, after which you are charged only for storage. You should only use this if you can afford some delay while compute warms up after idle periods; otherwise, it is best to stick with the provisioned compute tiers (the classic tiers).

I could see this being useful for dev or test databases, or maybe a personal site with heavy external caching.
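A toy calculation shows how per-second billing and the auto-pause delay interact. The sketch assumes a simplified meter: while online, each period is billed for the vCores used or a minimum vCore floor, whichever is larger, and periods after the auto-pause delay cost nothing for compute (storage is still billed). The real Azure meter also factors in memory, enforces its own minimum auto-pause delay, and has warm-up time on resume, so treat the function, its parameters, and any price you plug in as hypothetical.

```python
def estimate_compute_cost(vcores_used, auto_pause_delay, min_vcores, price_per_vcore_period):
    """Rough serverless compute estimate over equal-length periods (for example, seconds).

    vcores_used: vCores consumed in each period (0 means the database was idle).
    auto_pause_delay: number of consecutive idle periods after which the database pauses.
    """
    billed_vcores = 0.0
    idle_run = 0  # consecutive idle periods seen so far
    for used in vcores_used:
        paused = used == 0 and idle_run >= auto_pause_delay
        if not paused:
            # While online, bill what was used, but never less than the minimum.
            billed_vcores += max(used, min_vcores)
        idle_run = idle_run + 1 if used == 0 else 0
    return billed_vcores * price_per_vcore_period
```

With usage `[2, 0, 0, 0, 0, 1]`, a delay of 2 periods, and a 0.5 vCore floor, the two idle periods before the pause are still billed at the floor, the next two are free, and activity resumes billing: 2 + 0.5 + 0.5 + 1 = 4 vCore-periods. That gap between "idle" and "paused" is why a very spiky dev database benefits more than one with steady trickle traffic.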
