Press "Enter" to skip to content

Category: Administration

Capturing a TCP Dump in an Azure Databricks Notebook

Stithi Panigrahi does some troubleshooting:

Due to the potential impact on performance and storage costs, Azure Databricks clusters don’t capture networking logs by default. Follow the below instructions if you need to capture tcpdump to investigate multiple networking issues related to the cluster. These steps will capture a TCP dump on each cluster node–both driver and workers during the entire lifetime of the cluster.

Click through for an initiation script, which generates the actual script, which itself generates the TCP dumps.

Comments closed

Setting up Ola’s Index Maintenance with Azure Runbooks and Terraform

Josephine Bush builds on prior work:

Yes, you still need to do some work to maintain indexes in Azure SQL Database. This post will walk you through setting up statistic updates and index maintenance using Terraform.

Thanks to Tracy Boggiano for her directions for setting up the runbooks. If you want to do this manually instead of with Terraform, Tracy’s post walks you through it step by step. I only modified the role assignment so it had read to the entire subscription level to loop through every DB in the subscription.

Thanks to Kendra for blogging about index maintenance in Azure SQL. Her post helped me decide on index maintenance thresholds.

Click through for a link to Josephine’s GitHub repo and a walkthrough of how it all works.

Comments closed

Taking over a Personal Power BI Workspace

Olivier Van Steenlandt performs a hostile takeover:

The end user can save a Power BI report created in the Power BI Service in their own Personal Workspace but then they have no way to share it with their team. It remains a personal report/analysis.

When executing the analysis in Excel, they can distribute the Excel-file as they see fit.

The pitfall of personal workspaces in Power BI, from my point of view, is that if someone leaves the company, there isn’t a straight forward way for other end users to  take-over previously created analysis.

This is where the Power BI Administrator can be of importance.

Read on to see how a Power BI Administrator can gain access to a personal workspace and migrate reports out of it.

Comments closed

Finding the ACTIVE_TRANSACTION Culprits

Thamires Lemes digs into high transaction log utilization:

The transaction log in SQL Server records all changes made to a database, allowing for data recovery and consistency. When a transaction is initiated, it acquires space in the transaction log to record its activities. Long running transactions have the potential to hold the transaction log, and, depending on database write activity, cause errors and disruptions in the SQL Server environment.

It is important to point out that the transaction that is holding the transaction log might not be performing any write activities to consume additional log space, but subsequent transactions that writes to the transaction log will cause its utilization to increase, even if they are fast. The log space won’t be released until the oldest transaction concludes its execution.

Click through for a few queries on the topic. I’d also highly recommend sp_whoisactive for this kind of work.

Comments closed

Killing a Running Apache Spark Application

The Big Data in Real World team pulls the plug on an application:

Apache Spark is a powerful open-source distributed computing system used for big data processing. However, sometimes you may need to kill a running Spark application for various reasons, such as if the application is stuck, consuming too many resources, or taking too long to complete. In this post, we will discuss how to kill a running Spark application.

Click through to see how you can do this.

Comments closed

Connection Pooling in Postgres

Semab Tariq shows off a tool for Postgres:

PgBouncer is a lightweight yet powerful connection pooling tool for PostgreSQL. It efficiently manages and reuses database connections, reducing the load on the server and improving performance. It acts as an intermediary between applications and the PostgreSQL database, optimizing connection usage and enhancing scalability.

This is a bit different from SQL Server, where connection pooling is built in. Read on to see how it works.

Comments closed

Oracle Errors: Snapshot Too Old and LOB Columns

David Fitzjarrell tackles a pair of errors:

One of the few errors taht strikes fear in the heart of a DBA is the dreaded:

ORA-01555 snapshot too old
and
ORA-22924 snapshot too old

Of course there are plenty of blogs instructing the DBA to simply “increase the undo_retention”, and there are cases where this works as expected. However, LOBs can be different as two different mechanisms exist for undo management. A LOB column can be configured to use retention to manage before mages of the data, but that can be confusing as each LOB column MAY have its own retention setting. The DBA_LOBS view reports whether LOG column uses retention or pctversion to manage undo, and the associated setting being used. Let’s -dig into this a bit deeper.

Read on to learn more about how LOB retention works, the types of issues you can run into with it, and how to correct those issues.

Comments closed

Common SQL Server Mistakes: Default Auto-Growth

Hemantgiri Goswami takes a look at auto-growth:

Auto Growth is a feature that allows database files (primary, secondary, and log) to expand when the database file becomes full – without manual intervention.

Auto Growth feature is handy when we do not want to increase the size of database files manually. There are two ways you can set auto growth – using SQL Server Management Studio (SSMS hereafter) and T-SQL. Auto Growth can be configured – In Percent and Megabytes.

Auto-growth isn’t a problem on its own, though growth sizes, especially in older versions of SQL Server, were far too low for medium- and large-sized databases.

I don’t particularly like the 2.5 MB example Hemantgiri shows. I have a quick rule of thumb which is 64MB for small databases, 256-512 for medium-sized databases, and 1GB for large databases (assuming my underlying disk is fast). This limits the number of auto-growth events and, for log files in particular, keeps virtual log file counts more reasonable.

Comments closed

Vacuuming in PostgreSQL

Muhammad Ali keeps things tidy:

If you’re a PostgreSQL user, you’ve undoubtedly come across the term “vacuum“. This operation plays a pivotal role in maintaining the optimal performance of your database while preventing unnecessary data bloat. In this blog, we’ll understand how vacuum works on high level, its significance, types, server parameters that influence autovacuum operations, and general FAQ’s on vacuum.

Read on to learn more about what vacuuming does and why it is important. It also turns out that there are multiple types of vacuuming.

Comments closed

Debugging an Unresponsive Elasticsearch Cluster

Derric Gilling troubleshoots an Elasticsearch cluster:

Because of this sharding, a read or write request to an Elasticsearch cluster requires coordinating between multiple nodes as there is no “global view” of your data on a single server. While this makes Elasticsearch highly scalable, it also makes it much more complex to setup and tune than other popular databases like MongoDB or PostgresSQL, which can run on a single server.

When reliability issues come up, firefighting can be stressful if your Elasticsearch setup is buggy or unstable. Your incident could be impacting customers which could negatively impact revenue and your business reputation. Fast remediation steps are important, yet spending a large amount of time researching solutions online during an incident or outage is not a luxury most engineers have. This guide is intended to be a cheat sheet for common issues that engineers running that can cause issues with Elasticsearch and what to look for.

Read on for several helpful tips.

Comments closed