Press "Enter" to skip to content

Category: Administration

A Review of Prometheus Changes at PromCon

B.C. Gain reports on sessions from PromCon EU 2022:

Prometheus’ installations are now in the hundreds of thousands range with millions of users, Richard (RichiH) Hartmann, director of community at Grafana Labs and a CNCF Technical Advisory Group Observability chair, said during his talk “I don’t have to convince this room that Prometheus is a de facto standard in cloud native metric based monitoring.”

But as Prometheus’ maintainers celebrate its 10-year anniversary, the community’s needs for monitoring Kubernetes are evolving quickly. Users are also becoming smarter about what they want and need. PromCon EU 2022, held in Munich in November, the Prometheus annual user’s conference, served as a forum about how and why Prometheus must evolve and what Prometheus maintainers must do.

Prometheus is a critical part of the modern service monitoring stack; read on to learn more about histogram updates and work at the core which should help Prometheus users along the way.

Comments closed

Alert Setup with Azure Monitor

Sunil Verma sounds the alarm:

For this instance, we will setup an alert and action to determine and send out a notification when a Virtual machine has been stopped and also could be restarted whenever such conditions has met.

1. Firstly, Go to search pane on the Azure portal search monitor, click on alert inside monitor and create an alert rule. Further, specify a scope for what you want to setup alert. On this occasion, I am setting it for virtual machine.

Read on to learn more about what Azure Monitor does, as well as the steps to set up an alert and an action.

Comments closed

Moving Stack Overflow to Azure

Aaron Bertrand gets into the whats and wherefores:

Like many companies, Stack Overflow is trying to get out of the business of running our architecture in our own data centers; instead, we want to offload some of the more mundane parts of system administration to a cloud service offering like Azure.

I’m going to cut to the chase for the purpose of this article and concede we’ve already decided on Azure for the majority of our infrastructure and, most importantly to me, our databases.

Click through to learn what their plan is and why Aaron & co went that particular route.

Comments closed

Troubleshooting Memory Constraints in SQL Server

Ajay Dwivedi is running low on memory:

Memory Pressure on SQL Server is very common and is presented as Memory Grant Pending metric in the above dashboard screenshot. In my previous blog Troubleshooting SQL Server With High Memory Grants, I discussed what actions should be taken in order to get the server out of constant memory grant issues in real-time.

For long-term resolution, there are various metrics to be validated before making any decision.

Read on for several tips to help you decide the best course of action.

Comments closed

From SQL Server to Cassandra

Lewis DiFelice shares some lessons learned:

The first 6 months were rough. The cluster had been in operation for more than 6 months but was not doing too well. Performance was poor and, worse, it frequently crashed. It was not a fun time. But eventually, the problems got fixed. 

There were several issues (including my inexperience) that caused these problems, but the core one was that the original developer had treated it like another relational database. 

Read on for a few tips to make learning (and managing) Apache Cassandra a little easier.

Comments closed

Breaking the World with auto_explain

Ryan Lambert gets a lot of explanation:

Postgres has a handy module called auto_explain. The auto_explain module lives up to its name: it runs EXPLAIN automatically for you. The intent for this module is to automatically provide information useful for troubleshooting about your slow queries as they happen. This post outlines a pitfall I recently discovered with auto_explain. Luckily for us, it’s an easy thing to avoid.

I discovered this by running CREATE EXTENSION postgis; and watching it run for quite a while before failing with an out of disk space error. That is not my typical experience with a simple CREATE EXTENSION command!

Read on to learn what happened and how you can prevent making a similar mistake.

Comments closed

Checking if Cross-Database Ownership Chaining is On

Tom Collins performs a check:

Cross-database ownership chaining is a SQL Server  security feature allowing database users  access to other database objects hosted on the same SQL server instance, in the case where database  users don’t have access granted explicitly

Tom shows us whether it is on as well as how to enable it. I’d recommend not enabling it at all and using module signing instead.

Comments closed

Identifying Rows in sp_wait_for_database_copy)_sync

Jose Manuel Jurado Diaz troubleshoots a problem:

As you could see in this public documentation Auto-failover groups overview & best practices – Azure SQL Database | Microsoft Learn about sp_wait_for_database_copy_sync “sp_wait_for_database_copy_sync prevents data loss after geo-failover for specific transactions, but does not guarantee full synchronization for read access. The delay caused by a sp_wait_for_database_copy_sync procedure call can be significant and depends on the size of the not yet transmitted transaction log on the primary at the time of the call.”

Our customer asked about several scenarios to understand this behaviour and also, verify if there is possible to identify the rows that have not been synced. For this, I developed a POC to test it:

Read on to see what you’d need to do.

Comments closed

Finding “Stolen” CPU on SQL Server

Ajay Dwiveldi has a public service announcement for us:

The above panel is showing any SQLServer that needs DBA attention. But there is this server highlighted in the above screenshot showing a significant gap b/w OS CPU and SQL CPU. Any gap of over 15-20% b/w OS & SQL CPU consumption is something that should be investigated. This was an indicator that there is some other non sqlservr process that is consuming high CPU on the host.

Friends don’t let friends run anything other than SQL Server on a SQL Server machine.

Comments closed

Disabling Inline Database Log Flushes

Bob Dorr talks about a trace flag:

SQL Server optimizes database log file flush requests, performing these flush requests inline on the active worker. Certain patterns of log record activity may encounter increased spinlock contention while performing the log cache block flush activities.

Trace flag -T8904 (startup only trace flag) disables inline log flush, limiting the contention possibility from many workers to the subset of background LogWriter workers.  When the trace flag is enabled, the worker adding log records mark the log cache block to be flushed and a background LogWriter worker performs the flush activity.

Read on to learn more, including a call-out to Curated SQL favorite Lonny Niederstadt.

Comments closed