Press "Enter" to skip to content

Category: Administration

Too Much NUMA

Denny Cherry discusses the performance downsides of having too many NUMA nodes on a server:

At one client I was working with in 2015 they had a server which was configured very similarly to what I’ve described above. They had 32 cores, with 11 NUMA nodes. One NUMA node had four cores, the next nine had three cores each, while the 11th NUMA node had one core in it.

When monitoring the performance on the server what we saw was that three cores were running at 100% (or close to it) while the other cores were running at about 4%.

The reason for this is that SQL Server is NUMA-aware, and it was trying to get the best performance possible out of the configuration it had access to.

Moderation in everything.  Also, it’s important to plan for growth and to check every once in a while for oddities like this.
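
If you want to check your own instance for this kind of imbalance, a quick sketch against sys.dm_os_schedulers (counting the visible online schedulers, i.e. the cores SQL Server can actually use, per NUMA node) will show the distribution:

-- Count the schedulers (cores) SQL Server can use on each NUMA node.
-- A lopsided split like the 4/3/3/.../1 layout above is worth a closer look.
SELECT
	parent_node_id AS NumaNodeId,
	COUNT(*) AS VisibleOnlineSchedulers
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;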

Moving SSRS Reports

Andy Mallon fills a need in the community:

We’re just interacting with the SSRS Web Service. I’m using PowerShell, but you could also write a little .NET app to do something similar.

I’m not using any magical SSRS-specific cmdlets. I’m using PowerShell to interact with a web service… that web service just happens to be SSRS.

If you’re still trying to get along with RSScripter (if you can even find a copy any longer), Andy’s code might help you out.
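
If you first need an inventory of what there is to move, you can also query the report server’s catalog database directly. A rough sketch, assuming the default ReportServer database name and that Type = 2 still means “report” in your version (the actual deployment is still the web service’s job, as Andy shows):

-- List deployed reports and their folder paths straight from the catalog database.
-- Assumes the default ReportServer database name; adjust if yours differs.
SELECT
	c.[Path],
	c.[Name]
FROM ReportServer.dbo.[Catalog] AS c
WHERE c.[Type] = 2  -- 2 = report in this catalog
ORDER BY c.[Path];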

Polybase DMVs

I look at the DMVs associated with Polybase and external table creation:

Let’s walk through this one step at a time and understand what the DMV is telling us.  Unfortunately, the DMV documentation is a little sparse, so some of this is guesswork on my part.

  1. A RandomIDOperation appears to create a temporary table.  In this case, the table (whose name is randomly generated) is named TEMP_ID_53.  I’m not sure where that name comes from; the session I ran this from was 54, so it wasn’t a session ID.

  2. After the table gets created, each Compute node gets told to create a table called TEMP_ID_53 in tempdb whose structure matches our external table’s structure.  One thing you can’t see from the screenshot is that this table is created with DATA_COMPRESSION = PAGE.  I have to wonder if that’d be the same if my Compute node were on Standard edition.

  3. We add an extended property on the table, flagging it as IS_EXTERNAL_STREAMING_TABLE.

  4. We then update the statistics on that temp table based on expected values.  629 rows are expected here.

  5. Then, we create the dest stat, meaning that the temp table now has exactly the same statistics as our external table.

  6. The next step is that the Head node begins a MultiStreamOperation, which tells the Compute nodes to begin working.  This operator does not show up in the documentation, but we can see that the elapsed time is 58.8 seconds, which is just about as long as my query took.  My guess is that this is where the Head node passes code to the Compute nodes and tells them what to do.

  7. We have a HadoopRoundRobinOperation on DMS, which stands for “Data Movement Step” according to the location_type documentation.  What’s interesting is that according to the DMV, that operation is still going.  Even after I checked it 40 minutes later, it still claimed to be running.  If you check the full query, it’s basically a SELECT * from our external table.

  8. Next is a StreamingReturnOperation, which includes our predicate WHERE dest = ‘ORD’ in it.  This is a Data Movement Step and includes all of the Compute nodes (all one of them, that is) sending data back to the Head node so that I can see the results.

  9. Finally, we drop TEMP_ID_53 because we’re done with the table.

This post was about 70% legwork and 30% guesswork.  That’s a higher percentage of guesswork than I’d ideally like, but there isn’t that much information readily available yet, so I’m trying (in my own small way) to fix that.
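
If you want to follow along on your own instance, the steps above come out of the distributed request DMVs. A minimal sketch, run soon after the PolyBase query so it is still the most recent request:

-- Grab the most recent distributed (PolyBase) request and list its steps in order.
-- This is where operations like RandomIDOperation, MultiStreamOperation, and
-- StreamingReturnOperation show up, along with their elapsed times and row counts.
DECLARE @ExecutionId NVARCHAR(32) =
(
	SELECT TOP(1) dr.execution_id
	FROM sys.dm_exec_distributed_requests AS dr
	ORDER BY dr.start_time DESC
);

SELECT
	drs.step_index,
	drs.operation_type,
	drs.location_type,      -- Head, Compute, or DMS (Data Movement Step)
	drs.[status],
	drs.total_elapsed_time,
	drs.row_count,
	drs.command
FROM sys.dm_exec_distributed_request_steps AS drs
WHERE drs.execution_id = @ExecutionId
ORDER BY drs.step_index;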

Stored Procedure Last Run Times

Richie Lee has a script to see when stored procedures were last executed:

Quick script to get the last time a stored procedure was executed in the database. The reason for the seemingly over-engineered script is that different query plans can be generated, meaning that stored procedures can appear more than once in the list.

The query doesn’t quite work as-is, but making qs.execution_count into an aggregation and removing it from the GROUP BY would work.  I’d probably rewrite it to look a bit more like:

-- Sum execution counts across all cached plans for each procedure, then keep
-- the row from the most recently created plan.
WITH querystats AS
(
	SELECT
		OBJECT_NAME(qt.objectid) AS ProcedureName,
		SUM(qs.execution_count) OVER (PARTITION BY OBJECT_NAME(qt.objectid)) AS ExecutionCount,
		qs.creation_time AS CreationTime,
		ROW_NUMBER() OVER (PARTITION BY OBJECT_NAME(qt.objectid) ORDER BY creation_time DESC) AS rownum
	FROM sys.dm_exec_query_stats AS qs
	CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt
	WHERE
		qt.[dbid] = DB_ID()
		AND OBJECT_NAME(qt.objectid) LIKE '%%'  -- narrow to a particular procedure name if needed
)
SELECT
	qs.ProcedureName,
	qs.ExecutionCount,
	qs.CreationTime
FROM querystats qs
WHERE
	qs.rownum = 1;
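
One note: creation_time in sys.dm_exec_query_stats is when the plan was compiled, not when the procedure last ran.  If the plan is still in cache, sys.dm_exec_procedure_stats exposes last_execution_time directly; a quick alternative sketch:

-- Alternative: sys.dm_exec_procedure_stats tracks cached stored procedures directly
-- and includes last_execution_time.  Like dm_exec_query_stats, it only sees plans still in cache.
SELECT
	OBJECT_NAME(ps.[object_id], ps.database_id) AS ProcedureName,
	SUM(ps.execution_count) AS ExecutionCount,
	MAX(ps.last_execution_time) AS LastExecutionTime
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.database_id = DB_ID()
GROUP BY OBJECT_NAME(ps.[object_id], ps.database_id)
ORDER BY LastExecutionTime DESC;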

Startup Stored Procedures

Dave Mason looks at using sp_procoption to execute stored procedures at startup:

If you have more than one task you want to run at startup, you could include code for each task in a single stored procedure. But that’s generally bad coding practice. Go with the modular approach and create a separate stored procedure for each distinct task. With sp_procoption, you can set more than one stored procedure for automatic execution. The MSDN documentation was not clear regarding the order of execution, though. I thought there might be something similar to sp_settriggerorder, but I wasn’t able to find anything like that. I ran a test on SQL 2014 with 3 stored procedures set for automatic execution.

Dave has interesting notes on procedure run order, where these procedures need to live, and even some ideas on what you might put into startup stored procedures.
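
The mechanics themselves are simple; here’s a minimal sketch using a hypothetical logging procedure (the procedure has to live in master and can’t take parameters):

USE [master];
GO

-- Hypothetical example: log the service start time whenever SQL Server starts.
-- The table and procedure names are made up for illustration.
CREATE TABLE dbo.ServiceStartLog (StartTime DATETIME2 NOT NULL);
GO

CREATE PROCEDURE dbo.RecordServiceStart
AS
BEGIN
	SET NOCOUNT ON;
	INSERT INTO dbo.ServiceStartLog (StartTime) VALUES (SYSDATETIME());
END;
GO

-- Flag the procedure for automatic execution when the database engine starts.
EXEC sp_procoption
	@ProcName = N'dbo.RecordServiceStart',
	@OptionName = 'startup',
	@OptionValue = 'on';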

Sensible Auto-Growth Settings

Ajay Jagannathan notes that SQL Server 2016’s database auto-growth settings have changed to better default values:

model database: New default data and log file size is 8MB and default auto-growth is 64MB. This ensures that any new database created without explicitly specifying the SIZE/FILEGROWTH parameter will have 8MB initial size for all data and log files and 64MB for auto-growth for both data and log files.

For data files, having a 64MB autogrow aligns with 1 PFS interval (which covers a range of 8088 pages = 64MB). For log files, having a 64MB autogrow helps with sizing the initial VLFs correctly so that they can be garbage claimed (wrapped around), without which the log can keep growing.

This is much better than the prior defaults of 1 MB auto-growth for data files and 10% for log files.  Percentage growth leads to eventual pain.
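
On instances that predate 2016 (or databases inherited from the old model defaults), it’s worth hunting down files still set to percentage growth or tiny fixed increments. A quick sketch, with a hypothetical ALTER to fix one file:

-- Find files still using percentage growth or a fixed increment under 64 MB.
-- growth is measured in 8 KB pages when is_percent_growth = 0.
SELECT
	DB_NAME(mf.database_id) AS DatabaseName,
	mf.name AS LogicalFileName,
	mf.type_desc AS FileType,
	CASE WHEN mf.is_percent_growth = 1
		THEN CONCAT(mf.growth, '%')
		ELSE CONCAT(mf.growth * 8 / 1024, ' MB')
	END AS GrowthSetting
FROM sys.master_files AS mf
WHERE mf.is_percent_growth = 1
	OR (mf.is_percent_growth = 0 AND mf.growth > 0 AND mf.growth * 8 / 1024 < 64);

-- Hypothetical fix for one file: switch to a fixed 64 MB increment.
-- Database and logical file names are placeholders.
ALTER DATABASE [YourDatabase]
	MODIFY FILE (NAME = N'YourDataFile', FILEGROWTH = 64MB);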

SQL Server Event Handling

Dave Mason looks at different levels of event handling within SQL Server:

While event handling for .Net developers is implemented in a unified way, this is not the case for SQL Server. Event handling for SQL Server lacks the “one stop shopping” afforded to .Net developers. *If* we had access to the code base for SQL Server and wanted to handle a specific event, we could add our own code, recompile sqlservr.exe, and be on our way. But since we don’t have this ability, we use SQL Server’s run-time hooks. Consider the following:

  • DDL Triggers: handles Data Definition Language events (synchronously).

  • Event Notifications: handles a wide swath of SQL Server events via Service Broker (asynchronously).

  • SQL Alerts: handles the following events:

    1. Events with a specific error number or severity level that are written to the Windows Event Log.

    2. Events for a specific performance condition.

    3. WMI events.

  • sp_procoption: handles the startup event by specifying a stored procedure to run when the database engine service starts.

  • SQL Agent jobs: handles time-based events defined by user-specified job schedules (i.e., daily, hourly).

This sounds like the beginning of a new series.
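
As a taste of the first item on that list, here’s a minimal DDL trigger sketch (the trigger and log table names are made up) that synchronously records CREATE TABLE events in the current database:

-- Log table plus a database-scoped DDL trigger for CREATE TABLE events.
-- EVENTDATA() returns the event details (object name, command text, login, etc.) as XML.
CREATE TABLE dbo.DdlEventLog
(
	EventTime DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
	EventData XML NOT NULL
);
GO

CREATE TRIGGER trg_LogCreateTable
ON DATABASE
FOR CREATE_TABLE
AS
BEGIN
	SET NOCOUNT ON;
	INSERT INTO dbo.DdlEventLog (EventData) VALUES (EVENTDATA());
END;
GO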

Spark Metrics

Swaroop Ramachandra looks at some key metrics for Spark administration:

Once you have identified and broken down the Spark and associated infrastructure and application components you want to monitor, you need to understand the metrics that you should really care about, the ones that affect the performance of your application as well as your infrastructure. Let’s dig deeper into some of the things you should care about monitoring.

  1. In Spark, it is well known that memory-related issues are typical if you haven’t paid attention to the memory usage when building your application. Make sure you track garbage collection and memory across the cluster on each component, specifically, the executors and the driver. Garbage collection stalls or abnormal patterns can increase back pressure.

There are a few metrics of note here.  Check it out.

Preparing For Disaster Recovery

Kendra Little has a 30-minute video and explanation for how to prepare for a failover event:

The fact that you’re thinking about this is great!

You’re right, there are two major types of failovers that you have to think about:

  • Planned failover, when you can get to the original production system (at least for a short time)
  • Unplanned failover, when you cannot get to it

Even when you’re doing a planned failover, you don’t have time to go in and script out settings and jobs and logins and all that stuff.

Timing is of the essence, so you need minimal manual actions.

And you really should have documentation so that whoever is on call can perform the failover, even if they aren’t you.

The short answer is, test, test, test.  Test where it can’t hurt, and then test where it can.  But do read/watch the whole thing.

Monitoring Apache Spark

Swaroop Ramachandra has started a series on monitoring Apache Spark:

Spark provides metrics for each of the above components through different endpoints. For example, if you want to look at the Spark driver details, you need to know the exact URL, which keeps changing over time–Spark keeps you guessing on the URL. The typical problem is when you start your driver in cluster mode. How do you detect on which worker node the driver was started? Once there, how do you identify the port on which the Spark driver exposes its UI? This seems to be a common annoying issue for most developers and DevOps professionals who are managing Spark clusters. In fact, most end up running their driver in client mode as a workaround, so they have a fixed URL endpoint to look at. However, this is being done at the cost of losing failover protection for the driver. Your monitoring solution should be automatically able to figure out where the driver for your application is running, find out the port for the application and automatically configure itself to start collecting metrics.

For a dynamic infrastructure like Spark, your cluster can get resized on the fly. You must ensure your newly spawned components (workers, executors) are automatically configured for monitoring. There is no room for manual intervention here. You shouldn’t miss out on monitoring newer processes that show up on the cluster. On the other hand, you shouldn’t be generating false alerts when executors get moved around. A general monitoring solution will typically start alerting you if an executor gets killed and starts up on a new worker–this is because generic monitoring solutions just monitor your port to check if it’s up or down. With a real-time streaming system like Spark, the core idea is that things can move around all the time.

Spark does add a bit of complexity to monitoring, but there are solutions in place.  Read the whole thing.
