There are four major resources: memory, compute (CPU), disk, and network. Memory and compute are by far the most expensive. Understanding how much compute and memory your application requires is crucial for optimization.
You can configure how much memory and how many CPUs each executor gets. While the number of CPUs for each task is fixed, executor memory is shared between the tasks processed by a single executor.
A few key parameters provide the most impact on how Spark is executed in terms of resources: spark.executor.memory, spark.executor.cores, spark.task.cpus, spark.executor.instances, and spark.qubole.max.executors.
This article gives us some idea of the levers we have available as well as when to pull them. Though the article itself is vendor-specific, a lot of the advice is general.
Journalists latched onto Google’s NN 0.95 score vs. the comparison 0.86 (see EWS Strawman below), as the accuracy of determining mortality. However the actual metric the researchers used is AUROC (Area Under Receiver Operating Characteristic Curve) and not a measure of predictive accuracy that indexes the difference between the predicted vs. actual like RMSE (Root Mean Squared Error) or MAPE (Mean Absolute Percentage Error). Some articles even erroneously try to explain the 0.95 as the odds ratio.
Just as the concept of significance has different meanings to statisticians and laypersons, AUROC as a measure of model accuracy does not mean the probability of Google’s NN predicting mortality accurately as journalists/laypersons have taken it to mean. The ROC (see sample above) is a plot of a model’s False Positive Rate (i.e. predicting mortality where there is none) vs. the True Positive Rate (i.e. correctly predicting mortality). A larger area under the curve (AUROC) means the model produces less False Positives, not the certainty of mortality as journalists erroneously suggest.
The researchers themselves made no claim to soothsayer abilities, what they said in the paper was:
… (their) deep learning model would fire half the number of alerts of a traditional predictive model, resulting in many fewer false positives.
It’s an interesting article and a reminder of the importance of terminological precision (something I personally am not particularly good at).
Wading through all of the SQL Server memory-related perfmon counters to understand how they related to each other took me a really long time. Time-series graphs that show the relationship help me tremendously, and when I started trying to account for SQL Server memory years ago I couldn’t find any. So I started to blog some time-series graphs, under the theory that either my understanding was correct and my graphs would be helpful to someone… or they’d be wrong and someone would correct me.
Well… its been about 5 years and my graphs haven’t generated too much discussion, but they’ve really helped me 😀😀😀
Perfmon: SQL Server Database pages + Stolen pages + Free pages = Total pages
Working with SQL Server 2016 and some demanding ColumnStore batch mode workloads, I began to see suspicious numbers, and graphs that didn’t make sense to me. Today I got pretty close to figuring it out so I wanted to share what I’ve learned.
The following graphs are from a 4×10 physical server running Windows and SQL Server. Four sockets, 4 NUMA nodes.
For bonus points, Lonny traces down a problem where expectations aren’t meeting reality.
TL;DR – Update to the latest CU, create multiple tempdb files, if you’re on SQL 2014 or earlier enable TF 1117 and 1118, if you’re on SQL 2016 enable TF 3427.
And now it’s time for everyone’s favorite SQL Server topic – tempdb! In this article, I’d like to cover some recent changes that you may not be aware of that can help alleviate some common performance issues for systems that have a very heavy tempdb workload. We’re going to cover three different scenarios here:
There’s some good information in here so don’t just say tl;dr.
Update operations in SQL Server are not fully logged in the transaction log. Full before-and-after values, unfortunately don’t exist, only the delta of the change for that record. For example, SQL Server may show a change from “H” to “M” when the actual record that was changed was from “House” to “Mouse”. To piece together the full picture a process must be devised to manually reconstruct the history of changes, including the state of the record prior to update. This requires painstakingly re-constructing every record from the original insert to the final update, and everything in between.
BLOBs are another challenge when trying to use fn_dblog to read transaction history. BLOBs, when deleted, are never inserted into the transaction log. So examining the transaction log won’t provide information about its existence unless the original insert can be located. But only by combining these two pieces of data will you be able to recover a deleted BLOB file. This obviously requires that the original insert exists in the active/online portion of the transaction log, the only part accessible to fn_dblog. This may be problematic if the original insert was done some weeks, months or years earlier and the transaction log has been subsequently backed up or truncated
I’ve tried to avoid messing directly with the transaction log whenever possible, but there are scenarios where it’s the only place you have needed information.
SQL Server Management Studio only shows you the first missing index recommendation in a plan.
Not the best one. Not all of them. Just whichever one happens to show up first.
Using the public Stack Overflow database, I’ll run a simple query:
But that behavior isn’t the case for all tools; SQL Operations Studio is a bit different.
I published tables with FILESTREAM data before, but it seems like there is a particular planetary alignment that triggers an error during the execution of the snapshot agent.
This unlikely combination consists in a merge article with a FILESTREAM column and two UNIQUE indexes on the ROWGUIDCOL column. Yes, I know that generally it does not make sense to have two indexes on the same column, but this happened to be one of the cases where it did, so we had a CLUSTERED PRIMARY KEY on the uniqueidentifier column decorated with the ROWGUIDCOL attribute and, on top, one more NONCLUSTERED UNIQUE index on the same column, backed by a UNIQUE constraint.
Setting up the publication does not throw any error, but generating the initial snapshot for the publication does:
This is a rather specific confluence of events, so it probably won’t affect many people. Still, it is a bug.