Press "Enter" to skip to content

Day: April 5, 2018

Visualizing Logistic Regression In Action

Sebastian Sauer shows using ggplot2 visuals what happens when there are interaction effects in a logistic regression:

Of course, probabilities greater 1 do not make sense. That’s the reason why we prefer a “bended” graph, such as the s-type ogive in logistic regression. Let’s plot that instead.

First, we need to get the survival probabilities:

d %>% mutate(pred_prob = predict(glm1, type = "response")) -> d

Notice that type = "response gives you the probabilities of survival (ie., of the modeled event).

Read the whole thing.

Comments closed

Line Drawing And The Traveling Salesman Problem

Antonio Sanchez Chinchon builds a shortest-path portrait generator:

In this experiment I apply an heuristic algorithm to solve the TSP to draw a portrait. The idea is pretty simple:

  • Load a photo

  • Convert it to black and white

  • Choose a sample of black points

  • Solve the TSP to calculate a route among the points

  • Plot the route

Click through for the code.  This is an interesting application of the traveling salesman problem.

Comments closed

DTUs Or vCores For Azure SQL DB

Denny Cherry looks at a new Azure SQL Database announcement:

Today Microsoft has announced that there is a new way to buy Azure SQL DB. If DTUs aren’t making sense to you, you’ll be happy to know that you can now simply select how many vCores you want for your SQL DB workload.  Now this will still require that you have an understanding on your workload to use this new vCore based way to buy Azure SQL DB, but Cores are a concept that is easy for people to talk about and wrap their heads around. Now this new model is only in preview at the moment, but I’m guessing that it’ll be around for a while in preview, then it’ll go GA as this new model makes sense.

I definitely prefer this model, as IT departments already understand the idea, whereas DTUs were nebulous at best.

Comments closed

Encrypt Those Database Backups

David Fowler shows how easy it is to encrypt database backups in SQL Server:

We all go to great lengths to make sure that our databases are secure (or at least I really hope that we do), we make sure that only authorised users have access and then only to the databases and levels that they need.  We’re careful that all our passwords conform to crazy complexity rules.  We’re a pretty security minded lot aren’t we?

But what about our backups?  Most of the time they’re just sitting on, either a local disk on the server itself or some share somewhere.  Wherever they are, what happens if someone manages to get hold of our backup files?  They can easily just restore them onto a server where they have permissions and voila, all our super secret data is now theirs.

They could even just open the files with a hex editor read the data, it’s really not a difficult thing to do.

By default, SQL Server makes absolutely no effort to encrypt or in any way protect our backups.  That’s pretty bad in my mind.

If you’re on Standard Edition, this became available in 2014.  For Enterprise Edition, it was available earlier.  At this point, I don’t think there are good reasons for not encrypting backups on production systems.

Comments closed

Changing The SQL Server Port On Azure Container Services

Andrew Pruski shows how to change off of the default port for SQL Server when running Azure Container Services:

So, how do you do it when running SQL Server in Azure Container Services?

Well there’s a couple of options available.

The first one is to change the port that SQL is listening on in the container, open that port on the container, and direct to that port from the service.

The second one is to leave SQL Server listening on the default port and direct a non-default port to port 1433 from the service.

Read on to see Andrew try out both of these methods.

Comments closed

Azure Analysis Services Parallelism And Scale

Teo Lachev has a quick note on Azure Analysis Services and parallelism:

Anyone who’s done parallel programming knows that it’s not easy. Safe access and locks need to be used to synchronize access and protect shared resources. The AS xVelocity engine (aka Vertipaq) is a multi-threaded application that accesses RAM as a shared resource. As each thread competes for a global lock for memory allocation/deallocation, scalability decreases. Because processor cores reserve memory in chunks, called cache lines, additional synchronization is required when threads access memory location that are close to each other. Before SQL Server 2016 SP1, this was a serious issue with Tabular. As we’ve learned, starting with SQL Server 2016 SP1, Microsoft switched to using the Intel Threading Building Clocks (TBB) C++ library. Specifically, Tabular now uses the TBB scalable memory allocator. Mind you that TBB is not CPU-specific extensions, so any modern Intel CPU should get these benefits. “Analysis Services SP1 uses an Intel TBB-based scalable allocator that provides separate memory pools for every core. As the number of cores increases, the system can scale almost linearly” and “The Intel TBB-based scalable allocator is also expected to help mitigate performance problems due to heap fragmentation that have been shown to occur with the Windows Heap”. Further, starting with SP1, Tabular is NUMA aware with 4-node NUMA system but I don’t know if the AAS VMs are preconfigured for NUMA.

Teo doesn’t have any firm conclusions at this point, but his initial testing looks positive.

Comments closed

Why CHECKDB Repair Invalidates Replication

Paul Randal explains why running DBCC CHECKDB on a published article will cause subscriptions to become invalidated:

Whenever I’m teaching and recovering from corruption, I always stress that if the REPAIR_ALLOW_DATA_LOSS option of DBCC CHECKDB (which I’ll just call ‘repair’ from now on) has to be used, you should do the following:

  • Run another DBCC CHECKDB to make sure that repair fixed everything and no new problems were exposed (by the repair fixing something that was blocking DBCC CHECKDB‘s access to a portion of the database)
  • Run DBCC CHECKCONSTRAINTS on any tables or indexes that were repaired and are involved in constraints (stated in Books Online too)
  • Reinitialize any replication subscriptions that are based on tables that were repaired (stated in Books Online too)
  • Reevaluate your HA/DR strategy so you don’t have to use repair again in future

One question I’m often asked is about why replication can be broken by running repair.

Read on for the answer.

Comments closed

Troubleshooting Availability Group Performance Dips

Simon Su walks us through a customer scenario where transactions per second would drop several orders of magnitude for a second, and then jump back up to normal:

The “Transaction Delay” value is an accumulation of the delay of all the current transaction delay in millisecond. You can see that the “Transaction Delay” counter has the same spikes as the sudden drop of the “Transactions Created/Sec”. Its spikes indicate that at those time points the AG transactions have time delay during commits.  This gives us a very good start point. We can focus on the transaction delay in our AG performance troubleshooting.

So who causes the transaction delay? Is it primary replica, secondary replica, or other factors like network traffic?

As a must go-through step for performance troubleshooting we captured performance monitor logs to check how the performance behaved on both replicas.  We want to find out whether there is any performance bottleneck existing in primary or secondary. For example, whether CPU usage is high when transaction delay spike happens, whether disk queue length is long, disk latency is large, etc.  We expect to find something that has the same spike trend as the “Transaction Created/sec” or “Transaction Delay”. Unfortunately, we do not anything interesting. CPU usage is as low 30%, Disk speed is quite fast. No disk queue length at all. We then checked AG related counters, like the log send queue and the recovery queue as the above two links mentioned but again we do not find anything helpful.

At the endpoint, there’s a reminder that you should keep up to date on patching systems.

Comments closed