Robust Regressions in R

Michael Grogan shows how you can find and re-weigh outliers when performing regressions:

A useful way of dealing with outliers is by running a robust regression, or a regression that adjusts the weights assigned to each observation in order to reduce the skew resulting from the outliers.

In this particular example, we will build a regression to analyse internet usage in megabytes across different observations. You will see that we have several outliers in this dataset. Specifically, we have three incidences where internet consumption is vastly higher than other observations in the dataset.

Let’s see how we can use a robust regression to mitigate for these outliers.

Click through for a demonstration.

MLB Run Scoring Trends Shiny App

Martin Monkman has an update to the MLB run scoring app in Shiny, just in time for spring training:

2. feather instead of csv
The app relied on some pre-wrangled csv files; these have been replaced by files stored using the .feather format, which makes for a signficant performance improvement.

Martin has made a significant number of changes and it’s cool to see the full list of changes. H/T R-bloggers

Disable Priority Boost

Randolph West wants you to disable priority boost if you have it turned on:

It turns out that on one of the benchmarks, Microsoft was able to achieve higher throughput by setting the SQL Server process to HIGH_PRIORITY_CLASS, and the thread priority level over and above that to THREAD_PRIORITY_HIGHEST. While it was extremely helpful to beat artificial benchmarks, it came at the cost of giving SQL Server higher execution context on the CPU than almost every other process on Windows.

And that’s the only good use of priority boost ever. If you’ve never heard of priority boost, just keep on ignoring it.

Storing SQL Server Helm Charts in GitHub

Andrew Pruski shows how we can use GitHub to store Helm charts and access them easily:

In a previous post I ran through how to create a custom SQL Server Helm chart.

Now that the chart has been created, we need somewhere to store it.

We could keep it locally but what if we wanted to use our own Helm chart repository? That way we wouldn’t have to worry about deleting the chart on our local machine.

I use Github to store all my code to guard against accidentally deleting it (I’ve done that more than once) so why not use Github to store my Helm charts?

Cluster configurations are still code, and code belongs in source control.

Automating Azure Storage To Move Between Tiers

Ryan Adams built a process to save money on storage costs for a customer’s test environment:

One of the best things about Azure, and the cloud in general, is we can automate most anything, and we are going to look at how to automate Azure VM Storage.  This allows us to come up with some outside-of-the-box solutions.  I had a customer with a road block that we were able to work around by automating some things with their Azure Virtual Machines.

Their challenge was that they wanted to move their test and development environments to Azure, but the storage cost was prohibitive.  They needed premium storage to mimic their production environment, but it was not financially viable for test and development so they were going to keep it all on premises.  During our conversations I learned that they only test between 8am and 5pm, Monday through Friday.  My suggestion was that we put their databases on cheaper storage during off times and only premium when they are actively using it.

This doesn’t look like a one-hour task but if you’re in need of some cost savings on storage in non-production environments, check out Ryan’s scripts.

Microsoft Data Platform Bug Reporting Links

Kevin Feasel

2019-02-28

Bugs

Brent Ozar has put together a compendium of where you should go if you want to file bug reports or feature requests for different products in the Microsoft data platform space:

Azure Data Studio – open an issue in the Github repo. While you open an issue, Github helps by searching the existing issues as you’re typing, so you’ll find out if there’s already a similar existing issue.

Click through for all of the links. I personally just yell skyward in the hopes that they hear me and fix my problems. It doesn’t work very often so I don’t recommend it as a strategy.

Issues From Using gMSA Accounts with Docker

Michal Poreba shares some lessons from trying to set up Docker and SQL Server to use gMSA accounts:

While in the end I was able to make it work on Windows Server 2016, 1803, 2019 and 1809 I wasted some time trying to make it work with docker 17.06. Unsuccessfully. Docker 18.09.1 and 18.09.2 worked every time.
There are some reports of intermittent problems with specific OS updates breaking stuff, like the one here but I wasn’t able to reproduce it. I wonder if the updates changes something else that it causing problems, in other words is it the problem with the update itself or the update process?

Read on for several helpful tips, as well as dead ends to avoid.

Script Update Mode Should Be Parallel

Andy Levy explains why he wants script update mode to run in parallel:

When you have 8000+ databases on an instance, this is a huge deal. You’re looking at over two and a half hours just to bring SQL Server online after installing an SP or CU. While the instance is in script update mode, incoming connections are locked down and the service remains in the Starting status. Only the Dedicated Administrator Connection can be used to connect to the instance remotely.

Taking advantage of having a Failover Cluster Instance to patch the passive node in advance is great for minimizing downtime for Windows updates. But whether you have an FCI or not, SQL Server will remain in the “Starting” state until all of your databases have been through this process after installing an update. What was once a 10-minute failover is now a multi-hour ordeal, and maintenance windows become a lot harder to negotiate.

Andy’s pretty far over on the right-hand side of that Bell curve, but I like his SQL Server suggestion because even with just a few hundred or a couple thousand databases, you’re still talking real time savings.

Categories

February 2019
MTWTFSS
« Jan Mar »
 123
45678910
11121314151617
18192021222324
25262728