Press "Enter" to skip to content

Author: Kevin Feasel

xgboost and Small Numbers of Subtrees

John Mount covers an interesting issue you can run into when using xgboost:

While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation).
In doing that I ran into one more avoidable but strange issue in using xgboost: when run for a small number of rounds it at first appears that xgboost doesn’t get the unconditional average or grand average right (let alone the conditional averages Nina was working with)!

It’s not something you’ll hit very often, but if you’re trying xgboost against a small enough data set with few enough rounds, it is something to keep in mind.

Comments closed

Using AZCopy for SQL Backups

John McCormack shows how you can use AZCopy to move SQL Server backups into Azure Storage:

AZCopy is a useful command line utility for automating the copying of files and folders to Azure Storage Account containers. Specifically, I use AZCopy for SQL Backups but you can use AZCopy for copying most types of files to and from Azure.

In this blog post example (which mirrors a real world requirement I had), the situation is that whilst I need to write my SQL backups over a network share, I also want to push them up to Azure Storage (in a different region) to allow developers quicker downloads/restores. This is why I need to use AZCopy. If I only needed my backups to be written to Azure, I could have used BACKUP TO URL instead.

Read on to see how John did it.

Comments closed

Changes to Azure SQL Database SLA

Arun Sirpal notes a change to the Azure SQL Database Service Level Agreement:

I am sure many missed the updates to Azure SQL Database SLA (Service Level Agreement). It used to be 99.99% across all tiers  but split between two different high-availability architectural models. Basic, Standard and General Purpose tiers had its own model and the Premium / Business Critical tiers had a different one.

Read on to see the change.

Comments closed

Disk Utilization Per Drive in SQL Server

Max Vernon has a script which shows more than what xp_fixeddrives has to offer:

However, the command output doesn’t include the total size of each drive, making it impossible to determine the percent free space. If you’re in an environment where a separate team monitors disk space, and has alerts set when free space falls below a certain percentage, you may want to ensure you don’t breach those levels. The following script provides “the big picture” for your servers, since it provides total size, free space, available space, and the percent free. It does require the use of the documented and supported sys.xp_cmdshell system extended stored procedure. The code uses the drive letters returned by sys.xp_fixeddrives inside a cursor. Inside the cursor, we call the dos command fsutil volume diskfree C: to get total capacity and free space, etc:

Click through for the script.

Comments closed

Bar Chart Presentation Options

Andy Kirk gives us five techniques for gussying up bar charts:

“Bar charts are boring”, say many people. “How can we make them more attractive”, say many desperate clients. Bar charts are ubiquitous because they are the reliable and trusted lieutenants often relied upon to show the always-common quantitative comparisons across different categories. Their frequent use can induce ‘boredom’ through the familiar but, in particular, accusations of inelegance can be raised with the default tool styles many creators lean on.

The five charts below just offer some different ways you might style them, through variations in the use of functional colour properties, chart apparatus and layout decisions, in particular. The reasons why you would choose to use any of these methods are varied and especially contextually dependent, based on matters like space to work in, range of quantitative values, size of category labels, number of bars, importance of precise readability. The charts are all showing the all-time top 10 most streamed songs on Spotify, as of July 2019, with data from wikipedia.

Read on for the five options.

Comments closed

Powershell Modules Everywhere

Kevin Chant has some thoughts on whether you should have SQL Server Powershell modules on every server:

Recently I’ve seen recommendations about putting PowerShell modules on every SQL Server. I must admit it has got me thinking if this is indeed worthwhile.

In addition, it makes me wonder if it’s actually better to put the Powershell modules on a select number of management servers instead?

If you are wondering which modules I could be talking about, I mention some in a previous post which you can read in detail here.

Read on for Kevin’s thoughts on the matter.

Comments closed

Reinforcement Learning with R

Holger von Jouanne-Diedrich takes us through concepts in reinforcement learning:

At the core this can be stated as the problem a gambler has who wants to play a one-armed bandit: if there are several machines with different winning probabilities (a so-called multi-armed bandit problem) the question the gambler faces is: which machine to play? He could “exploit” one machine or “explore” different machines. So what is the best strategy given a limited amount of time… and money?

There are two extreme cases: no exploration, i.e. playing only one randomly chosen bandit, or no exploitation, i.e. playing all bandits randomly – so obviously we need some middle ground between those two extremes. We have to start with one randomly chosen bandit, try different ones after that and compare the results. So in the simplest case the first variable e=0.1 is the probability rate with which to switch to a random bandit – or to stick with the best bandit found so far.

Click through for various cases and a pathfinding example in R. H/T R-Bloggers

Comments closed

Table Variables and Parallelism

Erik Darling shows off a neat trick for inserting with parallelism into a table variable:

One of the many current downsides of @table variables is that modifying them inhibits parallelism, which is a problem #temp tables don’t have.

While updating and deleting from @table variables is fairly rare (I’ve seen it, but not too often), you at minimum need an insert to put some data in there.

No matter how big, bad, ugly, or costly your insert statement is, SQL Server can’t parallelize it.

That’s just what he wants you to think and then the trap goes off.

Comments closed

Changing Query Store Report Interval

Arun Sirpal wants to change the report interval for a Query Store report:

While not specific to SQL Server 2019 (I was using this version to do some testing) I was struggling to find how to change the time period of analysis for the Query Store reports within SSMS.

This is not a ground breaking post but hopefully a helpful one! So, I load up the “Top 25 resource consumers” report and by default it will show data for the past hour. So what do you do, or should I say what do you click to change the time interval for the report?

Read on for the two screenshots which answer this question for you.

Comments closed