Building an Image Classifier with PyTorch

Rogier van der Geer shows how you can use PyTorch to build out a Convolutional Neural Network for image classification:

The tool that we are going to use to make a classifier is called a convolutional neural network, or CNN. You can find a great explanation of what these are right here on wikipedia.

But we are not going to fully train one ourselves: that would take way more time than I would be willing to spend. Instead, we are going to do transfer learning, where we take a pre-trained CNN and replace only the last layer by a layer of our own. Then we only need to train that single layer, as all the other layers already have weights that are quite sensible. Here we exploit the fact that the images we are interested in have a lot of the same properties as those images that the original network was trained on. You can find a great explanation of transfer learning here.

Read on for a detailed example.

xgboost and Small Numbers of Subtrees

John Mount covers an interesting issue you can run into when using xgboost:

While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation).
In doing that I ran into one more avoidable but strange issue in using xgboost: when run for a small number of rounds it at first appears that xgboost doesn’t get the unconditional average or grand average right (let alone the conditional averages Nina was working with)!

It’s not something you’ll hit very often, but if you’re trying xgboost against a small enough data set with few enough rounds, it is something to keep in mind.

Using AZCopy for SQL Backups

John McCormack shows how you can use AZCopy to move SQL Server backups into Azure Storage:

AZCopy is a useful command line utility for automating the copying of files and folders to Azure Storage Account containers. Specifically, I use AZCopy for SQL Backups but you can use AZCopy for copying most types of files to and from Azure.

In this blog post example (which mirrors a real world requirement I had), the situation is that whilst I need to write my SQL backups over a network share, I also want to push them up to Azure Storage (in a different region) to allow developers quicker downloads/restores. This is why I need to use AZCopy. If I only needed my backups to be written to Azure, I could have used BACKUP TO URL instead.

Read on to see how John did it.

Changes to Azure SQL Database SLA

Arun Sirpal notes a change to the Azure SQL Database Service Level Agreement:

I am sure many missed the updates to Azure SQL Database SLA (Service Level Agreement). It used to be 99.99% across all tiers  but split between two different high-availability architectural models. Basic, Standard and General Purpose tiers had its own model and the Premium / Business Critical tiers had a different one.

Read on to see the change.

Database Health Monitor 2.8.2

Steve Stedman has a new version of the Database Health Monitor:

Version 2.8.2 is the July 2019 Release of Database Health Monitor. There was a 2.8 release on July 17, 2019, followed shortly by a 2.8.1 and 2.8.2 release after a report of a bug in the blocking queries report that was fixed.

Read on to see what’s new.

Disk Utilization Per Drive in SQL Server

Max Vernon has a script which shows more than what xp_fixeddrives has to offer:

However, the command output doesn’t include the total size of each drive, making it impossible to determine the percent free space. If you’re in an environment where a separate team monitors disk space, and has alerts set when free space falls below a certain percentage, you may want to ensure you don’t breach those levels. The following script provides “the big picture” for your servers, since it provides total size, free space, available space, and the percent free. It does require the use of the documented and supported sys.xp_cmdshell system extended stored procedure. The code uses the drive letters returned by sys.xp_fixeddrives inside a cursor. Inside the cursor, we call the dos command fsutil volume diskfree C: to get total capacity and free space, etc:

Click through for the script.

Bar Chart Presentation Options

Andy Kirk gives us five techniques for gussying up bar charts:

“Bar charts are boring”, say many people. “How can we make them more attractive”, say many desperate clients. Bar charts are ubiquitous because they are the reliable and trusted lieutenants often relied upon to show the always-common quantitative comparisons across different categories. Their frequent use can induce ‘boredom’ through the familiar but, in particular, accusations of inelegance can be raised with the default tool styles many creators lean on.

The five charts below just offer some different ways you might style them, through variations in the use of functional colour properties, chart apparatus and layout decisions, in particular. The reasons why you would choose to use any of these methods are varied and especially contextually dependent, based on matters like space to work in, range of quantitative values, size of category labels, number of bars, importance of precise readability. The charts are all showing the all-time top 10 most streamed songs on Spotify, as of July 2019, with data from wikipedia.

Read on for the five options.

Powershell Modules Everywhere

Kevin Chant has some thoughts on whether you should have SQL Server Powershell modules on every server:

Recently I’ve seen recommendations about putting PowerShell modules on every SQL Server. I must admit it has got me thinking if this is indeed worthwhile.

In addition, it makes me wonder if it’s actually better to put the Powershell modules on a select number of management servers instead?

If you are wondering which modules I could be talking about, I mention some in a previous post which you can read in detail here.

Read on for Kevin’s thoughts on the matter.

Reinforcement Learning with R

Holger von Jouanne-Diedrich takes us through concepts in reinforcement learning:

At the core this can be stated as the problem a gambler has who wants to play a one-armed bandit: if there are several machines with different winning probabilities (a so-called multi-armed bandit problem) the question the gambler faces is: which machine to play? He could “exploit” one machine or “explore” different machines. So what is the best strategy given a limited amount of time… and money?

There are two extreme cases: no exploration, i.e. playing only one randomly chosen bandit, or no exploitation, i.e. playing all bandits randomly – so obviously we need some middle ground between those two extremes. We have to start with one randomly chosen bandit, try different ones after that and compare the results. So in the simplest case the first variable e=0.1 is the probability rate with which to switch to a random bandit – or to stick with the best bandit found so far.

Click through for various cases and a pathfinding example in R. H/T R-Bloggers

Table Variables and Parallelism

Erik Darling shows off a neat trick for inserting with parallelism into a table variable:

One of the many current downsides of @table variables is that modifying them inhibits parallelism, which is a problem #temp tables don’t have.

While updating and deleting from @table variables is fairly rare (I’ve seen it, but not too often), you at minimum need an insert to put some data in there.

No matter how big, bad, ugly, or costly your insert statement is, SQL Server can’t parallelize it.

That’s just what he wants you to think and then the trap goes off.

Categories

July 2019
MTWTFSS
« Jun  
1234567
891011121314
15161718192021
22232425262728
293031