The features the model actually relied on for classification were these: ‘posting’, ‘host’, ‘NNTP’, ‘EDU’, ‘have’, ‘there’. These are meaningless artifacts that appear in both the training and test sets and have nothing to do with the topic. They are predictive only by accident: for example, the word “posting” (part of the email header) appears in 21.6% of the examples in the training set but only two times in the class “Christianity.”
Is this model going to generalize? Absolutely not.
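One way to spot this kind of artifact is to count how often a token appears overall and within each class. The following is a minimal sketch, not the paper's method: the function name `doc_frequency_by_class`, the toy documents, and the label "other" are all invented for illustration.

```python
from collections import defaultdict


def doc_frequency_by_class(docs, labels, token):
    """Return (share of all docs containing token, {label: doc count})."""
    token = token.lower()
    per_class = defaultdict(int)
    hits = 0
    for text, label in zip(docs, labels):
        # Whitespace tokenization only; a real check would reuse the
        # tokenizer of the model under inspection.
        if token in text.lower().split():
            hits += 1
            per_class[label] += 1
    overall = hits / len(docs) if docs else 0.0
    return overall, dict(per_class)


# Toy corpus, made up for this sketch (not the real data set).
docs = [
    "NNTP posting host example.edu I have a question",
    "posting host foo.edu there is no evidence",
    "the sermon on the mount",
    "NNTP posting host bar.edu have you read this",
]
labels = ["other", "other", "christianity", "other"]

overall, per_class = doc_frequency_by_class(docs, labels, "posting")
print(f"'posting' appears in {overall:.1%} of documents: {per_class}")
```

A token that is common overall but almost absent from one class, as "posting" is here, will look highly predictive to a classifier even though it says nothing about the topic.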
An Example from Image Processing
In this example, which uses Google’s Inception neural network on arbitrary images, the objective was to correctly classify “tree frogs”. The classifier was correct in about 54% of cases but also interpreted the image as a pool table (7%) and a balloon (5%).
Looks like an interesting paper. Click through for a link to the paper.
After writing about “For Technical Interviews, Don’t Ask Questions, Show Screenshots”, lots of folks asked what kinds of screenshots I’d show. Here’s this week’s example.
I show each screenshot on a projector (or shared desktop) to the candidate and say:
What’s this screen from?
What does the screen mean?
If it was a server you inherited from someone else, would there be any actions you’d take?
What questions might you want to ask before you take those actions?
Would there be any drawbacks to your actions?
What would be the benefits of your actions?
I have started to use this in interviews and I’m already loving it. I don’t want people to memorize minutiae (“Name all of the policies available in Policy-Based Management”), but if I show a picture of the different policies, that should jog your memory on when you’ve used PBM to solve interesting problems.
And while it did the trick, I also wanted, for no particular reason, to have the total number of objects and the percentage. Again, no particular reason. It could probably be done with a window function, but I have limited familiarity with those, so I decided to approach it with CTEs. And it works nicely. The objs CTE gives me a count of each object type, while the tots CTE gives me the count of all objects. By giving each CTE a column with the value of 1, it’s easy to join them together and then calculate a percentage.
That’s one of the nicest things about SQL as a language: you access metadata the same way you access regular data, so that technique can be used to query other data sets as well.
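The two-CTE approach described above can be sketched as follows. This is an illustration under stated assumptions, not the original author's query: it runs against an in-memory SQLite database through Python's sqlite3 module, and the `objects` table, its rows, and the `join_key` column name are hypothetical stand-ins for the SQL Server catalog view. Only the CTE names `objs` and `tots` come from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for a catalog of database objects.
conn.execute("CREATE TABLE objects (name TEXT, type_desc TEXT)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?)",
    [
        ("t1", "USER_TABLE"),
        ("t2", "USER_TABLE"),
        ("v1", "VIEW"),
        ("p1", "SQL_STORED_PROCEDURE"),
    ],
)

query = """
WITH objs AS (
    -- one row per object type, plus a constant column to join on
    SELECT 1 AS join_key, type_desc, COUNT(*) AS obj_count
    FROM objects
    GROUP BY type_desc
),
tots AS (
    -- a single row holding the grand total, with the same constant
    SELECT 1 AS join_key, COUNT(*) AS total_count
    FROM objects
)
SELECT o.type_desc,
       o.obj_count,
       t.total_count,
       100.0 * o.obj_count / t.total_count AS pct
FROM objs AS o
JOIN tots AS t ON o.join_key = t.join_key
ORDER BY o.type_desc
"""

rows = conn.execute(query).fetchall()
for type_desc, obj_count, total_count, pct in rows:
    print(f"{type_desc:<22} {obj_count:>3} of {total_count:>3} ({pct:.1f}%)")
```

Multiplying by 100.0 rather than 100 forces floating-point division, which matters on engines that would otherwise truncate the integer result to zero.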
In honesty, I’m just meticulous about installing each new build and immediately digging into the metadata. It would be hard to take a look at sys.all_objects and identify what’s new by sight; even columns like modify_date are not as accurate as you might expect. (For example, in CTP 3.1, create_date of 2015-11-21 18:03:15.267.)
So instead of relying on photographic memory or hoping that something new will jump out at me while scanning the new catalog, I always install the new CTP side-by-side with the previous CTP (or, in the case of the very first CTP, side-by-side with the previous version). Then I can just perform various types of anti-semi-joins across a linked server to see objects and columns that have been added, removed, or changed.
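The comparison described above can be sketched with an anti-semi-join written as NOT EXISTS. This is a minimal illustration, not the author's script: it uses two SQLite databases attached to one connection in place of two SQL Server instances joined by a linked server, and the schema name `prev_ctp`, the helper `only_in`, and the object names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # "main" plays the new CTP
conn.execute("ATTACH DATABASE ':memory:' AS prev_ctp")  # the previous CTP

# Hypothetical, simplified copies of a catalog view in each build.
for schema, names in {
    "prev_ctp": ["obj_a", "obj_b", "obj_c"],
    "main": ["obj_a", "obj_c", "obj_d"],
}.items():
    conn.execute(f"CREATE TABLE {schema}.all_objects (name TEXT)")
    conn.executemany(
        f"INSERT INTO {schema}.all_objects VALUES (?)",
        [(n,) for n in names],
    )


def only_in(conn, left, right):
    """Names present in left.all_objects but absent from right.all_objects."""
    # Schema names are interpolated directly, so pass trusted values only.
    sql = f"""
        SELECT l.name
        FROM {left}.all_objects AS l
        WHERE NOT EXISTS (
            SELECT 1 FROM {right}.all_objects AS r WHERE r.name = l.name
        )
        ORDER BY l.name
    """
    return [row[0] for row in conn.execute(sql)]


added = only_in(conn, "main", "prev_ctp")
removed = only_in(conn, "prev_ctp", "main")
print("added:", added)
print("removed:", removed)
```

Running the same helper in both directions gives the added and the removed objects. Detecting changed objects would need a join on the name plus a comparison of the remaining columns, which this sketch leaves out.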