Missing Costs

Kenneth Fisher has a post about missing important costs:

Growing up my mother used this phrase quite a bit. Penny wise, pound stupid. (In case you didn’t know the pound is the British equivalent of the dollar.) Basically, it means paying attention to the small stuff at the expense of the big stuff. My favorite example of this was a few jobs ago. Without any need to go into details, our coffee area was stocked with both 8oz and 16oz styrofoam cups. One day, one of the accountants decided that there was no point to the extra cups and got rid of the more expensive 16oz cups. Not really a big deal, but our morale was already disastrously low, so it had more impact that it might have otherwise. The most interesting part, though, was a memo that a co-worker sent out. Unfortunately, I don’t have it anymore so I’m going to have to do my best to re-create it.

The lesson is to think through the ramifications of decisions, as there tend to be unintended consequences due to an incomplete understanding of costs.

Dynamic DNS And Powershell

Drew Furgiuele explains dynamic DNS and shows how to use Powershell to keep your Google Dynamic DNS record up to date:

Google’s Dynamic DNS API is really little less than a HTTPS POST to a specific URL with some parameters packed into the post. What makes it work, though, is for each dynamic DNS host name you provide, you get an auto-generated user name and password that needs to be part of the POST. PowerShell is really naturally suited for this type of automation, thanks to Invoke-WebRequest (MSDN Link). It’s built for this, and even supports HTTPS. Heck, it even supports the implementation of username:password in the request via a PSCredential object.

All I really needed to do was to wrap the code in a function and add some parameters that can be passed in, invoke the web request, and parse the results.

Click through for more information, including a couple Powershell scripts.

Market Basket Analysis Basics

Kevin Feasel

2017-02-21

R

Leila Etaati has an introduction to market basket analysis with R:

For instance, imagine we have below transaction items from a shopping store  for last hours,

Customer 1: Salt, pepper, Blue cheese

Customer 2: Blue Cheese, Pasta, Pepper, tomato sauce

Customer 3: Salt, Blue Cheese, Pepperoni, Bacon, egg

Customer 4: water, Pepper, Egg, Salt

we want to know how many times customer purchase pepper and salt together
the support will be : from out four main transactions (4 customers), 2 of them purchased salt and pepper together. so the support will be 2 divided by 4 (all number of transaction.

Basket analysis is one way of building a recommendation engine:  if you’re buying butter, cream, and eggs, do you also want to buy sugar?

SQL Slammer Is Still A Thing

Thomas LaRock notes that SQL Slammer is still out there:

But all of that is in the past. Here’s what you need to know about SQL Slammer today.

First, this worm infects unpatched SQL 2000 and MSDE instances only. About a month ago, I would have thought that the number of such installs would be quite small. But the recent uptick in Slammer tells me that there are enough of these systems to make Slammer one of the top malware detected at the end of 2016. And a quick search at Shodan shows thousands of public-facing database servers available. And if you want to have some real fun at Shodan®, Ian Trump (blog@phat_hobbit) has a suggestion for you.

Click through for ways to protect yourself.  The best way to protect yourself is not to have SQL Server 2000 around anymore.

Row Counts For All Tables

Andrew Peterson has a script to get row counts for each table based on sys.dm_db_partition_stats:

When you need the row count from a table, the most common way is to use count(*). But what if you need the row count for all the tables in your database?  You could use the sp_MSforeachtable stored procedure:

EXEC sys.sp_MSforeachtable ‘select ”?”,count(*) from ?;’;

Nice, but the return format is a bit difficult to manage, since it effectively runs the query for as many times as you have tables in your database.

Click through for Andrew’s script.  One thing to keep in mind is that the number of rows might be off, especially for columnstore tables with ghost records.  It’s an estimate, but one which tends toward the correct answer.

Looking At Columnstore In vNext

Dmitry Pilugin looks at non-clustered columnstore indexes in vNext:

Note: Notice, that I created a varchar(max) column in a table, that was done deliberately, because I wanted to test if LOB columns are supported in vNext. Unfortunatelly, when I tried to create a nonclustered Columnstore index, I got an error:
The statement failed. Column ‘Note’ has a data type that cannot participate in a columnstore index. Omit column ‘Note’.

However, if you create a clustered Columnstore, you will succeed! Though, you will get the following warning in CTP 1.2:

Warning: Using Lob types (NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX)) with Clustered Columnstore Index is in public preview. Do not use it on production data without backup during public preview period.

I think it is great that we can now have a LOB varchar columns in Columnstore, because it was a blocker for some scenarios, at least I have seen some of those.

There are a few interesting findings here, so check it out.

Functions To Know By Heart

Kevin Hill has a top ten functions list for database administrators:

3. DBCC Opentran – log file filling up?  Processes blocked and timing out? See if some doofus (hopefully not you) left a transaction open and went to lunch.  If you are lucky you can kill it.   Unlucky and you find an undistributed replicated transaction.

4. Getdate() – Not sure why, but this winds up in probably 80% of the scripts I write.  Not surprisingly, it often brings along it’s little brother, DateDiff.

I’d go with GETUTCDATE() or SYSUTCDATETIME() over GETDATE(); that way, everything gets stored as UTC and you spend less time worrying about time zone or Daylight Savings Time issues.

Finding Candidate Keys

Daniel Hutmacher explains ways to find candidate keys:

Let’s assume we have a temp table heap called #table, with 9 columns and no indexes at all. Some columns are integers, one is a datetime and few are numeric. As I’m writing this post, my test setup has about 14.4 million rows.

In the real world, when you’re investigating a table for primary key candidates, there are a few things you’ll be looking for that are beyond the scope of this post. For instance, it’s a fair assumption that a numeric or float column is not going to be part of a primary key, varchar columns are less probable candidates than integer columns, and so on. Other factors you would take into consideration are naming conventions; column names ending with “ID” and/or columns that you can tell are foreign keys would also probably be good candidates.

It’s useful to think of all the candidate keys, as getting to Boyce-Codd Normal Form or 4th/5th NF involves dealing with all potential primary keys, not just the one you selected.  Daniel’s post gives you several different methods of searching existing data; combine that with domain knowledge and a bit of logic and you have a pretty decent start at finding candidate keys.

Memory-Optimized Hash Indexes And Bucket Counts

Jack Li relates a story about how it’s important to think through bucket count for memory-optimized hash indexes:

I was working with a customer to troubleshoot memory optimized table issues.  In this scenario, our customer uses a memory optimized table variable.  He put 1 million rows of data into the table variable and then process it.  Based on what he said, I tried to come up with a repro to see if I can duplicate the problem.   While troubleshooting that issue, I ran into another issue where I can’t even insert 1 million row into a memory optimized table variable.

My rule of thumb is 30-50% more buckets than expected rows.  That way I can handle some level of unexpected growth while keeping the chance of a hash collision and really slow linked list scan as low as possible.  The official guidance says performance should be “acceptable” with up to 5x rows per buckets, but my experience has been that “acceptable” is a generous term at that point.

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728