Press "Enter" to skip to content

Curated SQL Posts

Dynamic DNS And PowerShell

Drew Furgiuele explains dynamic DNS and shows how to use PowerShell to keep your Google Dynamic DNS record up to date:

Google’s Dynamic DNS API is really little more than an HTTPS POST to a specific URL with some parameters packed into the post. What makes it work, though, is that for each dynamic DNS host name you provide, you get an auto-generated user name and password that needs to be part of the POST. PowerShell is naturally suited for this type of automation, thanks to Invoke-WebRequest (MSDN Link). It’s built for this, and even supports HTTPS. Heck, it even supports the implementation of username:password in the request via a PSCredential object.

All I really needed to do was to wrap the code in a function and add some parameters that can be passed in, invoke the web request, and parse the results.

Click through for more information, including a couple of PowerShell scripts.


Market Basket Analysis Basics

Leila Etaati has an introduction to market basket analysis with R:

For instance, imagine we have the below transaction items from a shopping store for the last few hours:

Customer 1: Salt, pepper, Blue cheese

Customer 2: Blue Cheese, Pasta, Pepper, tomato sauce

Customer 3: Salt, Blue Cheese, Pepperoni, Bacon, egg

Customer 4: water, Pepper, Egg, Salt

We want to know how often customers purchase pepper and salt together.
The support will be: out of our four main transactions (4 customers), 2 of them purchased salt and pepper together, so the support will be 2 divided by 4 (the total number of transactions), or 0.5.

Basket analysis is one way of building a recommendation engine:  if you’re buying butter, cream, and eggs, do you also want to buy sugar?
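If you think in T-SQL rather than R, the same support calculation can be sketched against a hypothetical dbo.TransactionItems table, with one row per transaction-and-item pair (the table and column names here are made up for illustration):

-- Flag whether each transaction contains salt and pepper, then divide
-- the count of transactions containing both by the total transaction count.
WITH PerTransaction AS
(
    SELECT
        TransactionID,
        MAX(CASE WHEN Item = 'Salt' THEN 1 ELSE 0 END) AS HasSalt,
        MAX(CASE WHEN Item = 'Pepper' THEN 1 ELSE 0 END) AS HasPepper
    FROM dbo.TransactionItems
    GROUP BY TransactionID
)
SELECT
    1.0 * SUM(CASE WHEN HasSalt = 1 AND HasPepper = 1 THEN 1 ELSE 0 END)
        / COUNT(*) AS SupportSaltAndPepper
FROM PerTransaction;

With the four customers above, this returns 2 / 4 = 0.5.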


SQL Slammer Is Still A Thing

Thomas LaRock notes that SQL Slammer is still out there:

But all of that is in the past. Here’s what you need to know about SQL Slammer today.

First, this worm infects unpatched SQL 2000 and MSDE instances only. About a month ago, I would have thought that the number of such installs would be quite small. But the recent uptick in Slammer tells me that there are enough of these systems to make Slammer one of the top malware detected at the end of 2016. And a quick search at Shodan shows thousands of public-facing database servers available. And if you want to have some real fun at Shodan®, Ian Trump (blog | @phat_hobbit) has a suggestion for you.

Click through for ways to protect yourself.  The best way to protect yourself is not to have SQL Server 2000 around anymore.


Row Counts For All Tables

Andrew Peterson has a script to get row counts for each table based on sys.dm_db_partition_stats:

When you need the row count from a table, the most common way is to use count(*). But what if you need the row count for all the tables in your database?  You could use the sp_MSforeachtable stored procedure:

EXEC sys.sp_MSforeachtable 'select ''?'', count(*) from ?;';

Nice, but the return format is a bit difficult to manage, since it effectively runs the query as many times as you have tables in your database.

Click through for Andrew’s script.  One thing to keep in mind is that the number of rows might be off, especially for columnstore tables with ghost records.  It’s an estimate, but one which tends toward the correct answer.
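For reference, a rough sketch of the DMV-based approach (not Andrew’s exact script) looks something like this:

-- Approximate row counts for every user table, from partition statistics.
-- Much faster than running COUNT(*) per table, but the counts are estimates.
SELECT
    s.name AS SchemaName,
    t.name AS TableName,
    SUM(ps.row_count) AS ApproximateRowCount
FROM sys.dm_db_partition_stats AS ps
    INNER JOIN sys.tables AS t ON t.object_id = ps.object_id
    INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)   -- heap or clustered index only
GROUP BY s.name, t.name
ORDER BY ApproximateRowCount DESC;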


Looking At Columnstore In vNext

Dmitry Pilugin looks at non-clustered columnstore indexes in vNext:

Note: Notice that I created a varchar(max) column in the table; that was done deliberately, because I wanted to test whether LOB columns are supported in vNext. Unfortunately, when I tried to create a nonclustered Columnstore index, I got an error:
The statement failed. Column 'Note' has a data type that cannot participate in a columnstore index. Omit column 'Note'.

However, if you create a clustered Columnstore, you will succeed! Though, you will get the following warning in CTP 1.2:

Warning: Using Lob types (NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX)) with Clustered Columnstore Index is in public preview. Do not use it on production data without backup during public preview period.

I think it is great that we can now have LOB varchar columns in Columnstore, because it was a blocker for some scenarios; at least I have seen some of those.

There are a few interesting findings here, so check it out.
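If you want to reproduce the behavior Dmitry describes, a minimal sketch (the table and index names here are made up) looks like this:

-- Hypothetical table with a LOB column.
CREATE TABLE dbo.ColumnstoreLobTest
(
    ID INT NOT NULL,
    Note VARCHAR(MAX) NULL
);
GO

-- Fails: LOB columns cannot participate in a nonclustered columnstore index.
-- CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_ColumnstoreLobTest
--     ON dbo.ColumnstoreLobTest (ID, Note);

-- Succeeds on vNext CTPs, with the public-preview warning quoted above.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_ColumnstoreLobTest
    ON dbo.ColumnstoreLobTest;
GO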


Functions To Know By Heart

Kevin Hill has a top ten functions list for database administrators:

3. DBCC Opentran – log file filling up?  Processes blocked and timing out? See if some doofus (hopefully not you) left a transaction open and went to lunch.  If you are lucky you can kill it.   Unlucky and you find an undistributed replicated transaction.

4. Getdate() – Not sure why, but this winds up in probably 80% of the scripts I write.  Not surprisingly, it often brings along its little brother, DateDiff.

I’d go with GETUTCDATE() or SYSUTCDATETIME() over GETDATE(); that way, everything gets stored as UTC and you spend less time worrying about time zone or Daylight Saving Time issues.
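A quick side-by-side, if you want to see the difference on your own server:

-- Local server time versus UTC; storing UTC sidesteps DST and time zone math.
SELECT
    GETDATE()        AS LocalServerTime,
    GETUTCDATE()     AS UtcTime,
    SYSUTCDATETIME() AS UtcTimeHighPrecision,
    DATEDIFF(HOUR, GETUTCDATE(), GETDATE()) AS HoursOffsetFromUtc;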


Finding Candidate Keys

Daniel Hutmacher explains ways to find candidate keys:

Let’s assume we have a temp table heap called #table, with 9 columns and no indexes at all. Some columns are integers, one is a datetime, and a few are numeric. As I’m writing this post, my test setup has about 14.4 million rows.

In the real world, when you’re investigating a table for primary key candidates, there are a few things you’ll be looking for that are beyond the scope of this post. For instance, it’s a fair assumption that a numeric or float column is not going to be part of a primary key, varchar columns are less probable candidates than integer columns, and so on. Other factors you would take into consideration are naming conventions; column names ending with “ID” and/or columns that you can tell are foreign keys would also probably be good candidates.

It’s useful to think of all the candidate keys, as getting to Boyce-Codd Normal Form or 4th/5th NF involves dealing with all potential primary keys, not just the one you selected.  Daniel’s post gives you several different methods of searching existing data; combine that with domain knowledge and a bit of logic and you have a pretty decent start at finding candidate keys.
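As a tiny illustration of the general idea (this is not Daniel’s method, and the column names are hypothetical), you can test whether a column combination is unique across existing data with a simple aggregate:

-- Any group returning more than one row disqualifies (Col1, Col2) as a key.
SELECT Col1, Col2, COUNT(*) AS DuplicateCount
FROM #table
GROUP BY Col1, Col2
HAVING COUNT(*) > 1;

An empty result only tells you the combination is unique in the data you have today; whether it should be the key is still a question of domain knowledge.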


Memory-Optimized Hash Indexes And Bucket Counts

Jack Li relates a story about how it’s important to think through bucket count for memory-optimized hash indexes:

I was working with a customer to troubleshoot memory optimized table issues. In this scenario, our customer uses a memory optimized table variable. He put 1 million rows of data into the table variable and then processed it. Based on what he said, I tried to come up with a repro to see if I could duplicate the problem. While troubleshooting that issue, I ran into another issue where I couldn’t even insert 1 million rows into a memory optimized table variable.

My rule of thumb is 30-50% more buckets than expected rows.  That way I can handle some level of unexpected growth while keeping the chance of a hash collision and a really slow linked list scan as low as possible.  The official guidance says performance should be “acceptable” with up to 5x rows per bucket, but my experience has been that “acceptable” is a generous term at that point.
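Applied to a 1-million-row scenario like Jack’s, that rule of thumb might look something like this hypothetical table type (the type and column names are mine, not his):

-- BUCKET_COUNT sized roughly 50% above the expected 1 million rows.
-- The database needs a MEMORY_OPTIMIZED_DATA filegroup for this to work.
CREATE TYPE dbo.OrderKeyList AS TABLE
(
    OrderID INT NOT NULL,
    INDEX ix_OrderID HASH (OrderID) WITH (BUCKET_COUNT = 1500000)
)
WITH (MEMORY_OPTIMIZED = ON);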


SET NOEXEC ON

Andy Kelly shows how to prevent part of a script from running:

 

If we were to hit F5 (or however you execute your T-SQL statements in SSMS) without highlighting any statement(s), they would all be executed, one batch after the other. Even if one batch were to fail, or we had a THROW in that batch, it would fail at that point, but execution would continue immediately after the next GO until the end. This is where SET NOEXEC ON comes into play. If I add that at the beginning of the script, none of the succeeding code is executed: the statements are only compiled, not actually run. It would look like this:

SET NOEXEC ON;

PRINT 'Got Here 1';
GO
PRINT 'Got Here 2';
GO
PRINT 'Got Here 3';
GO
PRINT 'Got Here 4';
GO

This is a useful “accidental F5” protection:  you can put it at the top of your long script to keep from running the whole thing at once.
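One small addendum: the setting sticks for the rest of the session, so when you do want later statements on that connection to run, flip it back off:

SET NOEXEC OFF;  -- statements after this point compile and execute again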


Parsing Gigantic JSON Text

Jovan Popovic has created a 4.35 GB JSON array:

SQL Server 2016 and Azure SQL Database enable you to parse JSON text and transform it into tabular format. In this post, you might see that JSON functions can handle very large JSON text – up to 4GB.

First, I would need a very large JSON document. I’m using the TPCH database, so I will export the content of the lineitem table to a file. JSON can be exported using the bcp.exe program:

My first draft read “Jovan Popovic has created a monster.”  I might go back to that one.  On the plus side, the operation took a lot less time than I had expected, though I’d have to imagine that his SQL Express instance had some decent specs.
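If you just want the shape of the parsing side, a minimal sketch looks something like the following; the file path and the WITH-clause column mappings are illustrative rather than taken from Jovan’s post, and his article covers the specifics of handling a file this large:

-- Load the JSON file as a single value, then shred the array with OPENJSON.
SELECT li.*
FROM OPENROWSET(BULK 'C:\temp\lineitem.json', SINGLE_CLOB) AS j
CROSS APPLY OPENJSON(j.BulkColumn)
    WITH
    (
        OrderKey      BIGINT        '$.L_ORDERKEY',
        Quantity      DECIMAL(18,2) '$.L_QUANTITY',
        ExtendedPrice DECIMAL(18,2) '$.L_EXTENDEDPRICE'
    ) AS li;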
