As I recently got asked for it in a talk, this piece of code gives you all the Views in a database that are currently broken.
This could be useful for “what if”-scenarios when playing with your metadata.
Click through for the code. This is another in Ben’s enjoyable ongoing series of non-ETL things you can do with Biml.
To sum up, association rules is a great and powerful algorithm for finding the correlations between items and the fact that you can use this straight from SSMS, it just gives me goosebumps. Currently just the performance is a bit of a drawback. Also comparing this algorithm to Analysis services (SSAS) association rules, there are many advantages on R side, because of maneuverability and extracting the data to T-SQL, but keep in mind, SSAS is still very awesome and powerful tool for statistical analysis and data predictions.
Figuring out variations after the fact is an all-too-common task, and this is a good way of getting some ideas on how to do that.
Using T-SQL to insert, update, or delete large amounts of data from a table will results in some unexpected difficulties if you’ve never taken it to task.
Let’s say you have a table in which you want to delete millions of records. If the goal was to remove all then we could simply use TRUNCATE. However, if we want to remove records which meet some certain criteria then executing something like this will cause more trouble that it is worth.
I do like the delete process. The update process is going to run tableRows/batchSize full scans, so I’m not as fond of that one. Do read the whole thing.
You are going to cause all sorts of problems if you write code like this. First off, if you really do have these three queries within the same stored procedure, how hard will it be to confuse which is table ‘a’ in each of the queries when you go back to edit them? Pretty easy.
It gets worse though. I know that none of us will ever write a query that exceeds 26 tables in a JOIN… well, except that one time… and that other time. In fact, it happens. Oh, it’s not always a good thing, but it’s a thing. How do we respond to that? I’ve seen this:
Some of this stuff is egregious. Some of it is debatable. But either way, well-formatted code helps prevent bugs and aid understanding of queries.
The primary system I deal with on a daily basis was originally developed as a DOS application and several of the above examples are drawn from it. Looking at the core tables and columns, it’s easy to identify those that began life in those early days – they all have 8-character names. Time moved on and the system grew and evolved. DOS to Windows. Windows to the web. But the database, and the practices and patterns used in the database, haven’t come along for the ride.
Data schema conversions can be hard and disruptive – you need to update your application, your stored procedures, and provide customers/users with a clean migration path. Code changes require testing. Complexity and cost grows every time you introduce changes. I get that.
There’s a lot of effort in Andy’s advice, but it’s well worth it.
Couple of packages I will mention for data manipulations are plyr, dplyr and data.table and compare the execution time, simplicity and ease of writing with general T-SQL code and RevoScaleR package. For this blog post I will use R packagedplyr and T-SQL with possibilites of RevoScaleR computation functions.
My initial query will be. Available in WideWorldImportersDW database. No other alterations have been done to underlying tables (fact.sale or dimension.city).
Read on for code and conclusions. I don’t think there are any shocking conclusions: the upshot is to filter data as early as possible.
Even though we would expect to see both records returned we only get 1 record. Huh? This is exactly what puzzled a coworker, ofcourse query was not as simple as this one but same issue caused him to hit a road block.
In the case of COALESCE and OR methods, results are identical.
The underlying issue here is that the variable data type differs from the column’s data type, and exposes a difference in how COALESCE and ISNULL work.
This is the most complicated of the three. It does a standard rounding. If value is .5 or over then you get back 1. If it’s less than .5 you get back 0. On top of that you get to pass the place you want to round to. So for example 0 rounds to the nearest ones place, -1 rounds to the tens place, 2 rounds to the hundredths.
There’s a bonus here: if you use ROUND in T-SQL, the results are different than the default Round method in .NET; that method uses banker’s rounding by default whereas the T-SQL rounding does not.
Or if you’re ready to take it to the ╰[ ⁰﹏⁰ ]╯level then change the column to NChar and use the hex of your favorite emoji like:
DECLARE @testtable1 TABLE ( testid int identity (1,1), testvalue nchar (255))
–use the below for the final query
SELECT ISNULL((SELECT testvalue from @testtable1 where testvalue > 101), NCHAR(0xD83D)+ NCHAR(0xDE20) ) as testvaluethatworks
There are a few other alternatives, such as loading results into a temp table and inserting a default row if the temp table is empty.
I was playing with some Extended Events recently. If you haven’t tried, I’d encourage you to do so. However, working with XML is not my favorite. I know I can get the GUI in SSMS 16.x to show me events, but I sometimes want to query.
Here was my quick adventure in XML and XQUERY. I should know this stuff better, but I think I’m working with XML so rarely that I’m constantly re-learning things.
Read on for the code.