I added a column:

```dax
RollinAvgSteps =
AVERAGEX(
    FILTER(
        fitbit_export_20160214,
        EARLIER(fitbit_export_20160214[Date]) >= fitbit_export_20160214[Date]
    ),
    fitbit_export_20160214[Steps]
)
```
…which takes the average of my steps to date. There are a bunch of ways to achieve this, but this is the way that I chose. And you can see that the average line is (happily) improving! Oh, and because I pulled down the extract on the 14th, there’s a dip at the end. My numbers were much healthier by the end of the day, and despite spending way too long NOT walking, I did about 7244 steps that day.
You can see the result at http://bit.ly/RobFitbit
I like the rolling average that Rob added in.
Thanks to everybody who participated in this month’s T-SQL Tuesday. A big thanks to everyone who wrote a participating blog post, and a really huge thanks to everyone who read the posts shared by this month’s participants. If you follow one of the links on this round-up page, please leave a comment on any post that teaches you something, gives you a new perspective, or makes you think. A blog post is just a bunch of words until somebody reads it, and blog writers love hearing that their post resonated with someone on some level.
Thanks to Robert for hosting T-SQL Tuesday #74.
Make sure that this is the ONLY code in your window, or that you are protected by a RETURN or SET NOEXEC ON at the top of the script. I put this in place by default on new query windows. It protects you from accidentally running more code than you intended.
Make a habit of checking which instance you are connected to before running any ad-hoc code. Running code meant for a development or test environment against production can be a very scary thing.
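A minimal safety header along these lines might look like the following sketch (the instance name is a hypothetical example, not from the original post):

```sql
-- Safety header for ad-hoc query windows: a stray "run all" stops here.
-- Comment these lines out deliberately when you actually mean to run the script.
IF @@SERVERNAME <> 'DEV-SQL01'   -- hypothetical dev instance name
BEGIN
    RAISERROR('Wrong instance - aborting.', 16, 1);
    SET NOEXEC ON;               -- compile but do not execute the rest of the batch
END;
RETURN;                          -- exit the batch immediately in any case

-- ... the actual ad-hoc code goes below, run deliberately by selection ...
```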
This is good advice.
I use a pattern that includes four fields on all transactional tables. This (absolutely) includes lookup tables too. The two table types that are exceptions to this pattern are audit tables and error tables. I’ll cover why later in this article.
The four fields are CreatedOn, CreatedBy, UpdatedOn, and UpdatedBy. The date columns should be datetime2. CreatedOn is the easiest to populate: create a default on the column so it is filled in with SYSDATETIME() (GETDATE() works too, but returns the older datetime type).
This is a common pattern and works pretty well. The trick is making sure that you keep that metadata up to date.
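A sketch of the four-column pattern on a hypothetical table (the table and constraint names are illustrative, not from the original post):

```sql
-- Four audit columns on a transactional table. SYSDATETIME() returns
-- datetime2, matching the column type; SUSER_SNAME() captures the login.
CREATE TABLE dbo.Customer
(
    CustomerId   int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    CreatedOn    datetime2 NOT NULL
        CONSTRAINT DF_Customer_CreatedOn DEFAULT SYSDATETIME(),
    CreatedBy    sysname NOT NULL
        CONSTRAINT DF_Customer_CreatedBy DEFAULT SUSER_SNAME(),
    UpdatedOn    datetime2 NULL,   -- set by application code or a trigger on UPDATE
    UpdatedBy    sysname NULL
);
```

The Updated columns stay NULL until the first update, which is one way to keep "never modified" distinguishable from "modified at creation time".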
The XML below has data nested at different levels, which requires the nodes() method to join them together. The nodes() method takes an XQuery path expression and returns a rowset; that rowset can then be used with CROSS APPLY to effectively link your way down the hierarchy.
nodes('XQuery') AS Table(Column)
The tabular format I need requires data from 3 different levels of this XML gob and I need to wade through 5 “tables” to get there.
Shredding XML is something you occasionally need to do.
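A small self-contained illustration of the technique, using a hypothetical two-level document (nothing here comes from the original post’s XML):

```sql
-- Shredding nested XML: nodes() at the outer level, then CROSS APPLY
-- with nodes() again to step down to the inner level.
DECLARE @x xml = N'
<Orders>
  <Order Id="1">
    <Line Sku="A" Qty="2"/>
    <Line Sku="B" Qty="1"/>
  </Order>
  <Order Id="2">
    <Line Sku="C" Qty="5"/>
  </Order>
</Orders>';

SELECT  o.n.value('@Id',  'int')     AS OrderId,
        l.n.value('@Sku', 'char(1)') AS Sku,
        l.n.value('@Qty', 'int')     AS Qty
FROM @x.nodes('/Orders/Order') AS o(n)   -- one row per <Order>
CROSS APPLY o.n.nodes('Line')  AS l(n);  -- one row per <Line> under that order
```

Each additional level of nesting is one more CROSS APPLY against the previous level’s column.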
1. It can let you access data in the columns of those tables, to use in predicates or expressions.
2. It can let you filter the data in the base table, by only allowing rows which match, such as when using an inner join or right outer join.
3. It can cause rows in the base table to be returned multiple times, if multiple rows in the joined table match a single row in the base table.
4. It can introduce NULL rows, if a full or right outer join is being done (or a left outer join with the base table second) and there are rows in the joined table that don’t match any rows in the base table.
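The four effects above can be seen in a tiny example with hypothetical table variables (illustrative only):

```sql
DECLARE @Base  TABLE (Id int);
DECLARE @Child TABLE (BaseId int, Val char(1));
INSERT @Base  VALUES (1), (2);
INSERT @Child VALUES (1, 'a'), (1, 'b'), (3, 'c');

-- Inner join: filters out base row 2 (effect 2) and returns base row 1
-- twice, once per matching child row (effect 3).
SELECT b.Id, c.Val
FROM @Base b
JOIN @Child c ON c.BaseId = b.Id;

-- Full outer join: unmatched rows on either side come back with NULLs
-- filling in the missing columns (effect 4).
SELECT b.Id, c.Val
FROM @Base b
FULL OUTER JOIN @Child c ON c.BaseId = b.Id;
```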
This is a useful bit of T-SQL-specific syntax, but it has sharper edges than most UPDATE statements. For a look back in history, Hugo Kornelis wanted to deprecate this syntax with the release of SQL Server 2008 (though MERGE has its own bugs and “Won’t Fix” problems, so in retrospect, perhaps it’s best that we still have UPDATE FROM).
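One of those sharp edges, sketched with hypothetical tables: when more than one source row matches a target row, UPDATE ... FROM silently picks one of them, with no error raised.

```sql
-- If dbo.Source contains two rows with the same TargetId, the final value
-- of t.Price is nondeterministic - whichever matching row the engine
-- happens to apply last "wins", and SQL Server does not complain.
UPDATE t
SET    t.Price = s.Price
FROM   dbo.Target AS t
JOIN   dbo.Source AS s
  ON   s.TargetId = t.Id;
```

A standard UPDATE with a correlated subquery would instead raise an error on the duplicate, which is exactly the safety net this syntax trades away.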
Mickey Stuewe hosted T-SQL Tuesday this month. Her topic: data modeling gone wrong. A few choice posts on the topic follow.
One of the problems I’ve seen with careless use of surrogate keys is the duplication of natural keys. Quite often it’s overlooked that the natural key still needs a unique constraint. Without it, the reporting team ends up having to use MAX or DISTINCT to get the latest instance of the natural key, or SSIS packages are needed to clean up the duplicates. This can be compounded in many-to-many tables.
Surrogate keys are not replacements for natural keys; they are physical implementation mechanisms to make your life easier.
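The fix is cheap: keep the surrogate as the primary key and put a unique constraint on the natural key. A sketch with an illustrative table (names are hypothetical):

```sql
CREATE TABLE dbo.Product
(
    ProductId   int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Product PRIMARY KEY,   -- surrogate key for joins
    Sku         varchar(20) NOT NULL
        CONSTRAINT UQ_Product_Sku UNIQUE,    -- natural key stays unique
    ProductName nvarchar(100) NOT NULL
);
```

Without UQ_Product_Sku, nothing stops the same SKU from being inserted twice under two different ProductId values, which is precisely the duplication described above.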
Rob Farley wants you to think about design and whether your warehouse is built in a way that helps the business:
Many data professionals look at a data warehouse as a platform for reporting, built according to the available data sources. I disagree with this.
The models within a data warehouse should describe the business. If they don’t, it’s a data model gone wrong.
What is the central thing that your business does? What is the main interest point? What do you need to look after? For me, this forms the core of the warehouse.
Thomas Rushton says name your stuff right. Picking the right name can be difficult. “Field1” probably isn’t the right name, though.